go-spatial / tegola

Tegola is a Mapbox Vector Tile server written in Go
http://tegola.io/
MIT License
1.25k stars 193 forks source link

Feature: adds configurable query parameters to tile endpoints #867

Closed bemyak closed 1 year ago

bemyak commented 2 years ago

This PR is a successor of https://github.com/go-spatial/tegola/pull/795.

The main differences/features are:

A user can define custom parameters on per-map basis like this:

[[maps.params]]
name = "param2"
token = "!PARAM2!"
# only very simply types are suppored: `int`, `string`, `float`, `bool`
type = "int"
# `?` will be replaced with the actual value
# if `sql` is not specified, it is considered to be `?`
sql = "AND ANSWER = ?"
# if the parameter value is missing from the HTTP query, this value will be used
# `default_value` is an optional parameter. If it is not defined, the parameter is considered as required and cliet will get 400 error
default_value = "42"
# instead of `default_value`, `default_sql` can be defined, e.g. to omit a param query entirely:
#default_sql = " "
# defining both `default_value` and `default_sql` is forbidden

When a request is received, a value of QueryParameterValue is constructed to hold the resulting parameter query (still with ?) and a value (parsed to the proper type).

Then (params map[string]QueryParameterValue) ReplaceParams(sql string, args *[]interface{}) string function can be called which replaces parameters with positional arguments (e.g., $1, $2) and fills an array with their values. This should prevent SQL injections.

bemyak commented 1 year ago

@gdey, thank you so much for the review!

I've tried to preserve the commits from the original feature author, as my impression was that they were already reviewed and approved. However, it might be easier if I redo the commit history and separate code cleanups from the actual feature code. What do you think?

Regarding caching, will it be enough to log a warning? Or should we fail the config validation? We don't support per-map caching, so the best thing we can do without implementing it is to skip maps with configured parameters.

Does this force anyone writing SQL to use $%d instead of '?'?

No, the numbered arguments are handled by the Tegola internals and are not exposed to the end user in any way. In maps' configuration, one uses only "tokens", like !PARAM1!. In the parameter configuration, only ?. The substitution will happen transparently.

What happens if one types a number that is generated in the SQL by mistake?

If that's just a mistake, Tegola will probably catch it during startup while executing inspectLayerGeomType since the query isn't valid. If the geometry_type was specified, that function is optional so that the parameter will be passed to the query and replaced with some value. I don't think we should protect against it. It's the same attack vector as if an admin would put DROP DATABASE in the query :shrug:

We definitely must ensure that the documentation clearly states that only ? is accepted as a placeholder, and numbered parameters here produce undefined behaviour.

gdey commented 1 year ago

I've tried to preserve the commits from the original feature author, as my impression was that they were already reviewed and approved. However, it might be easier if I redo the commit history and separate code cleanups from the actual feature code. What do you think?

I would say go ahead and rebase the code to separate out the clean-up from the actual feature code. This provides two benefits as I see it.

  1. I can cherry-pick the clean-up and get them into dev, as we discuss this. (Through I think it's really close, the only thing I have a concern with is how caching is going to interact with this feature, from a user perspective.)
  2. It's easier to see how everything interacts without the clean-up code distracting.

Regarding caching, will it be enough to log a warning? Or should we fail the config validation? We don't support per-map caching, so the best thing we can do without implementing it is to skip maps with configured parameters.

I feel we should fail on config validation. At least warn there that caching is being turned off due to maps with parameters.

Just skipping a map with parameters seems like the worst of both worlds, because the user has to remember which maps have parameters and which don't. The downside is that there isn't caching for any of the maps.

The PR is logging (with your review changes) when a map is not going to be cached at the time of tile buildup; this is going to inflate the logs (I believe) since this will have every time, an (uncached) tile is requested.

Failing at config validation time, or logging at that time the maps that will not be included in the cache may be better.

@dechristopher @mojodna @ARolek @ear7h opinions?

Does this force anyone writing SQL to use $%d instead of '?'?

No, the numbered arguments are handled by the Tegola internals and are not exposed to the end user in any way. In maps' configuration, one uses only "tokens", like !PARAM1!. In the parameter configuration, only ?. The substitution will happen transparently.

Okay, cool. Could not remember how we had implemented that.

What happens if one types a number that is generated in the SQL by mistake?

If that's just a mistake, Tegola will probably catch it during startup while executing inspectLayerGeomType since the query isn't valid. If the geometry_type was specified, that function is optional so that the parameter will be passed to the query and replaced with some value. I don't think we should protect against it. It's the same attack vector as if an admin would put DROP DATABASE in the query shrug

I need to think about this.

We definitely must ensure that the documentation clearly states that only ? is accepted as a placeholder, and numbered parameters here produce undefined behaviour.

:+1:

coveralls commented 1 year ago

Pull Request Test Coverage Report for Build fe493a746-PR-867


Changes Missing Coverage Covered Lines Changed/Added Lines %
cmd/tegola/cmd/root.go 0 1 0.0%
atlas/map.go 6 12 50.0%
atlas/atlas.go 0 7 0.0%
provider/postgis/util.go 18 25 72.0%
cmd/tegola/cmd/cache/cache.go 0 10 0.0%
config/config.go 76 86 88.37%
provider/paramater_decoders.go 0 12 0.0%
provider/map_layer.go 0 15 0.0%
provider/postgis/postgis.go 28 46 60.87%
server/handle_map_layer_zxy.go 12 34 35.29%
<!-- Total: 218 396 55.05% -->
Files with Coverage Reduction New Missed Lines %
config/config.go 1 77.61%
provider/postgis/postgis.go 1 62.07%
<!-- Total: 2 -->
Totals Coverage Status
Change from base Build dd8c61dfc: 0.007%
Covered Lines: 5745
Relevant Lines: 12676

💛 - Coveralls
bemyak commented 1 year ago

Just summarizing my thoughts and findings on the caching

The problems

  1. Caching is configured on a per-Atlas, not per-map level. Moving it to a map level will introduce extra complexity to the end users while not providing too much benefit. However, we can add a simple cache=true/false param and ensure that all maps with custom params have it off.
  2. I think having cache for some maps is better than having it off for everything.
  3. It is possible to create cache for custom parameters, but this comes with a new set of challenges:
    • Cache key can be extended with a key=value part for every custom parameter used. However, this value must be hashed to avoid attacks like specifying a value as ../../123 and reading tiles from another cache this way
    • The order of parameters, their names, and even queries now matter. If the config is changed, all cache must be invalidated. So we either need to always do it on startup or store a config's checksum somewhere. All in all, I think we should leave this out of this PR's scope.
  4. In this PR, cache is still used for every map, disregarding custom parameters. Changes in the cmd/tegola/cmd/cache/cache.go only affect the tegola cache subcommand. I'll fix it when we decide what approach should be taken.

Possible solutions

  1. Disable all caching if any custom parameter is configured
  2. Skip caching for maps that have custom parameters, printing a warning on startup
  3. Add a cache=false key to the map configuration and require it to be set for every map with custom params

Personally, I don't like №3 as the users don't have any saying in this; by adding this line, they just acknowledge the fact, so it's not a "configuration option".

bemyak commented 1 year ago

@gdey, I rewrote the commit history. It should be much easier to follow it commit by commit now, as there are less back and forth changes.

gdey commented 1 year ago

Thank you for the great summation!

Possible solutions

1. Disable all caching if any custom parameter is configured

Sounds like this would be true Per Atlas and not for the entirety of Tegola instance.

2. Skip caching for maps that have custom parameters, printing a warning on startup

This seems to be the natural solution but will require documentation around people missing the warning on startup, and further clarification in the feature documentation.

3. Add a `cache=false` key to the map configuration and require it to be set for every map with custom params

Personally, I don't like №3 as the users don't have any saying in this; by adding this line, they just acknowledge the fact, so it's not a "configuration option". I agree, 3 is not a good solution, and add caching on a per map bases, should be out of scope for this PR.

gdey commented 1 year ago

@ARolek are we able to upgrade to go1.18, or do we still need to wait on that?

gdey commented 1 year ago

@ARolek are we able to upgrade to go1.18, or do we still need to wait on that?

Talk to @ARolek and we are fine with upgrading to go1.18, and can even move to go1.19.

sjb-citian commented 1 year ago

Got an internal error when trying to build a lambda version of this branch. Has anyone else come across this issue?

Oh yea to clarify, this is after building tegola_lambda and uploading it to my lambda function on AWS that typically works when building the main branch. When opening the /capabilities url I get an internal error.

bemyak commented 1 year ago

Thanks for giving it a try, @sjb-citian !

I haven't tested the lambda build. Are there any logs or a stacktrace so I could investigate?

sjb-citian commented 1 year ago

@bemyak I rebuilt it this morning using the command specified in the read.me of the cmd/lambda folder and reconfigured my endpoint on AWS and it worked! Likely a build error on my part or I configured the endpoint incorrectly, either this branch definitely works with lambda.

ARolek commented 1 year ago

@sjb-citian thanks for being diligent on this contribution, and again apologies for the slow response on my end. Regarding caching, I think we should go with:

  1. Skip caching for maps that have custom parameters, printing a warning on startup

This seems the most natural in my mind. If the user wants some caching, we can suggest they out a cache in front of tegola, like a CDN. Your breakdown of the complications with caching dynamic tiles is very correct. It opens up a whole new world of complexities that will likely need to be solved, but I think are beyond the scope of this initial enhancement.

I think the cache detail is the only final outstanding detail. What I would like to do next is the following:

I'm excited to play with this feature. It has been long requested and again, I appreciate your diligence and patience. Your contribution is greatly appreciated!

bemyak commented 1 year ago

@gdey , @ARolek , I added cache bypassing and the warning for maps with configured parameters.

Is there anything else remaining unresolved?

bemyak commented 1 year ago

@ARolek, can I do something else here?

ARolek commented 1 year ago

@bemyak nope! Action on me to get the release notes and documentation updates for the current version polished out. If you would like to help on that front I can push up a branch that you can help on. I'm mainly trying to cut over to using mvt_postgis in our readmes.

ARolek commented 1 year ago

@bemyak can you rebase this branch against master please?

bemyak commented 1 year ago

@ARolek, done! I've just dropped two commits with the Go update and CI fix

ARolek commented 1 year ago

@bemyak alright, I just cut the v0.16.0 version of tegola. Going to run the CI on this PR and then I will merge it in. Thanks again for the commitment and patience on this PR. I would like to let this sit in pre-release state for a bit so we can let the community give it a try.

ARolek commented 1 year ago

PR now in master: https://github.com/go-spatial/tegola/commits/master. I'm going to open a discussion for anyone that needs help with the feature or has feedback. Onward!

ARolek commented 1 year ago

@bemyak when opening this discussion, I realized the configuration instructions for the feature are only in the PR. Could you send in some README documentation to help guide future users?

bemyak commented 1 year ago

@ARolek, thank you! I've opened a PR with the README update: https://github.com/go-spatial/tegola/pull/896