dechristopher commented 2 years ago

Hi all. I recently stumbled upon a use-case for having query parameters present in URLs. Being able to configure Tegola's queries on the fly with optional URL query parameters for filtering down data can simplify configuration for large datasets that would otherwise have many layers defined manually.

Defining a query parameter is as simple as this:

[[maps.params]]
name = "param"
token = "!PARAM!"
default = "7"

This will configure a parameter named param with the SQL replacement token !PARAM!. The default key is optional and allows you to specify a default value for the parameter if none is provided. Without a default specified, the replacer will default to substituting an empty string.

The endpoint then becomes queryable at:

http://server:8080/maps/example/{z}/{x}/{y}.pbf?param=123

Currently, to prevent SQL injection, parameter values are limited to values matching the following regular expression. This is to limit control characters from being injected via query parameters:

[a-zA-Z0-9,._-]+

This obviously needs more work and I am open to feedback regarding security. I've also thought of limiting the scope of query parameters to integers only since most of the use cases I've found for this feature involve numeric ID matching in queries.

Any and all feedback is welcome. I hope others can find value in this contribution.

ARolek commented 2 years ago

@dechristopher awesome! This feature has been requested before so it's great you tackled it! I should be able to get a review in the next week.

cc @gdey @ear7h

dechristopher commented 2 years ago

@dechristopher awesome! This feature has been requested before so it's great you tackled it! I should be able to get a review in the next week.

cc @gdey @ear7h

Any chance to review yet?

gdey commented 2 years ago

I'll look at this Friday. @dechristopher do you know why the test is failing? Have not dug into the logs. But a quick scan is complaining about bbox. I know you are dynamically changing that in the SQL.

dechristopher commented 2 years ago

I'll look at this Friday. @dechristopher do you know why the test is failing? Have not dug into the logs. But a quick scan is complaining about bbox. I know you are dynamically changing that in the SQL.

Unsure why the tests are failing on the runner. They run cleanly on my machine. I'll investigate and see if I can come up with a fix.

ear7h commented 2 years ago

This is a really cool feature! Thanks for getting it started!

WRT to input validation I think the best approach is to use the "placeholder" mechanism offered by pgx (it's described briefly in pgx.Query). I think the biggest difficulty with this is getting the numbering right, and keeping track of it. I've come up with a function that could do that, but I'm not sure how it would fit with that part of the code base:

https://play.golang.org/p/Dk6SWSf2f_N

This is also missing a few things like case insensitivity and special handling of the builtin tokens. The later could be implemented by writing to the queryArgs map after writing the user provided args (so any colliding user provided args get overwritten).

ARolek commented 2 years ago

Another consideration with this PR is going to be the tile cache. If the query params can produce different features based on the query strings, we're going to need to revisit the tile cache. Alternatively, we can say if filter params are used, then no caching is allowed right now.

kszafran commented 2 years ago

Thanks for this PR! We need something similar and I thought I'll have to contribute, but I'm glad someone has already started the work! Our use case is a bit different, since we use tegola like a library, and set up our own routing etc. And the parameters we need to pass will not be coming from query parameters but from a JWT token.

Few things I'd suggest:

Consider an alternative/additional method of passing these parameters. In my case I could cram them into a Request.URL with some middleware, but perhaps there's a better way (e.g. not adding "params" to the context if they are already there or supporting some additional value, like "external_params").
The "SQL injection" protection is pretty weak and very limiting on what values can be passed. The only correct way to defend again SQLi is to use parameters, like $1, $2 etc. and pass user's input as query arguments and not inside the SQL string. To make this easier and more flexible, we could use ? as placeholders and only replace them with Postgres-style numbered placeholders$1 before executing the query. That how e.g. sqrl works. This way the order of parameters in the configurable layer SQL will not matter.
Tokens like !BBOX! are not individual values, but whole expressions. I think we should treat parameters the same way. For example:
```
[[maps.params]]
name = "param"
token = "!PARAM!"
query = "some_field @> ARRAY[?]"
default = "7"
```
This way more complicated expressions could be shared by multiple layers. And we could make query be ? by default, so that if you don't specify query it will work like the currently proposed version.

@ARolek Let me know if I can help somehow. I'd like to see this PR improved and merged.

dechristopher commented 2 years ago

@kszafran this is all great feedback and I'm happy to see others interested in getting this feature wrapped up and merged. Admittedly, I've not had enough time to sit down and address all of the review with how busy my work schedule has been. If you'd like to see this through with your improvements, then by all means.

Also, I wholeheartedly agree with your third point. In my testing, I was required to modify the queries a bit to support the params and with this, I think it'd be easier to generalize and pass tests. Something that needs to be taken into account is the batch of queries that Tegola performs at startup to determine the geometry type from the table in each configured layer. With manually specified types this isn't a problem, but with the introspection queries I've seen failures if the queries aren't built in a way that they can be valid without having an included parameter passed in. This is why I've added the default field. There's definitely a better way to handle this via treating params as expressions.

ARolek commented 2 years ago

@dechristopher and @kszafran apologies for the slow response. I missed the comments you two dropped in here. I'm interested in bringing this feature to life as well, but I posted some concerns with the current implementation. It might be easier to work on the design in real-time via slack. We can keep the discussion here as well, but I don't think we should be using context.Context to pass around the filters. I think we should enhance the Tilter interface:

// Tiler is a Layers that allows one to encode features in that layer
type Tiler interface {
    Layerer

    // TileFeature will stream decoded features to the callback function fn
    // if fn returns ErrCanceled, the TileFeatures method should stop processing
    TileFeatures(ctx context.Context, layer string, t Tile, fn func(f *Feature) error) error
}

We can add another parameter to TileFeatures for filters. I'm not sure what the design for Filters should be yet, but I think this thread already has some ideas brewing. Ideally we can figure out a design that allows us to filter not only SQL providers but other future providers too (i.e. GeoJSON). We don't need to implement the filters now, but I would like to consider them.

bemyak commented 2 years ago

Hi!

I'm starting working to push this feature forward. The main issue currently is the SQL injection vulnerability. We definitely need to use prepared statements, i.e., replace all user-defined parameters with $1, $2 and so on, and pass the values as arguments.

Unfortunately, this means that we need to have type information available for parameters.

So far the idea is to have parameters configured like this:

[[maps.params]]
name = "param2"
token = "!PARAM2!"
# only very simply types are suppored: `int`, `string`, `float`, `bool`
type = "int"
# `?` will be replaced with the actual value
# if `sql` is not specified, it is considered to be `?`
sql = "AND ANSWER = ?"
# if the parameter value is missing from the HTTP query, this value will be used
# `default_value` is an optional parameter. If it is not defined, the parameter is considered as required and cliet will get 400 error
default_value = "42"
# instead of `default_value`, `default_sql` can be defined, e.g. to omit param query entirely:
#default_sql = ""
# defining both `default_value` and `default_sql` is forbidden

I will be really glad to hear any feedback or thoughts on this.

kszafran commented 2 years ago

I'm working with @bemyak on preparing the spec for this. Another approach that I would propose is this:

[[maps.params]]
name = "param2"
token = "!PARAM2!"
type = "int"
# if "sql" is not specified, it is `?` by default, so just the param value and not a whole expression
sql = "AND answer = ?"
# if the parameter is not specified by the caller, the "default" value will be used instead
# if "default" is not defined, the "!PARAM2!" token will be replaced with an empty string instead of "sql"
default = "42"
# "required" could be used in cases where we don't want to specify a "default" value, but want to make the parameter mandatory
required = "true"

The main difference compared to @bemyak proposal is how we behave when there's no default and a caller does not specify the parameter. There are a few different cases that we would like to handle:

parameter is required
parameter is optional and we use a default value if it's not specified
parameter is optional and we skip the whole condition if it's not specified

Those can be achieved with both designs. I just think that having only default plus replacing the token with an empty string would be simpler than having both default_value and default_sql`. Here's how you could achieve the three use cases:

1:

sql = "SELECT ST_AsMVTGeom(geometry, !BBOX!) ... AND answer = !ANSWER!"
...
[[maps.params]]
name = "answer"
token = "!ANSWER!"
type = "int"
required = "true"

2:

sql = "SELECT ST_AsMVTGeom(geometry, !BBOX!) ... AND answer = !ANSWER!"
...
[[maps.params]]
name = "answer"
token = "!ANSWER!"
type = "int"
default = "42"

3:

sql = "SELECT ST_AsMVTGeom(geometry, !BBOX!) ... !ANSWER!"
...
[[maps.params]]
name = "answer"
token = "!ANSWER!"
type = "int"
sql = "AND answer = ?"

ARolek commented 2 years ago

@bemyak and @kszafran apologies for the slow response and thanks for the effort on this issue.

Additional context to this discussion:

The original issue around filtering a layer dynamically
A PR that was submitted which has review comments: I recall the main issue I had with this PR was the abuse of context.Context. I think we need to figure out a more defined way to pass this information around rather than stuffing it into context. I think enhancing the TileProvider interface is the way to go.

Additional considerations:

When dynamic tile filtering is implemented, we need to think about how the tile cache is supposed to work. Currently the tile cache is either enabled or disabled. This is not configured on a per map basis. It seems to me that enabling tile filtering will require the tile cache to be off. If this is the case, I think we need to investigate if we want to support cache configs on a per map basis so we can allow for some maps to leverage the cache but others to have dynamic filtering.
SQL injections are a very real problem with this feature. I'm trying to figure out where the responsibility should lie for sanitizing the input. Tegola could be responsible for this as you're proposing. Alternatively we could use Postgres functions and pass on the responsibility to the database function. This might be more prone to error though as it's pushing the security responsibility to the implementer, but it would simplify the implementation in tegola. We might just be able to get away with a "query string whitelist" in the config that would then look for query strings and then turn them into tokens. Here's an example of calling a database function from tegola:

SELECT * FROM my_tile_func(!ZOOM!, !BBOX!, !FILTER_1!, !FILTER_2!)

I have used this pattern in tegola to push complex query logic to the database and it's quite nice.

Let's chat in real time:

This feature is one I would like to get right. I see the value, and I'm happy to support the effort. I do think it deserves some strong vetting though as the feature can be prone to abuse. Maybe we should try to chat through some of this in real time. You can find us in a couple of Slack channels:

https://slack.go-spatial.org/
The #go-spatial channel in Gopher Slack: https://gophers.slack.com

kszafran commented 2 years ago

@ARolek Thanks for the comments! @bemyak is actively working on a PR. It's almost ready and we should soon be able to create a PR to the upstream. If you want to take a look already, check out https://github.com/SharperShape/tegola/pull/6.

How about this - once we're ready, we'll create a PR to the upstream, and then we could have a call to discuss it (e.g. on Google Meet)? Slack is fine, too. Our implementation solves SQL injection by using query arguments instead of putting parameter values into the query string itself. For now, caching is disabled for parameterized maps, but I imagine that could be improved in the future.

ARolek commented 2 years ago

How about this - once we're ready, we'll create a PR to the upstream, and then we could have a call to discuss it (e.g. on Google Meet)?

Love it. Once we get there we can see who can all join the discussion. Thanks for working on this contribution.

gdey commented 1 year ago

go-spatial / tegola

Feature: adds configurable query parameters to tile endpoints #795

Additional context to this discussion:

Additional considerations:

Let's chat in real time:

867 supersedes this PR.