Materials-Consortia / optimade-python-tools

Tools for implementing and consuming OPTIMADE APIs in Python
https://www.optimade.org/optimade-python-tools/
MIT License
71 stars 44 forks source link

Add server check for typos in query parameters #1120

Closed JPBergsma closed 2 years ago

JPBergsma commented 2 years ago

I have already had several times that I made a type in the URL, and it took me quite some time to figure out what I did wrong. See for example issue #1076 Some other wrong URLs that I have "tried": https://optimade.herokuapp.com/v1/structures/filter=nelements=2 https://optimade.herokuapp.com/v1/structures?filter=nelements=2&response_field=cartesian_site_positions Neither of these URLs will give you an error, but the results are definitively not what you want. Therefore, I think it would be good to add stricter checking of the input URLs.

CasperWA commented 2 years ago

How exactly do you intend to achieve this checking? I think the main point of the API is to facilitate machine-to-machine interaction, where typos would be a thing of a badly written client (or similar). Furthermore, I think we have to draw the line of "hand-holding" users at some point?

However, if it's simply a check for the tests within our code, I guess that makes sense - but otherwise, I think we should respect the specification, which states to ignore query parameters that are not OPTIMADE-specific. If you really want, we could return a nicer 404 for the misspelled paths, but it should already be included in the error response's metadata.

In other words, I don't think this is an issue of the package, but rather of the users, and not something we should spend energy on - but I'm open for suggestions on how you intend to improve error responses.

ml-evs commented 2 years ago

How exactly do you intend to achieve this checking? I think the main point of the API is to facilitate machine-to-machine interaction, where typos would be a thing of a badly written client (or similar). Furthermore, I think we have to draw the line of "hand-holding" users at some point?

I don't agree that adding a warning (or an error) on invalid query parameters is hand-holding. We should be able to do this easily with a validator on the query params FastAPI model.

However, if it's simply a check for the tests within our code, I guess that makes sense - but otherwise, I think we should respect the specification, which states to ignore query parameters that are not OPTIMADE-specific. If you really want, we could return a nicer 404 for the misspelled paths, but it should already be included in the error response's metadata.

Not sure this is true? This actually seems missing from the spec. The only mention I can find is specifically for the single entry listing endpoint:

[5.4.1 Single Entry URL Query Parameters] The client MAY provide a set of additional URL query parameters for this endpoint type. URL query parameters not recognized MUST be ignored.

Elsewhere it says:

Additional OPTIONAL URL query parameters not described above are not considered to be part of this standard, and are instead considered to be "custom URL query parameters". These custom URL query parameters MUST be of the format "". These names adhere to the requirements on implementation-specific query parameters of JSON API v1.0 since the database-provider-specific prefixes contain at least two underscores (a LOW LINE character '_').

I think we can emit a warning and still be spec compliant. The spec could even be strengthened to error in this scenario, IMO (so the handling of custom fields and params is unified).

In other words, I don't think this is an issue of the package, but rather of the users, and not something we should spend energy on - but I'm open for suggestions on how you intend to improve error responses.

I think I simply disagree! Maybe I am biased by having to run the API tutorials, many people write code that calls the APIs from scripts etc.

JPBergsma commented 2 years ago

According to the JSON API specification using the "filer" query parameter, as mentioned in #1076, MUST result in a 400 BAD REQUEST message, as any custom query parameter should have at least one character that is not a lower case alphabetical character (a-z).

I'll try to create a quick check over the weekend. I think I can extract the query parameter keys from the request and then check whether they are in a list of supported query parameters.

Do you think the following options are good behaviour for unknown parameters?

If the parameter does not have a database_specific prefix or the prefix of the current database, throw a 400 Bad Request error. If the prefix is unknown a warning will be given. And If the parameter has a known database prefix, it will be ignored.

ml-evs commented 2 years ago

If the parameter does not have a database_specific prefix or the prefix of the current database, throw a 400 Bad Request error. If the prefix is unknown a warning will be given. And If the parameter has a known database prefix, it will be ignored.

I wouldn't worry about the unknown prefix part (if it has any prefix just ignore it), but otherwise this sounds good.

I had a quick play over lunch to do this via the query param classes themselves, but the way FastAPI does dependency injection, this was a bit tricky. Instead you can go via the raw request and compare with the fields of the query param class (might be a bit tricky). Surprised there isn't a supported way of doing this easily really.

CasperWA commented 2 years ago

I think I simply disagree! Maybe I am biased by having to run the API tutorials, many people write code that calls the APIs from scripts etc.

Fair enough :)

I just think this is more of a hint that a proper "nice" client is needed. Either a CLI or a simple Python class client.

We should most likely not start erroring if a query parameter given is badly formed, since it may be that another sub-server is attached to the current FastAPI server with some middleware that uses this supposedly badly formed query parameter or similar? Hmm, however, a warning could be all right, although that may end up being a noisy, incorrect warning in the same case that would then have to be ignored. Versus other warnings, which might shouldn't be ignored - confusing users.

I'd just simply ignore it and try to focus more on making nicer clients that provides users with better tools to query OPTIMADE APIs with less risk of typing errors in the URLs. But again, if the fix is simple - and non-disturbing for advanced setups - I guess it's fine. This is, after all, just an example server, not something that should be used 1:1 (even though it is and it'd be nice if it could stay like that for all sorts of weird use cases) :)

JPBergsma commented 2 years ago

I just think this is more of a hint that a proper "nice" client is needed. Either a CLI or a simple Python class client.

Good point, if there is a good library for creating OPTIMADE queries, far fewer people would need to write their own client software to do this.

Do you know if there are databases that use database specific query parameters that are not declared in the SingleEntryQueryParams or the EntryListingQueryParams classes? And could it be a problem to define them there ?

It would be elegant to have all the query params declared in one place. (Now we are still handling the api_hint elsewhere)

ml-evs commented 2 years ago

I just think this is more of a hint that a proper "nice" client is needed. Either a CLI or a simple Python class client.

Not going to disagree there :)

We should most likely not start erroring if a query parameter given is badly formed, since it may be that another sub-server is attached to the current FastAPI server with some middleware that uses this supposedly badly formed query parameter or similar? Hmm, however, a warning could be all right, although that may end up being a noisy, incorrect warning in the same case that would then have to be ignored. Versus other warnings, which might shouldn't be ignored - confusing users.

This is a good point (especially re:NOMAD). In my review of #1122 I suggested that this feature is optional and off by default.

I'd just simply ignore it and try to focus more on making nicer clients that provides users with better tools to query OPTIMADE APIs with less risk of typing errors in the URLs.

Sure, I'm hoping to find time to add one to this package before the next workshop, but that time is quickly dissipating... I am involved with quite a few use cases where people are directly querying OPTIMADE URLs themselves (as code developers) where using even a Python API might not be possible. Examples:

I'm thinking the best idea would be a very minimal client library with only requests or httpx as a dependency, separate to this package (or namespaced), and then this package can provide additional routines for deserializing/validating the JSON responses.

CasperWA commented 2 years ago

I'm thinking the best idea would be a very minimal client library with only requests or httpx as a dependency, separate to this package (or namespaced), and then this package can provide additional routines for deserializing/validating the JSON responses.

This is what I've been thinking as well.

I've already managed to create a couple of different clients (and other related software in the same definition-sphere). But a simple requests/httpx-based Python class client would be great - it should in theory utilize the pydantic models, but other than that it shouldn't have any other dependencies.

Another alternative is to use the Typer framework for a CLI or similar, which should be an "easy" way to utilize the pydantic models for a CLI?

Personally I'd prefer the simple Python class client for now, though.

ml-evs commented 2 years ago

I'll see what I can come up with in May before the workshop. (Are you coming this year?)