Closed by ricardogsilva 7 years ago
Thank you @ricardogsilva for the report. The solution seems pretty straightforward.
Thanks for the report / investigation @ricardogsilva. Looking forward to PR.
Fixed in master and backported to 2.0 branch. Thanks @ricardogsilva!
Hi all,
Even after the last patch we are still experiencing problems with spaces in CQL queries.
Here are the symptoms.
Version 1.10.3:
curl "http://apps.ecmwf.int/csw/?service=CSW&version=2.0.2&request=GetRecords&typenames=csw:Record&elementSetName=full&resultType=results&constraintLanguage=CQL_TEXT&startposition=1&constraint=dc%3Asubject%20like%20%27%25Product%20family%25%27"
-> I get the expected list of products.
Version from git master (git clone, then git pull):
curl "http://85.204.96.249:8000/?service=CSW&version=2.0.2&request=GetRecords&typenames=csw:Record&elementSetName=full&resultType=results&constraintLanguage=CQL_TEXT&startposition=1&constraint=dc%3Asubject%20like%20%27%25Product%20family%25%27"
-> I get the "Invalid Filter syntax" error.
If I replace the space character (%20) inside the single-quoted string of the LIKE statement with a wildcard character (%25), the latest version returns a valid response.
I would appreciate any suggestion.
Miha
@oblakeobjet thanks for the info. I've filed a new issue at #483 given it is not related to @ricardogsilva's fixes here.
cc @kalxas
@ricardogsilva note that during our 2.0.3 release today we encountered problems testing against OGC CITE CSW3 (CSW 2 worked) using the online testing engine.
When I tested against CITE in a local environment (see https://bpaste.net/show/4dcff963c88e), all CSW 3 tests passed. That allowed us to move ahead with 2.0.3, but it makes me wonder whether the online test is sending spaces differently than when testing on the command line. Needs verification.
Description
Whenever a GET request includes a KVP parameter whose value includes spaces that have been urlencoded to '+', pycsw is not able to correctly unquote them.
According to this stackoverflow post (which in turn references other reliable sources), the recommended way for a client to quote an HTTP GET request's query part is to turn spaces into the '+' character. (Note that in the path part of the URL, the correct quoting would be to turn spaces into '%20'). Apparently, both the '+' and the '%20' percent encoding are supported in the query part, but '+' is recommended. Either way, a robust server should be able to unquote both.
Encoding spaces as '+' is indeed how clients such as Python's requests module or httpie work.
However, pycsw does not unquote these '+' characters back to the space character, which means that when using such clients for querying a pycsw instance, requests are not parsed correctly.
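To make the two encodings concrete, here is a minimal sketch using only Python 3's standard library `urllib.parse`. It is not taken from pycsw or from any client's source; the filter string is simply the one used in the reproduction steps of this report, and the parameter names are ordinary CSW KVP names chosen for illustration.

```python
from urllib.parse import quote, quote_plus, urlencode

# Sample CQL filter (assumption: same text as in the reproduction steps).
constraint = "AnyText like '%lorem%'"

# quote() percent-encodes spaces as %20, the form used in URL paths.
path_style = quote(constraint)

# quote_plus() turns spaces into '+', the form-encoding used in query strings.
query_style = quote_plus(constraint)

# urlencode() builds a whole query string and applies quote_plus() by default,
# so any client built on it will send '+' for spaces.
query = urlencode({"constraintLanguage": "CQL_TEXT", "constraint": constraint})

print(path_style)   # AnyText%20like%20%27%25lorem%25%27
print(query_style)  # AnyText+like+%27%25lorem%25%27
print(query)        # constraintLanguage=CQL_TEXT&constraint=AnyText+like+%27%25lorem%25%27
```

Both forms describe the same filter, so a server has to accept either one in the query part.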
The fix for this error is pretty simple: replace the usage of the `six.moves.urllib.parse.unquote` function with `six.moves.urllib.parse.unquote_plus`. The latter handles the space unquoting correctly and does the right thing whether the request is encoded with '+' or with '%20'.
Environment
Steps to Reproduce
`pycsw/tests/suites/cite` as sample data and configuration:
`AnyText like '%lorem%'`. This filter contains spaces, so it will be incorrectly parsed by pycsw:
Pycsw's response is:
The error complains about an invalid query. The contents of the error (which probably should not be displayed back to the client, but that can be handled in another issue) show that the database is being fed a query that includes `WHERE AnyText+like+'%lorem%'`. This means that the URL unquoting is not being done correctly.
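The decoding difference behind that `WHERE` clause can be sketched with Python 3's `urllib.parse`, which is what `six.moves.urllib.parse` resolves to on Python 3. The helper name `decode_kvp_value` is hypothetical and exists only for this illustration; it is not a pycsw function.

```python
from urllib.parse import unquote, unquote_plus

# Raw constraint value as a '+'-encoding client would send it.
raw_plus = "AnyText+like+%27%25lorem%25%27"
# The same value as a '%20'-encoding client would send it.
raw_pct = "AnyText%20like%20%27%25lorem%25%27"


def decode_kvp_value(raw):
    """Hypothetical helper: decode one query-string value, accepting both
    '+' and '%20' as encodings of the space character."""
    return unquote_plus(raw)


# unquote() leaves '+' untouched, which yields the broken filter from the error.
print(unquote(raw_plus))           # AnyText+like+'%lorem%'

# unquote_plus() restores the spaces for either encoding.
print(decode_kvp_value(raw_plus))  # AnyText like '%lorem%'
print(decode_kvp_value(raw_pct))   # AnyText like '%lorem%'

# A literal plus sign survives only if the client sends it as %2B.
print(decode_kvp_value("1%2B1"))   # 1+1
```

One consequence of this choice is that a bare '+' in a query value is always read as a space, so clients that need a literal plus sign must send it as `%2B`.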
Additional Information
I've fixed this issue locally and am willing to send a PR with the fix, in a short while.