geopython / pycsw

pycsw is an OGC CSW server implementation written in Python. pycsw fully implements the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web]. Initial development started in 2010 (more formally announced in 2011). The project is certified OGC Compliant, and is an OGC Reference Implementation. pycsw allows for the publishing and discovery of geospatial metadata via numerous APIs (CSW 2/CSW 3, OpenSearch, OAI-PMH, SRU). Existing repositories of geospatial metadata can also be exposed, providing a standards-based metadata and catalogue component of spatial data infrastructures. pycsw is Open Source, released under an MIT license, and runs on all major platforms (Windows, Linux, Mac OS X). Please read the docs at https://pycsw.org/docs for more information.
https://pycsw.org
MIT License
206 stars 155 forks source link

pycsw does not decode spaces in the query part of GET requests correctly #465

Closed ricardogsilva closed 7 years ago

ricardogsilva commented 8 years ago

Description

Whenever a GET request includes a KVP parameter whose value includes spaces that have been urlencoded to '+', pycsw is not able to correctly unquote them.

According to this stackoverflow post (which in turn references other reliable sources), the recommended way for a client to quote an HTTP GET request's query part is to turn spaces into the '+' character. (Note that in the path part of the URL, the correct quoting would be to turn spaces into '%20'). Apparently, both the '+' and the '%20' percent encoding are supported in the query part, but '+' is recommended. Either way, a robust server should be able to unquote both.

Encoding spaces as '+' is indeed how clients such as python's requests module or httpie work.

However, pycsw does not unquote these '+' characters back to the space character, which means that when using such clients for querying a pycsw instance, requests are not parsed correctly.

The fix for this error is pretty simple, and consists in replacing the usage of the six.moves.urllib.parse.unquote function with six.moves.urllib.parse.unquote_plus. The latter is aware of this problematic space unquoting issue and does the right thing if the request is encoded with '+' or with '%20'.

Environment

# creating a docker container, to prevent messing up our own installation
docker run --rm -ti -p 8000:8000 ubuntu:16.04 /bin/bash   
apt-get update
apt-get install -y git libgeos-dev libxml2-dev libxslt1-dev libz-dev python-dev python-pip
git clone https://github.com/geopython/pycsw.git
cd pycsw
pip install -r requirements.txt -r requirements-standalone.txt -r requirements-dev.txt
pip install -e .
paver test
export PYCSW_CONFIG=${PWD}/tests/suites/cite/default.cfg
python pycsw/wsgi.py

# pycsw should be running now
import requests
from lxml import etree

url = "http://localhost:8000"
constraint = "AnyText like '%lorem%'"
payload = {
    "service": "CSW",
    "version": "2.0.2",
    "request": "GetRecords",
    "resulttype": "results",
    "elementsetname": "brief",
    "typenames": "csw:Record",
    "constraintlanguage": "CQL_TEXT",
    "constraint": constraint
}
response = requests.get(url, params=payload)
response_element = etree.fromstring(response.text.encode("utf-8"))
print(etree.tostring(response_element, pretty_print=True))

Pycsw's response is:

<ows:ExceptionReport xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:dc="http:/
/purl.org/dc/elements/1.1/" xmlns:dct="http://purl.org/dc/terms/" xmlns:gmd="http://www.isotc211.org/
2005/gmd" xmlns:gml="http://www.opengis.net/gml" xmlns:ows="http://www.opengis.net/ows" xmlns:xs="htt
p://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" language="en-US"
 version="1.2.0" xsi:schemaLocation="http://www.opengis.net/ows http://schemas.opengis.net/ows/1.0.0/
owsExceptionReport.xsd">
  <ows:Exception exceptionCode="InvalidParameterValue" locator="constraint">
<ows:ExceptionText>Invalid query: (sqlite3.OperationalError) no such column: like [SQL: u"SELECT 
count(*) AS count_1 \nFROM (SELECT records.identifier AS records_identifier, records.typename AS rec$rds_typename, records.schema AS records_schema, records.mdsource AS records_mdsource, records.insert$date AS records_insert_date, records.xml AS records_xml, records.anytext AS records_anytext, records$language AS records_language, records.type AS records_type, records.title AS records_title, records.$itle_alternate AS records_title_alternate, records.abstract AS records_abstract, records.keywords AS records_keywords, records.keywordstype AS records_keywordstype, records.parentidentifier AS records_$arentidentifier, records.relation AS records_relation, records.time_begin AS records_time_begin, rec$rds.time_end AS records_time_end, records.topicategory AS records_topicategory, records.resourcelang$age AS records_resourcelanguage, records.creator AS records_creator, records.publisher AS records_pu$lisher, records.contributor AS records_contributor, records.organization AS records_organization, re$ords.securityconstraints AS records_securityconstraints, records.accessconstraints AS records_access$onstraints, records.otherconstraints AS records_otherconstraints, records.date AS records_date, reco$ds.date_revision AS records_date_revision, records.date_creation AS records_date_creation, records.d$te_publication AS records_date_publication, records.date_modified AS records_date_modified, records.$ormat AS records_format, records.source AS records_source, records.crs AS records_crs, records.geode$code AS records_geodescode, records.denominator AS records_denominator, records.distancevalue AS rec$rds_distancevalue, records.distanceuom AS records_distanceuom, records.wkt_geometry AS records_wkt_g$ometry, records.servicetype AS records_servicetype, records.servicetypeversion AS records_servicetypeversion, records.operation AS records_operation, records.couplingtype AS records_couplingtype, record
s.operateson AS records_operateson, records.operatesonidentifier AS records_operatesonidentifier, rec
ords.operatesoname AS records_operatesoname, records.degree AS records_degree, records.classification
 AS records_classification, records.conditionapplyingtoaccessanduse AS records_conditionapplyingtoacc
essanduse, records.lineage AS records_lineage, records.responsiblepartyrole AS records_responsiblepar
tyrole, records.specificationtitle AS records_specificationtitle, records.specificationdate AS record
s_specificationdate, records.specificationdatetype AS records_specificationdatetype, records.links AS
 records_links \nFROM records \nWHERE AnyText+like+'%lorem%') AS anon_1"]</ows:ExceptionText>
  </ows:Exception>
</ows:ExceptionReport>

The error complains about an invalid query. The contents of the error (which probably should not be displayed back to the client, but that can be handled in another issue) show that the database is being fed a query that includes `WHERE AnyText+like+'%lorem%'. This means that the url unquoting is not being done correctly.

Additional Information

I've fixed this issue locally and am willing to send a PR with the fix, in a short while.

kalxas commented 8 years ago

Thank you @ricardogsilva for the report. The solution seems pretty straightforward.

tomkralidis commented 8 years ago

Thanks for the report / investigation @ricardogsilva. Looking forward to PR.

tomkralidis commented 7 years ago

Fixed in master and backported to 2.0 branch. Thanks @ricardogsilva!

oblakeobjet commented 7 years ago

Hi all,

Even after the last patch we are still experiencing problems with spaces in CQL queries.

Here are the symptoms.

Version 1.10.3:

curl "http://apps.ecmwf.int/csw/?service=CSW&version=2.0.2&request=GetRecords&typenames=csw:Record&elementSetName=full&resultType=results&constraintLanguage=CQL_TEXT&startposition=1&constraint=dc%3Asubject%20like%20%27%25Product%20family%25%27"

-> I get the expected list of products.

git clone master && git pull version

curl "http://85.204.96.249:8000/?service=CSW&version=2.0.2&request=GetRecords&typenames=csw:Record&elementSetName=full&resultType=results&constraintLanguage=CQL_TEXT&startposition=1&constraint=dc%3Asubject%20like%20%27%25Product%20family%25%27"

-> I get the "Invalid Filter syntax" error.

If I replace the space character (%20) in the single-quoted string of the LIKE statement with a wildcard character (%27) the latest version will return a valid response.

I would appreciate any suggestion.

Miha

tomkralidis commented 7 years ago

@oblakeobjet thanks for the info. I've filed a new issue at #483 given it is not related to @ricardogsilva's fixes here.

tomkralidis commented 7 years ago

cc @kalxas

@ricardogsilva note that during our 2.0.3 release today we encountered problems testing against OGC CITE CSW3 (CSW 2 worked) using the online testing engine.

When I tested against CITE in a local environment (see https://bpaste.net/show/4dcff963c88e), all CSW 3 tests passed. Which allowed us to move ahead with 2.0.3 but makes me wonder whether the online test is sending spaces differently than when testing on the command line. Needs verification.