geopython / pycsw

pycsw is an OGC CSW server implementation written in Python. pycsw fully implements the OpenGIS Catalogue Service Implementation Specification [Catalogue Service for the Web]. Initial development started in 2010 (more formally announced in 2011). The project is certified OGC Compliant, and is an OGC Reference Implementation. pycsw allows for the publishing and discovery of geospatial metadata via numerous APIs (CSW 2/CSW 3, OpenSearch, OAI-PMH, SRU). Existing repositories of geospatial metadata can also be exposed, providing a standards-based metadata and catalogue component of spatial data infrastructures. pycsw is Open Source, released under an MIT license, and runs on all major platforms (Windows, Linux, Mac OS X). Please read the docs at https://pycsw.org/docs for more information.
https://pycsw.org
MIT License
203 stars 155 forks source link

SOS 1.0.0 harvesting fails #215

Closed tomkralidis closed 10 years ago

tomkralidis commented 10 years ago
$ curl -X POST -d @tests/suites/harvesting/post/Harvest-sos100.xml http://localhost:8000/

<!-- pycsw 1.8.0-beta1 -->
<ows:ExceptionReport xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:inspire_common="http://inspire.ec.europa.eu/schemas/common/1.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:dct="http://purl.org/dc/terms/" xmlns:ows="http://www.opengis.net/ows" xmlns:apiso="http://www.opengis.net/cat/csw/apiso/1.0" xmlns:gml="http://www.opengis.net/gml" xmlns:dif="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:srv="http://www.isotc211.org/2005/srv" xmlns:ogc="http://www.opengis.net/ogc" xmlns:fgdc="http://www.opengis.net/cat/csw/csdgm" xmlns:inspire_ds="http://inspire.ec.europa.eu/schemas/inspire_ds/1.0" xmlns:csw="http://www.opengis.net/cat/csw/2.0.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:os="http://a9.com/-/spec/opensearch/1.1/" xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope" xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9" language="en-US" version="1.2.0" xsi:schemaLocation="http://www.opengis.net/ows http://schemas.opengis.net/ows/1.0.0/owsExceptionReport.xsd"><ows:Exception exceptionCode="NoApplicableCode" locator="source"><ows:ExceptionText>Harvest failed: record parsing failed: &lt;!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"&gt;
&lt;html&gt;&lt;head&gt;
&lt;title&gt;400 Bad Request&lt;/title&gt;
&lt;/head&gt;&lt;body&gt;
&lt;h1&gt;Bad Request&lt;/h1&gt;
&lt;p&gt;Your browser sent a request that this server could not understand.&lt;br /&gt;
&lt;/p&gt;
&lt;hr&gt;
&lt;address&gt;Apache/2.2.0 (Fedora) Server at tabs2.gerg.tamu.edu Port 80&lt;/address&gt;
&lt;/body&gt;&lt;/html&gt;
</ows:ExceptionText>
tomkralidis commented 10 years ago

I was able to isolate the error enough to figure out that this is an HTTP header issue on the SOS server's part.

The HTTP Accept header, when not specified by the client, means that the client will accept anything. This is not sent by urllib2.urlopen by default, hence the error above. The SOS server seems to require Accept being explicitly passed, which appears to be incorrect as per the HTTP spec. FYI requests.get does pass "Accept: \*/\*", but this doesn't make this a downfall of the current approach.

Provider notified. In the meantime, SOS tests changed in 0d2f948616470257644927df90b28a498cc204f0.