Closed pmauduit closed 2 years ago
Sorry for lagging a bit on testing this. After deploying the PR and visiting /geonetwork/srv/fre/rss.search in a browser,
I got this:
2021-09-23 12:03:06.574 ERROR 36754 --- [nio-8580-exec-2] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is java.lang.Exception: Failed to connect to index at URL http://localhost:9200/gn-records/_search?. No response processor configured for 'text/html'. Use one of rss, application/rss+xml.] with root cause
java.lang.UnsupportedOperationException: No response processor configured for 'text/html'. Use one of rss, application/rss+xml.
As a comparison point, gn3.8 doesn't blow up when queried from a browser, and offers the RSS file for download (e.g. https://ids.craig.fr/geocat/srv/fre/rss.search).
force-setting the mimetype in curl seems to work and return some xml content that might be valid rss:
curl -H 'Accept: rss' https://georchestra.dev.craig.fr/geonetwork/srv/fre/rss.search
Alas, our target use case is for the RSS to be consumed by the aggregator module of Drupal, which seems to set no Accept header at all (or a wrong one). If I test the RSS feed in the Drupal module, I get this in the service log:
java.lang.UnsupportedOperationException: No response processor configured for 'null'. Use one of rss, application/rss+xml.
the ua set by the rss client is "Drupal/8.9.18 (+https://www.drupal.org/) GuzzleHttp/6.5.5 curl/7.64.0 PHP/7.3.29-1~deb10u1"
Maybe the Apache config mumbo jumbo should also send an Accept header instead of, or in addition to, the Content-Type header? I've tried with both headers set in /var/www/georchestra/conf/gn-cloud-searching.conf and that didn't work. Or maybe the Spring config should accept anything for the Accept/Content-Type headers and return RSS content? And defaultMimeType in /etc/georchestra/geonetwork/microservices/searching/application.yml isn't used either?
curl -H 'Accept: rss' https://georchestra.dev.craig.fr/geonetwork/srv/fre/rss.search
you need the '?f=rss' param / query string in the url, the accept header should not be necessary
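To make the behaviour described above concrete, here is a minimal Python sketch of one possible resolution order: the `f` query parameter wins, then the Accept header, otherwise an error. This is an assumption for illustration only, not the service's actual code; `resolve_format` and `SUPPORTED` are hypothetical names, and real Accept headers (comma-separated lists with q-values) are deliberately not parsed here.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# The two format names listed in the error message quoted in this thread.
SUPPORTED = {
    "rss": "application/rss+xml",
    "application/rss+xml": "application/rss+xml",
}


def resolve_format(url: str, accept: Optional[str]) -> str:
    """Hypothetical resolution order: ?f= first, then the raw Accept header."""
    query = parse_qs(urlsplit(url).query)
    requested = query.get("f", [None])[0] or accept
    if requested not in SUPPORTED:
        # Mirrors the wording of the error seen in the service log.
        raise ValueError(
            f"No response processor configured for '{requested}'. "
            "Use one of rss, application/rss+xml."
        )
    return SUPPORTED[requested]
```

Under this sketch, a client that sends no usable Accept header (as the Drupal aggregator appears to do) still gets RSS as long as the URL carries `?f=rss`.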
Ah, thanks! That wasn't obvious :)
https://FQDN/geonetwork/srv/fre/rss.search?f=rss is properly rendered by the Drupal aggregator, which so far only renders title/description/pubDate in our use case, cf https://www.craig.fr/aggregator/sources/1. So our use case seems covered, afaict.
now, for the content of the rss itself (yes, i'm trying to think of usecases for the others) .. with this service we have (for a sample MD generated by datafeeder):
<item>
<title>Mes photos RTGE</title>
<link>https://georchestra.dev.craig.fr/geonetwork/srv/metadata/69c1cd43-45b7-4b27-b0e8-02b978ef1764</link>
<description>Ceci est un resume du dataset</description>
<author>psc+testadmin@georchestra.org</author>
<guid isPermaLink="false">69c1cd43-45b7-4b27-b0e8-02b978ef1764</guid>
<pubDate>Wed, 12 May 2021 00:00:00 GMT</pubDate>
</item>
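For reference, here is a rough Python sketch of what a consumer that only cares about title/description/pubDate would pull out of such an item. It uses only the standard library; `summarize` is a hypothetical helper and the embedded item is a trimmed copy of the one above, so this only approximates what the Drupal aggregator does.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Trimmed copy of the item returned by the new service (see above).
ITEM = """<item>
<title>Mes photos RTGE</title>
<link>https://georchestra.dev.craig.fr/geonetwork/srv/metadata/69c1cd43-45b7-4b27-b0e8-02b978ef1764</link>
<description>Ceci est un resume du dataset</description>
<pubDate>Wed, 12 May 2021 00:00:00 GMT</pubDate>
</item>"""


def summarize(item_xml: str) -> dict:
    """Extract the three fields an aggregator typically renders."""
    item = ET.fromstring(item_xml)
    return {
        "title": item.findtext("title"),
        "description": item.findtext("description"),
        # RSS dates follow RFC 822, which the email module can parse.
        "published": parsedate_to_datetime(item.findtext("pubDate")),
    }


summary = summarize(ITEM)
```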
previously with gn3.8 here's all the content that was returned for a fully populated MD:
<item>
<title>
Orthophotographie infrarouge - Département de l'Isère - PVA 2018
</title>
<link>
https://ids.craig.fr/geocat/srv/metadata/27c6a914-954c-4b00-a6d7-8d03f10d399c
</link>
<link href="http://www.craig.fr" type="text/html" rel="alternate" title="Site internet du CRAIG"/>
<link href="http://wms.craig.fr/ortho?" type="application/vnd.ogc.wms_xml" rel="alternate" title="Orthophotographie IRC 25cm 2018"/>
<category>Geographic metadata catalog</category>
<description>
<p><a href="https://ids.craig.fr/geocat/srv/metadata/27c6a914-954c-4b00-a6d7-8d03f10d399c"><img src="https://ids.craig.fr/geocat/srv/api/records/27c6a914-954c-4b00-a6d7-8d03f10d399c/attachments/vigete_ortho_irc38.jpg" align="left" alt="" border="0" width="100"/></a>Le produit "Orthophotographie infrarouge - Département de l'Isère" est une orthophotographie numérique en infrarouge, rectifiées dans la projection associée au système géodésique RGF93. La résolution au sol est de 0,25 par pixel, la précision planimétrique est de 0,50m et les dévers sont < à 34%. La longueur d'onde IR est comprise entre 650 et 960 nm. L'image IRC est composée de canal IR (650 - 960 nm) + Rouge (580 - 700 nm) et Vert (480 - 640nm). es prises de vue aérienne ont été réalisées entre le 7 juillet 2018 et le 28 août 2018. La caméra utilisée est l’une des caméras IGN dites « V2 huit têtes ». La taille des images est d’environ 14000X10400 pixels. La focale utilisée pour les prises de vues départementales est la focale 125 mm.<br/></p><br clear="all"/>
</description>
<pubDate>07 Dec 2020 09:02:29 EST</pubDate>
<guid isPermaLink="true">
https://ids.craig.fr/geocat/srv/metadata/27c6a914-954c-4b00-a6d7-8d03f10d399c
</guid>
<media:content url="https://ids.craig.fr/geocat/srv/api/records/27c6a914-954c-4b00-a6d7-8d03f10d399c/attachments/vigete_ortho_irc38.jpg"/>
<!--
Bounding box in georss GML format (http://georss.org)
-->
<georss:where>
<gml:Envelope>
<gml:lowerCorner>44.6958696017857 4.74204035901312</gml:lowerCorner>
<gml:upperCorner>45.8833927311594 6.35930313759244</gml:upperCorner>
</gml:Envelope>
</georss:where>
</item>
Of all those elements, I think the additional links from the MD and the MD envelope could be valuable information, if they can be easily fetched from Elasticsearch...
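As an illustration of why the envelope is useful to clients, here is a small Python sketch that turns the georss:where block from the gn3.8 output into a (west, south, east, north) bounding box. It assumes the usual GeoRSS GML namespaces and latitude-first coordinate order; `envelope_to_bbox` is a hypothetical helper, and the namespace declarations are added to the fragment so that it parses on its own.

```python
import xml.etree.ElementTree as ET

# Standard GeoRSS and GML namespace URIs (assumed to match the gn3.8 feed).
NS = {
    "georss": "http://www.georss.org/georss",
    "gml": "http://www.opengis.net/gml",
}

WHERE = """<georss:where xmlns:georss="http://www.georss.org/georss"
                        xmlns:gml="http://www.opengis.net/gml">
<gml:Envelope>
<gml:lowerCorner>44.6958696017857 4.74204035901312</gml:lowerCorner>
<gml:upperCorner>45.8833927311594 6.35930313759244</gml:upperCorner>
</gml:Envelope>
</georss:where>"""


def envelope_to_bbox(where_xml: str) -> tuple:
    """Return (west, south, east, north) from a georss:where envelope."""
    envelope = ET.fromstring(where_xml).find("gml:Envelope", NS)
    # GeoRSS GML corners are written as "latitude longitude".
    south, west = map(float, envelope.findtext("gml:lowerCorner", namespaces=NS).split())
    north, east = map(float, envelope.findtext("gml:upperCorner", namespaces=NS).split())
    return (west, south, east, north)


bbox = envelope_to_bbox(WHERE)
```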
Other than that, the PR integrating this in the playbook looks fine (minus the comments I already made about the template and the task name).
rebase needed so that it can be merged ?
I'd expect git to figure out that the other branch has been merged, but I can also rebase, indeed.
done
Merge ?
iirc i had comments but will fix them in a followup commit
Note: the other microservice (ogc-api-records) also provides a RSS output (if configured so).
ugh. so much for simplification :)
Fun fact discovered while integrating this behind nginx: even if the right Accept/Content-Type headers are set to application/rss+xml by nginx in the query sent to the service, it returns:
Content-Type: application/json;charset=UTF-8
while the returned content is actually XML. I'm pretty sure some clients will choke on that...
edit: bah, disregard, the Header Set stanzas in the Apache config are there to set headers in the reply
Regarding the XML content being returned with a JSON Content-Type: cc @fgravin for upstream report / change. Thanks!
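For anyone wanting to check this kind of mismatch from the client side, here is a rough Python sketch that compares the declared Content-Type with what the body looks like. `body_matches_content_type` is a hypothetical helper, and the "starts with <" test is a crude heuristic, not real content sniffing.

```python
def body_matches_content_type(content_type: str, body: str) -> bool:
    """Return True when the header and the body agree on being XML or not."""
    # Drop parameters such as ";charset=UTF-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Covers application/xml, text/xml and application/rss+xml.
    declares_xml = media_type.endswith("xml")
    # Crude heuristic: XML documents start with "<" after optional whitespace.
    looks_like_xml = body.lstrip().startswith("<")
    return declares_xml == looks_like_xml
```

With the header quoted above (application/json;charset=UTF-8) and an RSS body, this check returns False, which is the situation some clients may choke on.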
This service provides a custom endpoint to get a georss representation of the GeoNetwork index in Elasticsearch, as this is not provided anymore with the v4 of GeoNetwork.
Note: this PR has been created on top of the bullseye branch and has been tested mainly under this version of debian.