geonetwork / core-geonetwork

GeoNetwork is a catalog application to manage spatially referenced resources. It provides powerful metadata editing and search functions as well as an interactive web map viewer. It is currently used in numerous Spatial Data Infrastructure initiatives across the world.
http://geonetwork-opensource.org/
GNU General Public License v2.0

GeoNetwork Harvester - xsl process configuration #1873

Open josegar74 opened 7 years ago

josegar74 commented 7 years ago

The GeoNetwork harvester lets you configure the XSL process in a free-text field, so the user has to type it in manually, which is at least a bit odd:

[Screenshot: geonetwork-harvester-xsl]

The CSW harvester, by contrast, displays a list of the processes defined in xsl/conversion/import:

[Screenshot: csw-harvester-xsl]

Is there any reason for this difference? Anything that justifies that the user should enter the xslt manually in the GeoNetwork harvester?

For the CSW harvester XSL processes, see the related issue https://github.com/geonetwork/core-geonetwork/issues/1872

fxprunayre commented 7 years ago

Is there any reason for this difference? Anything that justifies that the user should enter the xslt manually in the GeoNetwork harvester?

Yes, because processes can have parameters. See http://trac.osgeo.org/geonetwork/ticket/645 and https://github.com/geonetwork/core-geonetwork/commit/b8b835379bbf85be0cafecf3942e92dc5d51516c#diff-09b6641e0886848320885e38f9a44088R6
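
To illustrate why a plain drop-down is not enough when a process takes parameters, here is a minimal sketch of splitting such a text-field value into a process name and its parameters. It assumes a query-string-like syntax (`name?key=value&key=value`); the function name `parse_xsl_filter` and the sample value are hypothetical and not taken from the GeoNetwork code base.

```python
from urllib.parse import parse_qsl


def parse_xsl_filter(value: str) -> tuple[str, dict[str, str]]:
    """Split 'process?key=value&...' into (process_name, params).

    Assumption: the text field uses a query-string-like syntax.
    """
    # Everything before the first '?' is treated as the process name.
    name, _, query = value.strip().partition("?")
    # keep_blank_values keeps parameters such as 'flag=' as empty strings.
    params = dict(parse_qsl(query, keep_blank_values=True))
    return name, params


# Hypothetical field value: a process name followed by two parameters.
name, params = parse_xsl_filter("anonymizer?protocol=X&email=gis@example.org")
print(name, params)
```

A list of process names alone could only supply the part before the `?`; the parameters would still need some form of free-text input.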

josegar74 commented 7 years ago

Sorry, but this seems quite inconsistent and prone to confusion, as each harvester takes the XSLT from a different folder (although for CSW it doesn't really work). Please also check https://github.com/geonetwork/core-geonetwork/issues/1872

In my opinion, this configuration in the GeoNetwork harvester is pretty useless, as I doubt any user is able to use the XSLT unless they check the source code to find out which processes exist and which parameters they accept, which sounds unlikely.

fxprunayre commented 7 years ago

I doubt any user is able to use the XSLT unless they check the source code to find out which processes exist and which parameters they accept, which sounds unlikely.

It looks like some documentation was written on this 5 years ago (https://taskman.eionet.europa.eu/issues/32), but it appears to be no longer available. Maybe the process section of the docs should be updated: http://geonetwork-opensource.org/manuals/trunk/eng/users/user-guide/workflow/batchupdate-xsl.html. BTW, I don't expect this to be used by general users, but rather by developers configuring advanced harvesting tasks that may require such filtering. This was initially developed for EEA to anonymize records during harvesting.

josegar74 commented 7 years ago

OK, at least it would be good to review whether this should work the same way in all harvesters and make the related changes.

For the time being, I applied a fix for #1872 so that it works with the XSLT processes from xsl/conversion/import, as defined in the UI for this harvester.

Some items to check:

1) Should the XSLT processes be taken from xsl/conversion/import (as in the CSW harvester) or from the schemas' process folder (as in the GeoNetwork harvester)?

2) If we want to use the schemas' process folder, the UI could be improved to suggest the processes that exist in the schemas.

3) Should XSLT processes be allowed in other harvesters? It may not make sense for some, like the OGC harvester, but for others I think it can be relevant.
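
The suggestion in item 2 could be sketched roughly as follows: scan a schema's process folder and offer the stylesheet names it contains. This is only an illustration, assuming each schema directory has a `process` subfolder containing `.xsl` files; the function name and the sample path are hypothetical.

```python
from pathlib import Path


def list_schema_processes(schema_dir: Path) -> list[str]:
    """Return the names (without extension) of the XSL processes of one schema.

    Assumption: processes live in '<schema_dir>/process/*.xsl'.
    """
    process_dir = schema_dir / "process"
    if not process_dir.is_dir():
        # Schema without a process folder: nothing to suggest.
        return []
    # Sort for a stable order in a suggestion list.
    return sorted(p.stem for p in process_dir.glob("*.xsl"))


# Hypothetical schema location; prints [] if the folder does not exist.
print(list_schema_processes(Path("schemas/iso19139")))
```

Such a list would only cover the process names; any parameters a process accepts would still have to be documented or entered separately.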

etj commented 7 years ago

For point 3), IMHO we should keep the option to postprocess the metadata. For instance, I found a use case where harvesting from a WMS service needs some data that should be filled in by the XSL. Please note that the WxS harvesters do allow setting an "XSL transformation to apply", but it isn't applied (see also #1982).