ckan / ckanext-spatial

Geospatial extension for CKAN
http://docs.ckan.org/projects/ckanext-spatial

Cannot harvest ISO 19115 metadata hosted in pycsw #219

Open gpcimino opened 5 years ago

gpcimino commented 5 years ago

Hi all,

I just installed CKAN 2.8.2 and the latest versions of ckanext-harvest and ckanext-spatial from the master branch.

My goal is to have CKAN harvest XML metadata ISO 19115 hosted by pycsw.

I used the harvester run_test command to test the first harvest (see the output below). The harvester was not able to retrieve any metadata records. In fact, it logs many "Empty record for GUID xxx" messages. My guess is that the metadata exposed via pycsw starts with

<gmi:MI_Metadata>

while it looks like the CKAN CSW harvester looks for

<gmd:MD_Metadata>

Is that correct? Any suggestions?
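
One way to check this guess is to save the XML returned by a GetRecordById request and look at which ISO root element it contains. The following is only a minimal sketch using the Python standard library: the function name find_iso_roots and the sample response are hypothetical and hand-written for illustration, not actual pycsw output or ckanext-spatial code.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the ISO 19115 (gmd) and ISO 19115-2 (gmi) XML encodings.
GMD_NS = "http://www.isotc211.org/2005/gmd"
GMI_NS = "http://www.isotc211.org/2005/gmi"

# Map fully qualified tags to the prefixed names used in this issue.
ISO_ROOTS = {
    "{%s}MD_Metadata" % GMD_NS: "gmd:MD_Metadata",
    "{%s}MI_Metadata" % GMI_NS: "gmi:MI_Metadata",
}


def find_iso_roots(xml_text):
    """Return the prefixed names of ISO record roots found in the response.

    Hypothetical helper for illustration; it is not part of ckanext-spatial.
    """
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, including the top-level element itself.
    return [ISO_ROOTS[el.tag] for el in root.iter() if el.tag in ISO_ROOTS]


# Hand-written sample (an assumption, not real server output).
SAMPLE_RESPONSE = """<csw:GetRecordByIdResponse
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gmi="http://www.isotc211.org/2005/gmi">
  <gmi:MI_Metadata>
    <gmd:fileIdentifier/>
  </gmi:MI_Metadata>
</csw:GetRecordByIdResponse>"""

if __name__ == "__main__":
    print(find_iso_roots(SAMPLE_RESPONSE))
```

If the list contains only gmi:MI_Metadata for a record, that would be consistent with the guess above, although it does not by itself show where in the harvester the record is dropped.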

Thanks

(default) [root@myserver ckan]# /usr/lib/ckan/default/bin/paster --plugin=ckanext-harvest harvester run_test   -c /etc/ckan/default/development.ini 731de3d5-a98b-411e-875d-6408af1ff422
2019-04-10 17:09:11,895 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2019-04-10 17:09:11,905 DEBUG [ckanext.harvest.model] Harvest tables already exist
2019-04-10 17:09:11,937 DEBUG [ckanext.spatial.plugin] Setting up the spatial model
2019-04-10 17:09:11,946 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2019-04-10 17:09:11,953 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2019-04-10 17:09:12,252 DEBUG [ckanext.harvest.model] Harvest tables already exist
2019-04-10 17:09:12,284 DEBUG [ckanext.spatial.plugin] Setting up the spatial model
2019-04-10 17:09:12,287 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist

/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py:624: SAWarning: Can't resolve label reference 'error_count desc'; converting to text() (this warning may be suppressed after 10 occurrences)
  util.ellipses_string(element.element))
2019-04-10 17:09:12,558 INFO  [ckanext.harvest.logic.action.create] Harvest job create: {'source_id': u'731de3d5-a98b-411e-875d-6408af1ff422'}
2019-04-10 17:09:12,573 INFO  [ckanext.harvest.logic.action.create] Harvest job saved 327f4595-9c54-4797-832a-fc18eb55f43c
2019-04-10 17:09:12,579 INFO  [ckanext.harvest.logic.action.update] Send job to gather queue: {'id': u'327f4595-9c54-4797-832a-fc18eb55f43c'}
2019-04-10 17:09:12,623 INFO  [ckanext.harvest.logic.action.update] Sent job 327f4595-9c54-4797-832a-fc18eb55f43c to the gather queue
2019-04-10 17:09:12,641 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] CswHarvester gather_stage for job: <HarvestJob id=327f4595-9c54-4797-832a-fc18eb55f43c created=2019-04-10 15:09:12.571549 gather_started=2019-04-10 15:09:12.641448 gather_finished=None finished=None source_id=731de3d5-a98b-411e-875d-6408af1ff422 status=Running>
2019-04-10 17:09:12,680 DEBUG [ckanext.spatial.harvesters.csw.CSW.gather] Starting gathering for http://myserver:8000/pycsw
2019-04-10 17:09:12,680 INFO  [ckanext.spatial.lib.csw_client] Making CSW request: getrecords2 {'typenames': 'csw:Record', 'maxrecords': 10, 'sortby': <owslib.fes.SortBy object at 0x7fd7cf763250>, 'outputschema': 'http://www.isotc211.org/2005/gmd', 'cql': None, 'startposition': 0, 'esn': 'brief', 'constraints': []}
2019-04-10 17:09:12,749 INFO  [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier etopo180 from the CSW
2019-04-10 17:09:12,750 INFO  [ckanext.spatial.harvesters.csw.CSW.gather] Got identifier etopo360 from the CSW

/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/session.py:2181: SAWarning: Usage of the 'related attribute set' operation is not currently supported within the execution stage of the flush process. Results may not be consistent.  Consider using alternative event listeners or connection-level operations instead.
  % method)
/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/session.py:2181: SAWarning: Usage of the 'collection append' operation is not currently supported within the execution stage of the flush process. Results may not be consistent.  Consider using alternative event listeners or connection-level operations instead.
  % method)
/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/session.py:2276: SAWarning: Attribute history events accumulated on 1 previously clean instances within inner-flush event handlers have been reset, and will not result in database updates. Consider using set_committed_value() within inner-flush event handlers to avoid this warning.
  % len_)

2019-04-10 17:09:12,795 DEBUG [ckanext.spatial.harvesters.csw.CSW.fetch] CswHarvester fetch_stage for object: 6e837ddb-edae-42d9-801a-9bb2aa0595ff
2019-04-10 17:09:12,838 INFO  [ckanext.spatial.lib.csw_client] Making CSW request: getrecordbyid [u'etopo180'] {'esn': 'full', 'outputschema': 'http://www.isotc211.org/2005/gmd'}
2019-04-10 17:09:12,875 DEBUG [ckanext.harvest.model] Empty record for GUID etopo180

2019-04-10 17:09:13,423 DEBUG [ckanext.spatial.harvesters.csw.CSW.fetch] CswHarvester fetch_stage for object: 93fed1f4-039b-46a9-8bf7-bbaf804f103c
2019-04-10 17:09:13,488 INFO  [ckanext.spatial.lib.csw_client] Making CSW request: getrecordbyid [u'etopo360'] {'esn': 'full', 'outputschema': 'http://www.isotc211.org/2005/gmd'}
2019-04-10 17:09:13,524 DEBUG [ckanext.harvest.model] Empty record for GUID etopo360
2019-04-10 17:09:13,626 INFO  [ckanext.harvest.logic.action.update] Harvest job run: {}
2019-04-10 17:09:13,643 INFO  [ckanext.harvest.logic.action.update] Marking job as finished http://myserver:8000/pycsw 327f4595-9c54-4797-832a-fc18eb55f43c
2019-04-10 17:09:13,669 DEBUG [ckanext.harvest.logic.action.update] Updating search index for harvest source: myorg-csw

benjwadams commented 5 years ago

Related: https://github.com/ckan/ckanext-spatial/pull/210 and https://github.com/ckan/ckanext-spatial/issues/209

In short, MI_Metadata isn't handled properly in the current master as of this writing. I'll get back to tracking down why the automated tests are failing.

bonnland commented 4 years ago

There are many versions of ISO 19115; CKAN does not support all of them. Here is the version that I believe the harvester is written for:

https://service.ncddc.noaa.gov/rdn/www/metadata-standards/documents/MD-Metadata.pdf