Closed ogsletcax closed 2 years ago
Thanks for this, Étienne. Keep them coming as you find more. I have updated the issue description with some headers, as I was finding it hard to navigate.
1. Error when indexing XML file contents
This is addressed by cioos-siooc/ckan#63. Specifically, the _get_xml_url_content function located here improves the message output and changes it from an error to a warning in the log. Messages are also added to the harvester log so that they will be visible in the GUI when reviewing harvester output. The messages may still need more detail. Have a look and let me know what you think.
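To illustrate the kind of message I have in mind, here is a minimal sketch, not the code from the pull request. The helper name `parse_xml_or_warn` and its `dataset_id` and `url` parameters are hypothetical, and the wording about indexing is an assumption. It only shows a parse failure being logged as a warning with enough context to find the offending record.

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger(__name__)


def parse_xml_or_warn(xml_text, dataset_id, url):
    """Parse XML text; on failure log a warning with context and return None.

    Hypothetical helper for illustration only.
    """
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Include the dataset id, the source URL and the parser's own message
        # so the log line can be traced back to a specific record.
        log.warning(
            "Dataset %s: XML fetched from %s is not well-formed (%s); "
            "skipping XML contents for this dataset",
            dataset_id, url, err,
        )
        return None
```

The parser's own message (for example the line and column of the invalid token) is kept in the warning, so nothing is lost compared with the current output.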
2. String encoding debug messages
These messages are generated on lines 1370 and 1284 of /ckan/contrib/docker/src/ckanext-spatial/ckanext/spatial/model/harvested_metadata.py.
We could try adding some context to the messages, such as which field is being processed or whether the conversion succeeded. These are debugging statements, so the lack of an error message following them indicates that the conversion worked, but I agree it could be a lot clearer.
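As a rough sketch of what "adding some context" could look like (this is not the existing code in harvested_metadata.py; `decode_field` and `field_name` are hypothetical names, and it is written for Python 3 bytes rather than Python 2 strings):

```python
import logging

log = logging.getLogger(__name__)


def decode_field(raw, field_name):
    """Decode bytes as UTF-8, falling back to latin-1, and log the outcome.

    Hypothetical helper: `field_name` is only used to make the log readable.
    """
    try:
        text = raw.decode("utf-8")
        log.debug("Field %r: decoded %r as UTF-8", field_name, raw)
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this fallback cannot fail.
        text = raw.decode("latin-1")
        log.debug(
            "Field %r: %r is not valid UTF-8, decoded as latin-1 instead: %r",
            field_name, raw, text,
        )
    return text
```

With this shape each debug line says which field was involved and what the result was, so a reader does not have to infer success from the absence of a later error.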
3. Remote Organization Not Found
This one is a bit odd. The relevant code is in base.py, line 326. It should be taking the first non-null entry from [dataset owner org, responsible organization[0], metadata point of contact[0]], but it looks like the dataset owner is 'None'. I suspect there is an issue with the harvester config.
The info messages could be improved to better indicate what is going on.
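For illustration, the "first non-null entry" selection with clearer info messages might look something like the sketch below. This is not the code in the harvester; the function name, its parameters, and the choice to treat the string 'None' as missing are all assumptions on my part.

```python
import logging

log = logging.getLogger(__name__)

# Assumption: an empty string or the literal string 'None' also means "not set".
MISSING = (None, "", "None")


def pick_remote_org(owner_org, responsible_orgs, contact_orgs):
    """Return the first usable organization name, logging where it came from."""
    candidates = [
        ("dataset owner org", owner_org),
        ("responsible organization[0]",
         responsible_orgs[0] if responsible_orgs else None),
        ("metadata point of contact[0]",
         contact_orgs[0] if contact_orgs else None),
    ]
    for source, name in candidates:
        if name in MISSING:
            log.info("Remote organization: %s is not set, skipping", source)
            continue
        log.info("Remote organization: using %r from %s", name, source)
        return name
    log.info("Remote organization: no candidate found in the harvested record")
    return None
```

Logging the source of each candidate makes it obvious from the harvester output whether the dataset owner was missing or whether a later fallback was used.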
4. Debug Resource Format Messages
We could likely remove the log message at plugin.py#L141; it's not needed.
5. Keyword Error
I wonder if this error is due to old code being used. The current master should not generate this exception, as it should be handled by the json_load function (plugin.py L385). We should probably dump the whole exception in this case so that we can see the stack trace. The error message could also be improved to indicate that tags will not be indexed because the keywords field failed to parse.
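A minimal sketch of the logging change suggested here, assuming the keywords can arrive either as a JSON string or as an already-parsed dict. `load_keywords` and `dataset_id` are hypothetical names; this is not the json_load function from the plugin.

```python
import json
import logging

log = logging.getLogger(__name__)


def load_keywords(raw, dataset_id):
    """Return keywords as a dict; on failure log the full traceback and return {}."""
    if isinstance(raw, dict):
        # Already parsed, nothing to do.
        return raw
    try:
        return json.loads(raw)
    except (TypeError, ValueError):
        # log.exception logs at ERROR level and appends the stack trace.
        log.exception(
            "Dataset %s: keywords field failed to parse, tags will not be "
            "indexed. Value was: %r",
            dataset_id, raw,
        )
        return {}
```

Using `log.exception` inside the `except` block gives the stack trace in one log entry instead of several separate error lines.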
I think all these issues have been addressed now.
Some log messages are misleading or not self-explanatory:
1. Error when indexing XML file contents
tracking update script:
2021-02-26 20:23:15,616 WARNI [ckanext.cioos_theme.plugin] Unable to find harvest object "none" referenced by dataset "a763f657-c3fe-4ddf-b878-f8a2c8c10579". Trying xml url
2021-02-26 20:23:15,992 ERROR [ckanext.cioos_theme.plugin] XML string is invalid. not well-formed (invalid token): line 1, column 26
search index rebuild script: same as above
2. String encoding debug messages
harvest metadata files:
2021-02-26 20:34:45,713 DEBUG [ckanext.spatial.model.harvested_metadata] Could not convert datetime value 2021-01-29 to UTC: time data '2021-01-29' does not match format '%Y-%m-%dT%H:%M:%S'
2021-02-26 20:34:45,713 DEBUG [ckanext.spatial.model.harvested_metadata] Failed to decode latin1 encodid string "'Temp\xe9rature de surface'" as utf8, trying encoding as utf8
2021-02-26 20:34:45,714 DEBUG [ckanext.spatial.model.harvested_metadata] Failed to decode latin1 encodid string "'\xc9tat de la mer'" as utf8, trying encoding as utf8
2021-02-26 20:34:45,714 DEBUG [ckanext.spatial.model.harvested_metadata] Failed to decode latin1 encodid string ""Densit\xe9 d'eau de mer"" as utf8, trying encoding as utf8
2021-02-26 20:34:45,714 DEBUG [ckanext.spatial.model.harvested_metadata] Failed to decode latin1 encodid string "'Oxyg\xe8ne'" as utf8, trying encoding as utf8
2021-02-26 20:34:45,714 DEBUG [ckanext.spatial.model.harvested_metadata] Could not convert datetime value to UTC: time data '' does not match format '%Y-%m-%dT%H:%M:%S'
2021-02-26 20:34:45,714 DEBUG [ckanext.spatial.model.harvested_metadata] Could not convert datetime value to UTC: time data '' does not match format '%Y-%m-%dT%H:%M:%S'
3. Remote Organization Not Found
2021-02-26 20:34:45,887 INFO [ckanext.spatial.harvesters.base] Found remote orginization options of: 'None', 'asdasd', 'asdasd'
2021-02-26 20:34:45,887 INFO [ckanext.spatial.harvesters.base] Using 'asdasd' for remote orginization
2021-02-26 20:34:45,892 INFO [ckanext.spatial.harvesters.base] Organization asdasd is not available
4. Debug Resource Format Messages
2021-02-26 20:34:45,896 DEBUG [ckanext.cioos_harvest.plugin] #### Scheming, Composite, or Fluent extensions found, processing dictinary ####
2021-02-26 20:34:45,897 DEBUG [ckanext.cioos_harvest.plugin] 'ERDDAP':('/erddap/',)
2021-02-26 20:34:45,897 DEBUG [ckanext.cioos_harvest.plugin] 'ERDDAP':('/erddap/',)
5. Keyword Error
2021-02-26 20:34:45,967 ERROR [ckanext.cioos_theme.plugin] <type 'exceptions.TypeError'>
2021-02-26 20:34:45,967 ERROR [ckanext.cioos_theme.plugin] error:expected string or buffer, keywords:{u'fr': [u'fff', u'Temp\xe9rature de surface', u'\xc9tat de la mer', u'Hauteur de la surface de la mer', u"Densit\xe9 d'eau de mer", u'Pression', u'Oxyg\xe8ne', u'carbone inorganique dissous'], u'en': [u'Oceans', u'fff', u'seaSurfaceTemperature', u'seaState', u'seaSurfaceHeight', u'seawaterDensity', u'pressure', u'oxygen', u'dissolvedOrganicCarbon']}
2021-02-26 20:34:45,974 ERROR [ckanext.cioos_theme.plugin] expected string or buffer
2021-02-26 20:34:46,385 INFO [ckanext.spatial.harvesters.base.import] Document with GUID ca.coos_310997e1-4bc8-4779-a993-fbdf7aecb902 unchanged, skipping...