Open jbrown-xentity opened 2 years ago
In trying to write a test case to cover this use case, we ran into problems...
The open item should patch this issue for now, but various issues will arise with this extremely complex code of collections and parents. Ideally we should simplify where we can.
We get errors when trying to verify if datasets already exist in the fetch harvester for DCAT-US objects that have unique identifiers with `:` in the string, such as `https://something-something.gov`. You can see this class of errors in the logs; use `space_name:prod app_name:catalog-fetch missing:gauge "Reason: org.apache.solr.search.SyntaxError"` as the filter. As of now, there have been 74 instances of this in the last 24 hours.
How to reproduce
`https://` in the string
Expected behavior
Successful harvest (using https is actually the recommended behavior by the spec)
Actual behavior
Harvest fails
Sketch
Add a new test covering this use case in ckanext-datajson, then investigate how to use the CKAN search call properly. If there is a bug in how CKAN requests this data from Solr, that will need to be raised. It's also possible that this function call only occurs on a re-harvest, when checking whether the data already exists in the current system (versus a first harvest, which doesn't check anything). We've done this in the past (see here). The error is occurring here.
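One possible direction, as a rough sketch only: escape the characters that the Solr/Lucene query parser treats as syntax before the identifier is placed into the search query. The helper names `escape_solr_term` and `build_identifier_query` and the `identifier` field name are hypothetical, not taken from ckanext-datajson, and the actual query built by the harvester may look different.

```python
import re

# Characters (and the two-character operators && and ||) that the
# Solr/Lucene standard query parser treats as syntax.
_SOLR_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_solr_term(value: str) -> str:
    """Prefix each special character or operator with a backslash."""
    return _SOLR_SPECIAL.sub(r'\\\1', value)


def build_identifier_query(identifier: str, field: str = "identifier") -> str:
    """Build a field:value query string (hypothetical field name)."""
    return f"{field}:{escape_solr_term(identifier)}"


if __name__ == "__main__":
    # The unescaped identifier contains ':' and '/', which the parser
    # would otherwise read as field separators and regex delimiters.
    print(build_identifier_query("https://something-something.gov"))
```

A test in ckanext-datajson could then harvest a record whose identifier starts with `https://`, re-harvest it, and assert that the lookup finds the existing dataset instead of raising a Solr syntax error. Wrapping the value in double quotes and escaping only `"` and `\` would be an alternative worth comparing.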