GSA / data.gov

Main repository for the data.gov service
https://data.gov

Harvesting2.0 conflicting name with CKAN #4786

Open jbrown-xentity opened 2 weeks ago

jbrown-xentity commented 2 weeks ago

Currently, when harvesting under a new harvest source, CKAN rejects a dataset if its name conflicts with one that already exists in CKAN. This is bad practice; we don't want CKAN to block datasets just because the title is the same. For example, if USGS and National Parks both have a dataset titled "National Parks and associated information", but the "associated information" differs per agency, both datasets are valid and should be allowed to exist.

How to reproduce

  1. Create a harvest source in Harvesting2.0
  2. Create a duplicate harvest source in Harvesting2.0
  3. Run both harvest sources
  4. Watch the second run fail with a CKAN error message about a name conflict.

Expected behavior

Datasets are created, since a title is not required to be unique within an organization or across organizations.

Actual behavior

Dataset not created.

Sketch

We should check the CKAN response for a failure and see whether it is this specific error type. If it is, truncate the name to 96 characters, append 4 random characters to the end, and push the dataset back to CKAN. If that fails again, assume the failure is legitimate: fail the record and move on.
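
A minimal Python sketch of that retry flow, to make the proposal concrete. Everything named here is an assumption for illustration, not the Harvesting2.0 code: `CkanValidationError` stands in for whatever validation error the CKAN client raises, `create_fn` is a hypothetical callable that pushes one dataset dict to CKAN, and the conflict check assumes CKAN reports a duplicate under the `name` key with a message containing "already in use".

```python
import random
import string


class CkanValidationError(Exception):
    """Hypothetical stand-in for the validation error raised by a CKAN client."""

    def __init__(self, error_dict):
        super().__init__(str(error_dict))
        self.error_dict = error_dict


def is_name_conflict(err):
    # Assumption: a duplicate name shows up under the "name" key with a
    # message containing "already in use".
    messages = err.error_dict.get("name", [])
    return any("already in use" in str(m) for m in messages)


def with_random_suffix(name, max_base=96, suffix_len=4):
    # Truncate to 96 characters, then append 4 random characters
    # (lowercase letters and digits, which are valid in CKAN names).
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(random.choices(alphabet, k=suffix_len))
    return name[:max_base] + suffix


def create_dataset(dataset, create_fn):
    """Push a dataset; on a name conflict, retry once with a modified name."""
    try:
        return create_fn(dataset)
    except CkanValidationError as err:
        if not is_name_conflict(err):
            raise  # some other validation problem: fail as before
    retry = dict(dataset, name=with_random_suffix(dataset["name"]))
    # A second failure propagates, so the caller can fail the record and move on.
    return create_fn(retry)
```

With 96 + 4 characters the retried name is at most 100 characters long, which appears to be why the sketch above picks 96. The retry happens only once, so a record that still conflicts (or fails for another reason) is reported as a normal failure.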