Explicitly handle duplicate dataset identifiers per source

adamreichold / umwelt-info

umwelt.info metadata index

https://umwelt.info

GNU Affero General Public License v3.0


Explicitly handle duplicate dataset identifiers per source #7

Closed adamreichold closed 2 years ago

adamreichold commented 2 years ago

It appears that some sources allocate duplicate dataset identifiers, which we currently handle implicitly via last-write-wins. This should be replaced by explicit handling that decides which version should be used.
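As a rough illustration of what "explicit handling" could look like, here is a minimal sketch that replaces a blind overwrite with a visible decision. Everything in it is an assumption made for the example: the `Dataset` struct, its `modified` field, the `Inserted` enum and the `insert_dataset` function are hypothetical and not taken from the project, and "keep the more recently modified version" is just one possible policy.

```rust
use std::collections::hash_map::{Entry, HashMap};

/// Hypothetical, simplified dataset record (not the project's actual type).
#[derive(Debug, Clone, PartialEq)]
pub struct Dataset {
    pub id: String,
    pub title: String,
    /// Assumed modification stamp used to choose between duplicates.
    pub modified: u64,
}

/// What happened when a dataset was offered to the index.
#[derive(Debug, PartialEq)]
pub enum Inserted {
    New,
    ReplacedOlder,
    KeptExisting,
}

/// Inserts `dataset` under its identifier and reports the decision
/// instead of silently letting the last write win.
pub fn insert_dataset(index: &mut HashMap<String, Dataset>, dataset: Dataset) -> Inserted {
    match index.entry(dataset.id.clone()) {
        Entry::Vacant(entry) => {
            entry.insert(dataset);
            Inserted::New
        }
        Entry::Occupied(mut entry) => {
            // Assumed policy: the strictly newer version wins, ties keep the existing one.
            if dataset.modified > entry.get().modified {
                entry.insert(dataset);
                Inserted::ReplacedOlder
            } else {
                Inserted::KeptExisting
            }
        }
    }
}
```

Returning the outcome to the caller means each harvester can count or report duplicates per source, whichever selection policy ends up being used.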

adamreichold commented 2 years ago

Related to, but not the same as, #23. Most likely this is more an issue of the harvested source changing while we harvest it, especially if it is very large, so we might just want to log this as a warning in a generic dataset writing (and sanitizing) function.
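The "log a warning in a generic writing function" idea could be sketched roughly as follows. The `Harvest` type and its methods are hypothetical names for this example, `eprintln!` stands in for whatever logging facility the project actually uses, and the step that persists the dataset is left out.

```rust
use std::collections::HashSet;

/// Hypothetical per-source harvest state (not the project's actual type).
pub struct Harvest {
    source: String,
    seen: HashSet<String>,
    duplicates: usize,
}

impl Harvest {
    pub fn new(source: &str) -> Self {
        Self {
            source: source.to_owned(),
            seen: HashSet::new(),
            duplicates: 0,
        }
    }

    /// Generic write path shared by all harvesters in this sketch.
    /// Returns `true` if the identifier was seen for the first time
    /// during this harvest, `false` if it is a duplicate.
    pub fn write_dataset(&mut self, id: &str) -> bool {
        let first = self.seen.insert(id.to_owned());
        if !first {
            self.duplicates += 1;
            // Stand-in for a real logging call.
            eprintln!(
                "warning: source {} yielded duplicate dataset identifier {}",
                self.source, id
            );
        }
        // Persisting the dataset is omitted here; the last write would still win.
        first
    }

    /// Number of duplicate identifiers observed so far.
    pub fn duplicates(&self) -> usize {
        self.duplicates
    }
}
```

With this shape the behaviour stays last-write-wins, but duplicates become visible per source, which would be enough to tell whether a source really changes while it is being harvested.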