switzersc opened 10 years ago
I'm working on a similar problem in my spare time: consolidating data on plant varieties (particularly heirloom food crops). Each source needs to be scrubbed and deduplicated, and during aggregation not every source has the same fields, so the canonical data set takes the union of the fields from all sources, along with pointers to the provenance of each piece of data (which I haven't worked out yet).
I'll be very interested in seeing how you solve your similarly shaped problems.
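Here is a minimal Python sketch of the scrub, dedupe, and union-of-fields aggregation described above. It assumes that each source is a list of dicts and that a normalized `name` field is enough to detect duplicates. A real matcher for plant varieties would probably need to be fuzzier. `scrub`, `aggregate`, and `CanonicalRecord` are hypothetical names, not an existing library. The first source to supply a field wins, and every contribution is kept as a `(source, value)` pair so the provenance survives the merge.

```python
from dataclasses import dataclass, field


@dataclass
class CanonicalRecord:
    """One canonical entity: merged field values plus per-field provenance (hypothetical shape)."""
    key: str
    values: dict = field(default_factory=dict)
    # field name -> list of (source name, value that source supplied)
    provenance: dict = field(default_factory=dict)


def scrub(record):
    """Normalize a raw record: trim string values, lowercase/trim keys, drop empty values."""
    cleaned = {}
    for k, v in record.items():
        if isinstance(v, str):
            v = v.strip()
        if v is None or v == "":
            continue
        cleaned[k.strip().lower()] = v
    return cleaned


def aggregate(sources, key_field="name"):
    """Merge {source_name: [records]} into canonical records keyed by a normalized key_field.

    Assumption: records whose lowercased key_field match are duplicates of the same entity.
    """
    canonical = {}
    for source_name, records in sources.items():
        for raw in records:
            rec = scrub(raw)
            key = str(rec.get(key_field, "")).lower()
            if not key:
                continue  # without a key we cannot deduplicate, so skip the record
            entry = canonical.setdefault(key, CanonicalRecord(key=key))
            for fname, value in rec.items():
                # The canonical set is the union of fields; the first source to supply a field wins.
                entry.values.setdefault(fname, value)
                # Every source's contribution is recorded, even when it did not win.
                entry.provenance.setdefault(fname, []).append((source_name, value))
    return canonical
```

Keeping every `(source, value)` pair, rather than only the winning value, means a later pass can resolve conflicts between sources (or change the "first source wins" rule) without re-reading the raw data.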
I'm not sure if this is at all useful or relevant, but I wanted to leave it here just in case: Miso Dataset.
We also need to finalize how we'd like to handle the different data sources. Options:
Ideas? Thoughts?