Open yemoski opened 1 year ago
Here's an interesting conundrum: because licensing info isn't available through DataONE yet, we actually get more information by indexing g-e-m ourselves than we will once it comes through DataONE. What do we do about that? Let them index it anyway and be happy when licensing makes it in? Somehow exclude g-e-m from DataONE queries?
This is actually also a problem within the DataONE data itself, because it has duplicates too! Should I treat records with the same DOI as the same thing and collapse them? I think, for the purposes of this tool, that might be a good move.
I've started by removing duplicates in individual queries, but the broader problem of removing duplicates across the federated search remains.
I can go some way towards this by making a hashable SearchResult class.
This is REALLY HARD TO DO.
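For reference, here is a rough sketch of what a hashable SearchResult keyed on DOI might look like. Everything in it is an assumption for illustration, not the tool's actual code: the field names (`doi`, `title`, `source`, `license`), the `normalize_doi` and `dedupe` helpers, and the rule of keeping whichever duplicate carries licensing info. It also assumes every record has a DOI, which may not hold across all federated sources.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Common ways a DOI gets written; stripped so equal DOIs compare equal.
_DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi: str) -> str:
    """Hypothetical helper: lower-case a DOI and drop a leading URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


@dataclass(frozen=True, eq=False)
class SearchResult:
    """Hypothetical record type: identity is the normalized DOI only."""
    doi: str
    title: str = ""
    source: str = ""
    license: Optional[str] = None

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SearchResult):
            return NotImplemented
        return normalize_doi(self.doi) == normalize_doi(other.doi)

    def __hash__(self) -> int:
        return hash(normalize_doi(self.doi))


def dedupe(results: Iterable[SearchResult]) -> List[SearchResult]:
    """Collapse results that share a DOI, preferring a copy that has license info."""
    best = {}
    for result in results:
        kept = best.get(result)
        if kept is None or (kept.license is None and result.license is not None):
            # Assigning to an existing key keeps its position but swaps the value.
            best[result] = result
    return list(best.values())
```

Because equality and hashing only look at the DOI, results from different sources can be pooled into one list and passed through `dedupe`, or dropped straight into a `set`. The catch is that two "equal" results can still differ in every other field, so which copy survives has to be decided explicitly, as `dedupe` does here with the license check.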