EarthSystemCoG / COG

COG source code
BSD 3-Clause "New" or "Revised" License

Enable creation of wget scripts with no duplicates #1307

Open LucaCinquini opened 8 years ago

LucaCinquini commented 8 years ago

Who: Karl Taylor

It's not obvious to me how I should create a wget script that can retrieve all available output (but not duplicates) for a search satisfying:

CMIP5 1pctCO2 mon atmos Amon r1i1p1

If I click on "show all replicas", I get results for 33 models and many, many duplicates. If I click on "search local node only (including all replicas)", I miss 11 models and I still get a few duplicates. If I add Datanode = "aims3.llnl.gov" to my search, I miss the same 11 models, and the datasets for two models are split into two (each containing only some of the dataset variables).

I would like to get 33 models, but not any duplicates. Can this be done?

LucaCinquini commented 8 years ago

Possible strategies:

o Return no duplicates when executing the Solr search - can this be done over a distributed search?

o Remove duplicates when the wget script is created (might not be possible because the scripts are created for a single index node)

o Have an option "Remove duplicates" in the data cart. This would need a dictionary of the form (dataset_id) : (node1, node2, ...), with that information exposed to the user so they can select which node to remove duplicates from.
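
A minimal sketch of the third strategy, in Python. It is an illustration only, not COG code. It assumes each data cart entry is available as a dict with a "dataset_id" key and a "data_node" key; those key names, the function names, and the sample records are all hypothetical. The first function builds the (dataset_id) : (node1, node2, ...) dictionary, and the second keeps one copy per dataset, favouring the nodes the user prefers.

```python
from collections import defaultdict


def group_by_dataset(records):
    """Map each dataset id to the list of nodes that hold a copy of it."""
    nodes_by_dataset = defaultdict(list)
    for rec in records:
        nodes = nodes_by_dataset[rec["dataset_id"]]
        if rec["data_node"] not in nodes:
            nodes.append(rec["data_node"])
    return dict(nodes_by_dataset)


def remove_duplicates(records, preferred_nodes=()):
    """Keep one record per dataset id.

    Nodes listed earlier in preferred_nodes win; nodes that are not
    listed all share the lowest priority, and among equals the first
    record seen is kept.
    """
    rank = {node: i for i, node in enumerate(preferred_nodes)}
    lowest = len(rank)
    chosen = {}
    for rec in records:
        key = rec["dataset_id"]
        current = chosen.get(key)
        if current is None or (
            rank.get(rec["data_node"], lowest)
            < rank.get(current["data_node"], lowest)
        ):
            chosen[key] = rec
    return list(chosen.values())


# Made-up cart contents, for illustration only.
cart = [
    {"dataset_id": "model_a", "data_node": "aims3.llnl.gov"},
    {"dataset_id": "model_a", "data_node": "node2.example.org"},
    {"dataset_id": "model_b", "data_node": "node2.example.org"},
]

replicas = group_by_dataset(cart)
deduplicated = remove_duplicates(cart, preferred_nodes=["aims3.llnl.gov"])
```

Deduplicating at the dataset level assumes that every copy of a dataset is complete. The report above mentions datasets split across two entries, each holding only some of the variables, so a real implementation might have to compare file lists instead.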