Open yemoski opened 1 year ago
You can get all of the data center URLs by building one big query that repeats the `data_center` parameter once per data center on our list, like this: https://cmr.earthdata.nasa.gov/search/collections?data_center=AU/AADC&data_center=WGMS
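A minimal sketch of building that query string, in case it helps. The helper name `build_data_center_query` is made up for illustration; the only things taken from this thread are the endpoint and the repeated `data_center` parameter.

```python
from urllib.parse import urlencode

CMR_COLLECTIONS = "https://cmr.earthdata.nasa.gov/search/collections"


def build_data_center_query(data_centers):
    """Return one collections URL with a data_center parameter per entry.

    Hypothetical helper, not part of any CMR client library.
    """
    # A list of (key, value) pairs lets the same key appear more than once.
    params = [("data_center", name) for name in data_centers]
    # safe="/" leaves the slash in names like "AU/AADC" unescaped,
    # matching the URL form used in this thread.
    return CMR_COLLECTIONS + "?" + urlencode(params, safe="/")


url = build_data_center_query(["AU/AADC", "WGMS"])
print(url)
```

Passing the full list of polar data centers instead of the two shown here should give the single big query described above, though a very long list may need to be split across several requests.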
OK, our next step is to make a query that grabs all the dataset URLs from the data centers in the first list, and then turn the result into a sitemap that Gleaner can crawl.
https://cmr.earthdata.nasa.gov/search/keywords/providers?pretty=true lists all the providers. So step 1 is to cross-reference them with our list of polar repositories, so that we only get datasets from the ones we want. @oluwayemisi4 is working on this.
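Here is a rough sketch of the cross-referencing step. It assumes the provider names have already been pulled out of the keywords response into a plain list (the response structure is not handled here), and it assumes that names such as `AU/AADC` and `AU_AADC` refer to the same provider. Both function names and the `EXAMPLE_NONPOLAR` entry are hypothetical.

```python
def normalize(name):
    """Uppercase a provider name and treat "/" and "_" as the same separator.

    Assumption: "AU/AADC" and "AU_AADC" are two spellings of one provider.
    """
    return name.strip().upper().replace("/", "_")


def select_polar_providers(cmr_providers, polar_repositories):
    """Return the CMR providers that also appear on our polar list."""
    wanted = {normalize(name) for name in polar_repositories}
    available = {normalize(name) for name in cmr_providers}
    # Sorted so the output is stable from run to run.
    return sorted(available & wanted)


# Placeholder inputs; EXAMPLE_NONPOLAR is not a real provider.
cmr_providers = ["AU/AADC", "WGMS", "EXAMPLE_NONPOLAR"]
polar_repositories = ["au_aadc", "WGMS"]
matches = select_polar_providers(cmr_providers, polar_repositories)
print(matches)
```

Exact matching after normalization will miss providers whose names differ more substantially between the two lists, so the leftovers probably still need a manual check.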
OK, here's how to do this: https://cmr.earthdata.nasa.gov/search/collections?provider=AU_AADC returns something that's like a sitemap for AADC datasets. We can request that, turn it into an actual sitemap (like we did for BAS), and crawl it.
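A sketch of the conversion, using only the standard library. It assumes the collections response is XML in which each dataset carries a `location` element holding its URL; that shape, the sample document, and both function names are assumptions for illustration, so check them against a real response before relying on this.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def locations_from_references(xml_text):
    """Collect the text of every <location> element in the response.

    Assumption: each dataset reference carries its URL in <location>.
    """
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter("location") if el.text]


def build_sitemap(urls):
    """Return a sitemaps.org urlset document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Made-up sample standing in for a real response body.
SAMPLE = """<results><references>
<reference><name>Example dataset</name>
<location>https://example.org/concepts/C0000-EXAMPLE.xml</location></reference>
</references></results>"""

locations = locations_from_references(SAMPLE)
sitemap = build_sitemap(locations)
print(sitemap)
```

Writing `sitemap` to a file and hosting it somewhere Gleaner can reach would be the remaining step, along with paging through the results if a provider has more datasets than one response returns.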
What else is in the GCMD that we want?
This is the same situation as the AMD: there's no JSON-LD in here, but there is API access, so we're going to have to figure something out.
GCMD
Top priority relevant repositories that only contain polar data
Relevant repositories that contain data that needs to be scoped down to polar data