ContinuumIO / earthio

Data reader utilities for machine learning on satellite imagery and Earth science data

Earth science data catalogue #16

Open PeterDSteinberg opened 7 years ago

PeterDSteinberg commented 7 years ago

earthio should be a data catalogue (see also the wiki notes from yesterday's meeting):

Handle the following data downloading concerns:

PeterDSteinberg commented 7 years ago

dharhas commented 7 years ago

Some comments:

If you can find a better word, go for it; nobody is happy with 'feature'. Services typically use terms such as site, site code, monitoring station, or location. These work reasonably well for data at a particular point but are somewhat misleading in other cases. The OGC SOS 2.0 spec has a concept of 'Feature of Interest', which is where the term 'feature' came from (see http://www.ogcnetwork.net/sos_2_0/tutorial/om).

This can be a bit tricky. For example, SRTM data can be downloaded from several locations, but the original provider is NASA. In other efforts we have used 'Provider' to mean a combination of the source organization and a particular method of access (e.g. the NOAA Coastwatch Tabledap service). The 'Services' are then the distinct dataset services within that main service. This isn't the only way to do it, but it seems reasonable from an implementation standpoint.
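
To make the Provider/Services idea concrete, here is a rough sketch of how it could be modelled. The class names, fields, and the example service name are all hypothetical and not part of earthio; it only illustrates "organization plus access method, with dataset services underneath".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Service:
    """One dataset service exposed by a provider (hypothetical structure)."""
    name: str
    parameters: List[str] = field(default_factory=list)


@dataclass
class Provider:
    """A source organization combined with one particular method of access."""
    organization: str
    access_method: str
    services: Dict[str, Service] = field(default_factory=dict)

    def add_service(self, service: Service) -> None:
        # Services are keyed by name so they can be looked up within the provider.
        self.services[service.name] = service


# Illustrative values only; "example_dataset" is a made-up service name.
coastwatch = Provider(organization="NOAA Coastwatch", access_method="tabledap")
coastwatch.add_service(
    Service(name="example_dataset", parameters=["sea_surface_temperature"])
)
```

With this layout the same organization reached through a different access method would simply be a second `Provider` instance.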

jbednar commented 7 years ago

providers

Maybe distinguish "provider" (where we get the data) from "source" (or "creator", i.e., where the data originally came from)?
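
Something like the following is what I have in mind, as a sketch only. `CatalogueEntry`, `providers_by_source`, and the mirror names are made up for illustration; the point is just that one (dataset, source) pair can map to several providers.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class CatalogueEntry(NamedTuple):
    dataset: str
    source: str    # who originally created the data
    provider: str  # where this particular copy is downloaded from


def providers_by_source(
    entries: List[CatalogueEntry],
) -> Dict[Tuple[str, str], List[str]]:
    """Group download providers under the (dataset, source) they serve."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[(entry.dataset, entry.source)].append(entry.provider)
    return dict(grouped)


# "mirror-a" and "mirror-b" are placeholder provider names, not real services.
entries = [
    CatalogueEntry("SRTM", "NASA", "mirror-a"),
    CatalogueEntry("SRTM", "NASA", "mirror-b"),
]
grouped = providers_by_source(entries)
```

Keeping the two fields separate means a user can ask for "SRTM from NASA" without caring which mirror serves it.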

PeterDSteinberg commented 7 years ago

I'll add here an additional requirement of this data catalogue service, and we may make separate issues for it over time: we need to avoid accidental massive downloads. For example, I might plan to download what I think is about 100 GB of data, but misconfigure the query's bounding box in space or time and end up attempting to download 10 TB. Consider config/CLI/env settings that control one or more of: