earthpulse / eotdl

Earth Observation Training Datasets
https://eotdl.com
MIT License

standardized information on training datasets #116

Closed: Patrick1G closed this issue 8 months ago

Patrick1G commented 9 months ago

@juansensio Currently, users working with training datasets (TDS) in EOTDL are presented with the data structure exactly as it comes out of the uploaded zip file, and this structure varies from dataset to dataset. For example, the unpacked S2SHIPS dataset has the folders S2SHIPS/S2SHIPS/geojson/ and S2SHIPS/S2SHIPS/geotiff/, but other datasets are organized completely differently.

We need to curate this better, for example by providing standardized information (at data ingest) on key dataset components, such as the type and format of the annotations.

That way, users would find it easier to work with different datasets: they would know, for instance, that there is always a default folder called annotations, with some standard parameters describing its subfolders.
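A minimal Python sketch of what such ingest-time information could look like. The `DatasetInfo` and `AnnotationInfo` classes and their field names are hypothetical (nothing in EOTDL defines them); only the idea of a default `annotations` folder and the type/format attributes come from the discussion above, and the S2SHIPS values are illustrative.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AnnotationInfo:
    """Standardized description of a dataset's annotations (hypothetical schema)."""
    type: str                  # kind of labels, e.g. "vector" or "raster"
    format: str                # file format, e.g. "geojson" or "geotiff"
    path: str = "annotations"  # default folder name proposed in the issue


@dataclass(frozen=True)
class DatasetInfo:
    """Metadata recorded once at ingest and shown to every user of the dataset."""
    name: str
    annotations: AnnotationInfo

    def to_json(self) -> str:
        # asdict() recurses into the nested AnnotationInfo dataclass
        return json.dumps(asdict(self), indent=2)


# Illustrative values: GeoJSON (vector) annotations stored under the default folder.
s2ships = DatasetInfo(
    name="S2SHIPS",
    annotations=AnnotationInfo(type="vector", format="geojson"),
)
print(s2ships.to_json())
```

Declaring these fields once at ingest means every dataset can be described with the same keys, whatever its original zip layout was.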

earthpulse commented 9 months ago

Already working on it.

We are changing the ingestion mechanism to work on a per-file basis instead of with zipped archives.

This way we will be able to retrieve the entire folder structure, and it also has some advantages for dataset versioning.
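To illustrate why per-file ingestion helps with versioning, here is a minimal Python sketch, assuming each version is described by a manifest that maps relative file paths to SHA-256 digests. The functions `build_manifest` and `diff_manifests` are hypothetical and not part of EOTDL; they only show that, once individual files (rather than one opaque zip) are tracked, the folder structure is preserved and a new version reduces to the files that were added, removed, or modified.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large rasters are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> dict[str, str]:
    """Map every file under root (relative POSIX path) to its SHA-256 digest."""
    root_path = Path(root)
    return {
        p.relative_to(root_path).as_posix(): file_sha256(p)
        for p in sorted(root_path.rglob("*"))
        if p.is_file()
    }


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as added, removed or modified between two dataset versions."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Usage: two versions of a toy dataset that keeps its folder structure.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "geojson").mkdir()
    (root / "geojson" / "ships.geojson").write_text("{}")
    (root / "geotiff").mkdir()
    (root / "geotiff" / "scene.tif").write_bytes(b"v1")
    v1 = build_manifest(tmp)

    (root / "geotiff" / "scene.tif").write_bytes(b"v2")    # modified file
    (root / "geotiff" / "scene2.tif").write_bytes(b"new")  # added file
    v2 = build_manifest(tmp)

changes = diff_manifests(v1, v2)
print(changes)
```

Because only changed files differ between manifests, a new version could in principle upload just those files instead of re-uploading a whole archive.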

Rolling out in the coming weeks.

Patrick1G commented 9 months ago

That sounds good. Also, general functions such as load_annotations() or dataset_info() could be used to provide this information to the user in a consistent way. This would require that the information is defined or identified at the point of ingest.
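The functions suggested here could look roughly like the following Python sketch. It assumes, hypothetically, that ingestion writes a `metadata.json` file at the dataset root containing the standardized annotation fields; `dataset_info()` and `load_annotations()` are illustrative implementations of the proposed names, not existing EOTDL functions, and the format-to-suffix mapping is an assumption.

```python
import json
import tempfile
from pathlib import Path

# Assumed: ingestion writes this file at the dataset root.
METADATA_FILE = "metadata.json"
# Assumed mapping from declared formats to file suffixes.
SUFFIXES = {"geojson": {".geojson", ".json"}, "geotiff": {".tif", ".tiff"}}


def dataset_info(root: str) -> dict:
    """Return the standardized metadata recorded for the dataset at ingest."""
    return json.loads((Path(root) / METADATA_FILE).read_text())


def load_annotations(root: str) -> list[Path]:
    """List annotation files from the declared folder and format, not from guesswork."""
    ann = dataset_info(root)["annotations"]
    folder = Path(root) / ann["path"]
    suffixes = SUFFIXES[ann["format"]]
    return sorted(p for p in folder.rglob("*") if p.is_file() and p.suffix.lower() in suffixes)


# Usage on a toy dataset laid out with the standardized structure.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "annotations").mkdir()
    (root / "annotations" / "scene_001.geojson").write_text(
        json.dumps({"type": "FeatureCollection", "features": []})
    )
    (root / METADATA_FILE).write_text(json.dumps({
        "name": "toy-dataset",
        "annotations": {"type": "vector", "format": "geojson", "path": "annotations"},
    }))
    info = dataset_info(tmp)
    files = [p.name for p in load_annotations(tmp)]

print(info["annotations"], files)
```

Since users only ever call the same two functions, layout differences are absorbed by the metadata: a dataset that keeps the S2SHIPS layout could, for example, declare `S2SHIPS/S2SHIPS/geojson` as its annotation path.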

earthpulse commented 9 months ago

We can do that easily for Q1+ datasets; Q0 datasets would, of course, require input from users at ingestion time.
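One way to read this reply, as a hedged Python sketch: for Q1+ datasets the annotation information is derived from metadata that already exists, while Q0 ingestion refuses to proceed without user input. For illustration only, it assumes that Q1+ metadata is a STAC Item using the STAC label extension (`label:type`) with the label data under an asset named `labels`; `resolve_annotation_info`, `info_from_stac_label_item` and the media-type mapping are hypothetical.

```python
from typing import Optional

# Assumed mapping from STAC asset media types to short format names.
MEDIA_TYPES = {
    "application/geo+json": "geojson",
    "image/tiff; application=geotiff": "geotiff",
}


def info_from_stac_label_item(item: dict) -> dict:
    """Derive annotation type/format from a STAC Item using the label extension."""
    props = item["properties"]
    labels_asset = item["assets"]["labels"]  # assumed asset key for the label data
    return {
        "type": props["label:type"],  # "vector" or "raster" in the label extension
        "format": MEDIA_TYPES.get(labels_asset.get("type", ""), "unknown"),
    }


def resolve_annotation_info(
    quality_level: int,
    stac_item: Optional[dict] = None,
    user_input: Optional[dict] = None,
) -> dict:
    """Q1+: derive from existing metadata. Q0: require the user to supply it."""
    if quality_level >= 1:
        if stac_item is None:
            raise ValueError("Q1+ dataset has no STAC metadata to derive from")
        return info_from_stac_label_item(stac_item)
    missing = {"type", "format"} - set(user_input or {})
    if missing:
        raise ValueError(f"Q0 ingestion needs user input for: {sorted(missing)}")
    return dict(user_input)


label_item = {
    "properties": {"label:type": "vector"},
    "assets": {"labels": {"href": "labels.geojson", "type": "application/geo+json"}},
}
print(resolve_annotation_info(1, stac_item=label_item))
print(resolve_annotation_info(0, user_input={"type": "vector", "format": "geojson"}))
```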