Closed by andregraubner 3 years ago
Proposed structure: all samples are NetCDF (.nc) files with one group "Data" and, if labeled, one group "Labels". Inside the "Data" group, the data has the named dimensions "time", "variable", "lat", "lon".
This seems general enough to be useful while still simplifying data loading. It also makes the code more readable and maintainable, because no dimensions have to be juggled around.
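To make the proposal concrete, here is a rough sketch of a layout check. It is not existing project code: `check_sample_layout`, `EXPECTED_DIMS` and the `_fake_sample` stand-in are hypothetical names. It assumes the opened file exposes `groups`, `variables` and `dimensions` the way a `netCDF4.Dataset` does, and it assumes dimension order does not matter because the dimensions are named.

```python
from types import SimpleNamespace

# Dimension names taken from the proposal; treating order as irrelevant is an assumption.
EXPECTED_DIMS = ("time", "variable", "lat", "lon")


def check_sample_layout(ds, labeled=False):
    """Return a list of layout problems; an empty list means the sample matches."""
    problems = []
    if "Data" not in ds.groups:
        problems.append('missing group "Data"')
    else:
        # Every variable in "Data" should use exactly the four named dimensions.
        for name, var in ds.groups["Data"].variables.items():
            if set(var.dimensions) != set(EXPECTED_DIMS):
                problems.append(
                    f'variable "{name}" has dimensions {tuple(var.dimensions)}'
                )
    if labeled and "Labels" not in ds.groups:
        problems.append('missing group "Labels"')
    return problems


def _fake_sample(dims, labeled):
    """Hypothetical in-memory stand-in for an opened sample file (demo only)."""
    var = SimpleNamespace(dimensions=dims)
    groups = {"Data": SimpleNamespace(variables={"data": var})}
    if labeled:
        groups["Labels"] = SimpleNamespace(variables={})
    return SimpleNamespace(groups=groups)


good = _fake_sample(("time", "variable", "lat", "lon"), labeled=True)
bad = _fake_sample(("time", "lat", "lon"), labeled=False)

print(check_sample_layout(good, labeled=True))
print(check_sample_layout(bad, labeled=True))
```

With a real file, the same function should work on something like `netCDF4.Dataset(path)`, provided the file follows the group and dimension names proposed above.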
Currently, the implementations for labeled and unlabeled datasets are different, because the unlabeled ALLHIST dataset has a different structure from the expert labels. We should probably unify them and save the expert labels online accordingly.