Open AndyMcAliley opened 2 years ago
A better way to organize the files might be this:
├─out
├───mntoha
├─────nldas
├─────gcm_access
├─────gcm_gfdl
├─────gcm_... (four more GCM types)
├─────clarity
├─────ice_flags
├─────temperature_observations
├─────lake_metadata.csv
├───large_midwest_footprint
├─────nldas
├─────gcm_access
├─────gcm_gfdl
├─────gcm_... (four more GCM types)
├─────clarity
├─────ice_flags
├─────temperature_observations
├─────lake_metadata.csv
├─tmp
├───mntoha
├─────nldas
├─────gcm_access
├─────gcm_gfdl
├─────gcm_... (four more GCM types)
├─────clarity
├─────ice_flags
├─────temperature_observations
├───large_midwest_footprint
├─────nldas
├─────gcm_access
├─────gcm_gfdl
├─────gcm_... (four more GCM types)
├─────clarity
├─────ice_flags
├─────temperature_observations
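To make the nested layout concrete, here is a minimal sketch of a path helper that builds `<stage>/<data source>/<item>` paths. The function name `nested_path` and the `STAGES` and `DATA_SOURCES` constants are hypothetical, taken from the tree above rather than from the pipeline code.

```python
from pathlib import PurePosixPath

# Hypothetical constants: these names come from the proposed tree,
# not from the existing pipeline code.
STAGES = ("out", "tmp")
DATA_SOURCES = ("mntoha", "large_midwest_footprint")


def nested_path(stage, source, item):
    """Return the nested path <stage>/<source>/<item>.

    `item` is a driver or data folder (e.g. "nldas", "gcm_access")
    or a file such as "lake_metadata.csv".
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    if source not in DATA_SOURCES:
        raise ValueError(f"unknown data source: {source!r}")
    return PurePosixPath(stage) / source / item


# The same item name maps to a distinct location per data source.
print(nested_path("out", "mntoha", "lake_metadata.csv"))
print(nested_path("out", "large_midwest_footprint", "lake_metadata.csv"))
```

Adding a new data source would then only mean extending `DATA_SOURCES`, with no change to any file name.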
I have been using the suffix/prefix approach over in lake-temperature-out. I'm not super satisfied by it because you end up having to scroll through a lot of files, so I like the idea of a nested approach!
Following up on this discussion, it will be cumbersome to add new data sources and drivers. The folders created when the pipeline is run could be fully organized by data source and driver type, but they are not.
For example, after the pipeline is run, the directory structure in 1_fetch looks like this:
Some issues with this organization system:
_mntoha). At worst, there is no suffix (as with lake_metadata.csv), so the situation is ripe for file collisions that result in a file being overwritten or used for the wrong data source.
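The collision risk can be checked mechanically. Below is a rough sketch, assuming each planned output is described as a (data source, file name) pair in a single flat folder. The function name `find_collisions` and the example file names other than lake_metadata.csv are made up for illustration.

```python
from collections import defaultdict


def find_collisions(planned_files):
    """Return file names that more than one data source would write.

    `planned_files` is an iterable of (data_source, flat_filename) pairs,
    all destined for the same flat folder.
    """
    owners = defaultdict(set)
    for source, name in planned_files:
        owners[name].add(source)
    # Keep only names claimed by two or more data sources.
    return {name: sorted(srcs) for name, srcs in owners.items() if len(srcs) > 1}


# Hypothetical flat layout: one file carries a suffix, the other does not.
planned = [
    ("mntoha", "nldas_mntoha.zip"),
    ("mntoha", "lake_metadata.csv"),
    ("large_midwest_footprint", "lake_metadata.csv"),
]
print(find_collisions(planned))
```

With a nested layout, the data source is part of the path, so the same check on full paths would come back empty.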