DOI-USGS / lake-temperature-lstm-static

Predict lake temperatures at depth using static lake attributes

Change directory structure to better accommodate different data sources and drivers #17

Open AndyMcAliley opened 2 years ago

AndyMcAliley commented 2 years ago

Following up on this discussion: with the current layout, it will be cumbersome to add new data sources and drivers. The folders created when the pipeline runs could be organized by data source and driver type, but currently they are not.

For example, after the pipeline is run, the directory structure in 1_fetch looks like this:

├─out
├───dynamic_mntoha/
├───obs_mntoha/
├───lake_metadata.csv
├─tmp
├───dynamic_mntoha/
└───obs_mntoha/

Some issues with this organization system:

  1. No per-data-source subfolders in tmp/ or out/. At best, a data source can only be identified by a suffix (e.g. _mntoha). At worst, there is no suffix (as with lake_metadata.csv), which invites file collisions: a file could be overwritten or used for the wrong data source.
  2. There's no distinction between NLDAS drivers and GCM drivers.

AndyMcAliley commented 2 years ago

A better way to organize the files might be this:

├─out
├───mntoha
├─────nldas
├─────gcm_access
├─────gcm_gfdl
├─────gcm_... (four more GCM types)
├─────clarity
├─────ice_flags
├─────temperature_observations
├─────lake_metadata.csv
├───large_midwest_footprint
├─────nldas
├─────gcm_access
├─────gcm_gfdl
├─────gcm_... (four more GCM types)
├─────clarity
├─────ice_flags
├─────temperature_observations
├─────lake_metadata.csv
├─tmp
├───mntoha
├─────nldas
├─────gcm_access
├─────gcm_gfdl
├─────gcm_... (four more GCM types)
├─────clarity
├─────ice_flags
├─────temperature_observations
├───large_midwest_footprint
├─────nldas
├─────gcm_access
├─────gcm_gfdl
├─────gcm_... (four more GCM types)
├─────clarity
├─────ice_flags
├─────temperature_observations
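
As a rough sketch of how paths could be built under this layout: the helper below is hypothetical (the name nested_path and its arguments are assumptions, not existing pipeline code), and Python is assumed only for illustration. It shows that each data source gets its own lake_metadata.csv and that driver types become sibling folders under one data source.

```python
from pathlib import PurePosixPath


# Hypothetical helper: name and signature are assumptions for illustration only.
def nested_path(stage, kind, source, *parts):
    """Build <stage>/<kind>/<source>/<parts...>, e.g. 1_fetch/out/mntoha/nldas."""
    if kind not in ("out", "tmp"):
        raise ValueError(f"kind must be 'out' or 'tmp', got {kind!r}")
    return PurePosixPath(stage, kind, source, *parts)


sources = ["mntoha", "large_midwest_footprint"]

# One lake_metadata.csv per data source, so the paths cannot collide.
metadata_paths = [
    nested_path("1_fetch", "out", s, "lake_metadata.csv") for s in sources
]

# Driver types are sibling folders under a single data source.
nldas_dir = nested_path("1_fetch", "out", "mntoha", "nldas")
gcm_dir = nested_path("1_fetch", "out", "mntoha", "gcm_access")

print(metadata_paths[0])  # 1_fetch/out/mntoha/lake_metadata.csv
print(nldas_dir)          # 1_fetch/out/mntoha/nldas
```

Adding a new data source or driver would then only mean passing a new folder name, rather than inventing another suffix.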

lindsayplatt commented 2 years ago

I have been using the suffix/prefix approach over in lake-temperature-out. I'm not super satisfied by it because you end up having to scroll through a lot of files, so I like the idea of a nested approach!