OCHA-DAP / ds-raster-pipelines

FloodScan file names #44

Open zackarno opened 2 weeks ago

zackarno commented 2 weeks ago

Currently, some of the metadata is captured in the basename of the tif, whereas the rest is captured in the path.

This means that the file names for SFED and MFED are actually the same. This works if the file has both bands, but if they are separated, I think the basename of the tif itself should indicate sfed/mfed. That way we can load the MFED and SFED stacks together and they will have a unique sources property (which uses the basename), so that we can subset and compare.

Also, I've noticed it seems a bit inconsistent as to what is stored in the folder structure vs. the base filename.

For example here:

floodscan/v5/processed/MFED/aer_area_300s_19980112_v05r01.tif

We store v5 in both the basename and the folder name, but not MFED. Assuming we will not combine the bands, I'd probably go with:

floodscan/v5/processed/MFED/aer_mfed_300s_19980112_v05r01.tif
floodscan/v5/processed/SFED/aer_sfed_300s_19980112_v05r01.tif
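
A minimal sketch of why the rename matters, assuming source labels are derived from the basename only. The SFED path under the current layout is assumed to mirror the MFED one quoted above, and the helper names (`has_unique_basenames`, `band_from_basename`) are hypothetical, not part of the pipeline:

```python
from pathlib import PurePosixPath

# Current layout: the band only appears in the folder name.
# (The SFED path is an assumption that mirrors the MFED example.)
current = [
    "floodscan/v5/processed/MFED/aer_area_300s_19980112_v05r01.tif",
    "floodscan/v5/processed/SFED/aer_area_300s_19980112_v05r01.tif",
]

# Proposed layout: the band is also part of the basename.
proposed = [
    "floodscan/v5/processed/MFED/aer_mfed_300s_19980112_v05r01.tif",
    "floodscan/v5/processed/SFED/aer_sfed_300s_19980112_v05r01.tif",
]


def basenames(paths):
    """Return the basename of each path (a stand-in for a basename-derived label)."""
    return [PurePosixPath(p).name for p in paths]


def has_unique_basenames(paths):
    """True if no two paths share the same basename."""
    names = basenames(paths)
    return len(set(names)) == len(names)


def band_from_basename(path):
    """Hypothetical helper: read the band from the second underscore-separated token."""
    return PurePosixPath(path).name.split("_")[1].upper()


print(has_unique_basenames(current))   # basenames collide under the current layout
print(has_unique_basenames(proposed))  # basenames are distinct under the proposed layout
print([band_from_basename(p) for p in proposed])
```

With the proposed names, the band can be recovered from the basename alone, so a combined MFED/SFED stack can be subset without consulting the folder path.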

isatotun commented 1 week ago

I have added this change to the PR. I think it makes sense.

t-downing commented 1 week ago

Just flagging that this is slightly different to the folder hierarchy convention we agreed on for IMERG and ERA5, where we have:

raster/
├── seas5/
│   └── monthly/
│       ├── raw/
│       │   └── ...
│       └── processed/
│           └── ...
├── imerg/
│   └── daily/
│       └── late/
│           └── v7/
│               ├── raw/
│               │   └── ...
│               └── processed/
│                   └── ...
└── era5/
    └── monthly/
        ├── raw/
        │   └── ...
        └── processed/
            └── ...

So basically, it's dataset/temporal/[other dataset-specific stuff, like version]/[raw or processed]. The difference here is that MFED/SFED (a dataset-specific thing) is now the lowest level, instead of raw/processed. Really not a big deal though.
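
As a rough sketch, the convention above could be expressed as a small path builder. The function name `raster_dir` and its parameters are hypothetical; only the `raster/` root and the level ordering come from the tree above:

```python
from pathlib import PurePosixPath


def raster_dir(dataset, temporal, stage, extra=()):
    """Build raster/dataset/temporal/[extra levels...]/stage.

    `extra` holds dataset-specific levels such as run type or version,
    and `stage` (raw or processed) is always the lowest level.
    """
    if stage not in ("raw", "processed"):
        raise ValueError(f"stage must be 'raw' or 'processed', got {stage!r}")
    return PurePosixPath("raster", dataset, temporal, *extra, stage)


print(raster_dir("imerg", "daily", "processed", extra=("late", "v7")))
print(raster_dir("era5", "monthly", "raw"))
```

Keeping raw/processed as the final argument makes it harder to accidentally nest a dataset-specific level below it.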

I do think that setting SFED/MFED would be a good use of the bands (we'd discussed this previously for ERA5, where the bands could be used for different variables like precipitation, temperature, etc). But I see from #42 that maybe it's not worth doing that for now.

isatotun commented 6 days ago

I will also add the daily level to the structure to match the other ones.

t-downing commented 4 days ago

Sorry, just seeing this now. For FloodScan, I don't think we actually need daily, because they only have a daily product (whereas for IMERG, SEAS5, ERA5, etc. there are other non-daily products). Either way, not a big deal if you've already done it; perhaps it's good to have anyway to better match the others.

hannahker commented 4 days ago

I'm not opposed to keeping daily to have consistency with the other sources.