gastruc / OmniSat

Dataset Configuration Issue for TreeSatAI-TS Results Reproduction #2

Open Anonymous12345678 opened 2 weeks ago

Anonymous12345678 commented 2 weeks ago

Hello,

We are working on reproducing the results from the paper on the TreeSatAI-TS dataset but have encountered some issues related to the dataset configuration.

In accordance with the instructions, running python src/train.py exp=TSAITS_OmniSAT calls the file ./src/data/TreeSAT.py, which expects the TreeSatAI-TS dataset to be organized in the following structure:


```
./data/TreeSAT/
│
├── train_filenames.lst
├── val_filenames.lst
├── test_filenames.lst
│
├── labels/
│   └── TreeSatBA_v9_60m_multi_labels.json
│
├── aerial/
│   ├── image1.tif
│   ├── image2.tif
│   └── ... (additional aerial image files)
│
├── sentinel/
│   ├── image1.h5
│   ├── image2.h5
│   └── ... (additional Sentinel-1 and Sentinel-2 HDF5 files)
│
├── s1-asc/
│   ├── image1.pth
│   ├── image2.pth
│   └── ... (additional Sentinel-1 Ascending .pth files)
│
├── s1-des/
│   ├── image1.pth
│   ├── image2.pth
│   └── ... (additional Sentinel-1 Descending .pth files)
│
├── s1/
│   └── 60m/
│       ├── image1.tif
│       ├── image2.tif
│       └── ... (additional Sentinel-1 60m resolution images)
│
└── s2/
    └── 60m/
        ├── image1.tif
        ├── image2.tif
        └── ... (additional Sentinel-2 60m resolution images)
```

However, the sentinel.zip file available at the provided dataset link (https://huggingface.co/datasets/IGNF/TreeSatAI-Time-Series) only contains .tif images located in the s1/ and s2/ 60m/200m subfolders. It does not include the .h5 files expected in the sentinel/ folder.

Additionally, the s1-asc/ and s1-des/ folders are missing, along with their corresponding .pth files. This seems to be a separate issue, as the .pth files are not present in the shared dataset at all.
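
For anyone comparing a local copy against the layout above, here is a minimal sketch of a check that lists which expected entries are absent. The entry names come from the tree in this issue; the function `find_missing_entries` and the constant `EXPECTED_ENTRIES` are hypothetical helpers and are not part of the OmniSat repository.

```python
from pathlib import Path

# Entries copied from the directory tree in this issue.
# The checker itself is a hypothetical helper, not repository code.
EXPECTED_ENTRIES = [
    "train_filenames.lst",
    "val_filenames.lst",
    "test_filenames.lst",
    "labels/TreeSatBA_v9_60m_multi_labels.json",
    "aerial",
    "sentinel",
    "s1-asc",
    "s1-des",
    "s1/60m",
    "s2/60m",
]


def find_missing_entries(root):
    """Return the expected files/folders that do not exist under root."""
    root = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    # "./data/TreeSAT" is the dataset root shown in the tree above.
    for entry in find_missing_entries("./data/TreeSAT"):
        print(f"missing: {entry}")
```

This only tests for the presence of the top-level files and folders; it does not verify that every sample listed in the `.lst` files has a matching file in each modality folder.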

Could you please provide guidance on resolving these discrepancies or point us to the correct dataset files? Thank you very much!

gastruc commented 2 weeks ago

Hi, thank you for the detailed issue. The .h5 files are located in the sentinel-ts.zip archive, which is available at the dataset link. As for the s1-asc and s1-des .pth files, I do indeed run a preprocessing step that replaces the NaNs present at some dates. I've added the code as preprocess_tsaits.py in src/utils/; you can run it once, or integrate it into the dataloader if you prefer. Do not hesitate to reach out if you still cannot reproduce the experiments!

Anonymous12345678 commented 1 week ago

Hi @gastruc, thank you for your reply.

Since sentinel.zip (downloaded from the dataset link you shared) contained only .tif files and no .h5 files, we extracted sentinel-ts.zip into the sentinel folder. However, running the script preprocess_tsaits.py gives the following error:

FileNotFoundError: [Errno 2] Unable to synchronously open file (unable to open file: name = './TreeSat/sentinel/Abies_alba_1_1005_WEFL_NLF.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

The sentinel-ts.zip archive seems to contain the file Abies_alba_1_1005_WEFL_NLF_2020.h5, which is the closest match to Abies_alba_1_1005_WEFL_NLF.h5. Could you please let us know how to proceed from here? Thanks!

gastruc commented 1 week ago

Hi, yes, you did the extraction properly. I renamed all the .h5 files by removing the trailing date ('_'.join(name.split('_')[:-1])) so that their names match the other files. I hope that this time you'll be able to reproduce the experiments!
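
The renaming step described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the author's actual script: the function names `strip_date_suffix` and `rename_h5_files` are hypothetical, and the sketch assumes the trailing token is a four-digit year such as `2020`, as in the file name from the error message. Unlike an unconditional strip of the last `_`-separated token, the year check makes the script safe to run twice.

```python
from pathlib import Path


def strip_date_suffix(filename):
    """Drop a trailing four-digit token, keeping the extension.

    'Abies_alba_1_1005_WEFL_NLF_2020.h5' -> 'Abies_alba_1_1005_WEFL_NLF.h5'
    Names without such a trailing token are returned unchanged.
    """
    path = Path(filename)
    tokens = path.stem.split("_")
    # Assumption: the date suffix is a four-digit year (e.g. "2020").
    if len(tokens) > 1 and len(tokens[-1]) == 4 and tokens[-1].isdigit():
        return "_".join(tokens[:-1]) + path.suffix
    return path.name


def rename_h5_files(folder, dry_run=True):
    """Rename the .h5 files in folder; return the (old, new) name pairs."""
    renamed = []
    for path in sorted(Path(folder).glob("*.h5")):
        new_name = strip_date_suffix(path.name)
        target = path.with_name(new_name)
        # Skip files that need no change, and never overwrite an existing file.
        if new_name == path.name or target.exists():
            continue
        if not dry_run:
            path.rename(target)
        renamed.append((path.name, new_name))
    return renamed
```

Calling `rename_h5_files("./data/TreeSAT/sentinel")` with the default `dry_run=True` only reports what would change; passing `dry_run=False` performs the renames. The folder path here follows the layout given earlier in the issue and may need adjusting to the local setup.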