NASA-IMPACT / hls-foundation-os

This repository contains examples of fine-tuning the Harmonized Landsat and Sentinel-2 (HLS) Prithvi foundation model.
Apache License 2.0

Issues with multi_temporal_crop_classification.py config #47

Closed: robmarkcole closed this issue 8 months ago

robmarkcole commented 9 months ago

I created the dataset using the instructions in https://github.com/NASA-IMPACT/hls-foundation-os/issues/46

In [multi_temporal_crop_classification.py](https://github.com/NASA-IMPACT/hls-foundation-os/blob/main/configs/multi_temporal_crop_classification.py) I have:

# TO BE DEFINED BY USER: data directory
data_root = "data/multi-temporal-crop-classification/"

splits = dict(
    train="data/multi-temporal-crop-classification/training_data.txt",
    val="data/multi-temporal-crop-classification/validation_data.txt",
    test="data/multi-temporal-crop-classification/validation_data.txt",
)

However, when I run training I get:

rasterio.errors.RasterioIOError: data/multi-temporal-crop-classification/training_chips/chip_059_117_merged.tif: No such file or directory

Indeed, chip_059_117 is listed in training_data.txt but is NOT present in training_chips.
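
For anyone hitting the same mismatch, a quick way to see how many listed chips are missing is to compare the split file against the chips directory. This is a minimal sketch, not part of the repo: the helper name `find_missing_chips` is hypothetical, and it assumes the split file holds one chip ID per line and that each chip is stored as `<chip_id>_merged.tif`, as the error message above suggests.

```python
from pathlib import Path


def find_missing_chips(split_file, chips_dir, suffix="_merged.tif"):
    """Return chip IDs listed in split_file that have no matching file in chips_dir.

    Assumptions (not confirmed by the repo): one chip ID per line in the
    split file, and files named <chip_id><suffix> inside chips_dir.
    """
    chips_dir = Path(chips_dir)
    missing = []
    for line in Path(split_file).read_text().splitlines():
        chip_id = line.strip()
        if not chip_id:
            continue  # skip blank lines
        if not (chips_dir / f"{chip_id}{suffix}").is_file():
            missing.append(chip_id)
    return missing
```

Calling `find_missing_chips("data/multi-temporal-crop-classification/training_data.txt", "data/multi-temporal-crop-classification/training_chips")` would then return the IDs that trigger the `RasterioIOError`, if the layout matches these assumptions.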

On further searching, I found that the splits are also defined in this repo, so I tried:

splits = dict(
    train="data_splits/multi_temporal_crop_classification/training_data.txt",
    val="data_splits/multi_temporal_crop_classification/validation_data.txt",
    test="data_splits/multi_temporal_crop_classification/validation_data.txt",
)

However, this results in:

FileNotFoundError: [Errno 2] No such file or directory: 'data/multi-temporal-crop-classification/hls-foundation-os/data_splits/multi_temporal_crop_classification/training_data.txt'

So I gather the text files must be copied into the directory that holds the images. I did this and now have:

# TO BE DEFINED BY USER: data directory
data_root = "/teamspace/studios/this_studio/data/multi-temporal-crop-classification/"

splits = dict(
    train="training_data.txt",
    val="validation_data.txt",
    test="validation_data.txt",
)
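
The FileNotFoundError above is consistent with the data loader joining a relative split path onto data_root, which would explain why bare file names work once the text files sit next to the images. The sketch below only illustrates that assumed behaviour; `resolve_split` is a hypothetical helper, not a function from this repo or its dependencies.

```python
import os.path as osp


def resolve_split(data_root, split):
    """Join a relative split path onto data_root; leave absolute paths as-is.

    This mirrors the behaviour assumed from the error message in this
    thread; it is an illustration, not the loader's actual code.
    """
    if split is None or osp.isabs(split):
        return split
    return osp.join(data_root, split)
```

Under that assumption, `resolve_split("data/multi-temporal-crop-classification/", "training_data.txt")` gives a path inside the data directory, while a repo-relative path such as `data_splits/...` ends up prefixed with data_root and no longer points at the repo.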

CarlosGomes98 commented 8 months ago

Hi @robmarkcole. Thanks for taking the time to document this. Indeed, there seems to be some discrepancy in the split files. I will look into this.

In the meantime, since we already explicitly split the data into a train and a val directory, we shouldn't actually require these splits. I will push this change.
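
To illustrate why the directory layout alone can stand in for the split files, the chip IDs can be derived from whatever is present on disk. This is a rough sketch under assumptions: `list_chip_ids` is a hypothetical helper, and the `_merged.tif` suffix is taken from the error message earlier in the thread rather than from the repo.

```python
from pathlib import Path


def list_chip_ids(chips_dir, suffix="_merged.tif"):
    """Return the sorted chip IDs of all files in chips_dir ending with suffix.

    Assumption: each chip image is named <chip_id><suffix>.
    """
    return sorted(
        p.name[: -len(suffix)]
        for p in Path(chips_dir).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    )
```

Listing the training directory this way yields only chips that actually exist, so a split built from it cannot reference a missing file.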

CarlosGomes98 commented 8 months ago

Removed splits from the crop config file.