Closed rettigl closed 5 days ago
Indeed not intended behavior.
Can you look at the datasets.json dict/file? DatasetsManager.load_datasets_dict() should work.
I am assuming the processed folder was somehow also saved in the list of files. And if the files listed in the json do not match the files in the folder, it tries to re-extract the data.
Yes, they are all in the json file. This should only contain the extracted files, I would say.
And it's also pretty confusing that it says it reuses the existing data, yet still downloads it...
There are two checks that take place.
One is just checking if the path is in the json file.
https://github.com/OpenCOMPES/sed/blob/86978c08be702f550ae10c04be1357cc012ebcf0/sed/dataset/dataset.py#L175-L183
The second check sees if the files match.
https://github.com/OpenCOMPES/sed/blob/86978c08be702f550ae10c04be1357cc012ebcf0/sed/dataset/dataset.py#L363-L364
The log messages need to be improved to reflect that.
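To make the distinction between the two checks concrete, here is a minimal sketch. It is not the code in sed/dataset/dataset.py: the function names, the `data_path` key, and the flat layout of the dict are assumptions for illustration only (the thread itself only mentions a files key).

```python
from pathlib import Path


def path_is_registered(datasets: dict, name: str) -> bool:
    """Check 1 (hypothetical): a data path is recorded for this dataset."""
    return bool(datasets.get(name, {}).get("data_path"))


def files_match(datasets: dict, name: str) -> bool:
    """Check 2 (hypothetical): recorded entries equal what is in the folder."""
    entry = datasets.get(name, {})
    data_path = entry.get("data_path")
    if not data_path or not Path(data_path).is_dir():
        return False
    recorded = set(entry.get("files", []))
    # iterdir() yields files and subfolders alike, so a recorded
    # "processed" folder takes part in the comparison.
    on_disk = {p.name for p in Path(data_path).iterdir()}
    return recorded == on_disk
```

In this sketch, renaming a recorded subfolder leaves the first check passing while the second fails. That would match the observed combination of a "reusing existing data" message followed by a new download.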
Here, the issue is that the second check fails. I can't understand how the processed folder ended up in the files key. Somehow it took the else branch even though data was present, and overwrote the file_list here: https://github.com/OpenCOMPES/sed/blob/86978c08be702f550ae10c04be1357cc012ebcf0/sed/dataset/dataset.py#L373
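One way to keep derived folders out of the recorded list would be to list only regular files when building it. This is a sketch under assumptions, not a confirmed explanation of the bug or the actual fix: `list_extracted_files` is a hypothetical helper, and it assumes the extracted files sit directly in the data folder.

```python
from pathlib import Path


def list_extracted_files(data_path: str) -> list[str]:
    """Return the names of regular files in data_path, skipping subfolders.

    Hypothetical helper: folders such as "processed" are directories,
    so they never enter the returned list.
    """
    return sorted(p.name for p in Path(data_path).iterdir() if p.is_file())
```

With a list built this way, removing or renaming a processed folder would not change the result, so a comparison against the recorded files would be unaffected by it.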
Somehow I also cannot trigger this behavior anymore. I will close this for now, until it happens again.
If you remove or rename the processed buffer file folders inside a dataset, the data is downloaded again, even though the dataset module says it will not do so.