Pale-Blue-Dot-97 / Minerva

Minerva project includes the minerva package that aids in the fitting and testing of neural network models. Includes pre and post-processing of land cover data. Designed for use with torchgeo datasets.
MIT License

Bug in caching dataset with existing file error in distributed computing #421

Open Pale-Blue-Dot-97 opened 5 months ago

Pale-Blue-Dot-97 commented 5 months ago

Describe the bug When make_dataset is called with cache==True in a distributed computing setup, caching the dataset fails with an existing-file error.

To Reproduce Steps to reproduce the behavior:

  1. Use a distributed computing setup
  2. Ensure cache==True for make_dataset
  3. See error

Expected behavior No error. If the dataset is not already cached, it should be created then cached under the unique hash. If it exists, the hash should be recognised and the dataset loaded.

Environment (please complete the following information):

Pale-Blue-Dot-97 commented 5 months ago

What appears to be happening here is that every process in the distributed process group sees that the requested dataset does not exist yet, so they all independently try to create and cache it. Because they all write to the same location, a conflict arises when the slower processes try to cache to a path where the dataset now already exists.

The solution is to ensure that only process 0 attempts to create the dataset. All other processes should then wait until 0 is finished, then they can load the dataset from the new cache.
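A minimal sketch of that pattern, not Minerva's actual implementation. The helper name `load_or_create_cached`, the pickle-based cache file, and the `create`, `rank` and `barrier` parameters are all hypothetical, chosen so the example runs without a process group.

```python
import os
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_or_create_cached(
    cache_path: Path,
    create: Callable[[], Any],
    rank: int,
    barrier: Callable[[], None],
) -> Any:
    """Hypothetical helper: rank 0 builds the cache, all ranks then load it."""
    if rank == 0 and not cache_path.exists():
        dataset = create()
        # Write to a temporary file first, then rename it into place, so that
        # no process can ever observe a partially written cache file.
        fd, tmp_name = tempfile.mkstemp(dir=str(cache_path.parent), suffix=".tmp")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(dataset, f)
        os.replace(tmp_name, cache_path)

    # Every rank blocks here until all ranks (including rank 0) have arrived,
    # so the cache is guaranteed to exist past this point.
    barrier()

    with open(cache_path, "rb") as f:
        return pickle.load(f)
```

In a torch.distributed process group, `rank` would typically come from `torch.distributed.get_rank()` and `barrier` would be `torch.distributed.barrier`. Note that if `create` raises on rank 0, the other ranks would wait at the barrier indefinitely, so a real fix would need to handle that case.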