Question about training on custom data

TriDvaRas commented 2 years ago

Hi, I'm trying to run the train script on my custom data. I was following the 'Create your data' part of the README, and the purpose of the my_dataset/train folder is a bit unclear. From this part of the README I assumed that the folder is auto-populated:

# LaMa generates random masks for the train data on the flight,
# but needs fixed masks for test and visual_test for consistency of evaluation.

I left it empty, and the run fails with 'num_samples should be a positive integer value, but got num_samples=0'. Looking at the logs, it seems to be trying to find files there:

[2022-04-20 02:49:17,229][saicinpainting.training.data.datasets][INFO] - Make train dataloader default from /home/conda/lama/my_dataset/train. Using mask generator=mixed
[2022-04-20 02:49:17,256][__main__][CRITICAL] - Training failed due to num_samples should be a positive integer value, but got num_samples=0:
Traceback (most recent call last):
  File "bin/train.py", line 63, in main
    trainer.fit(training_model)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
    self.dispatch()
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
    self.accelerator.start_training(self)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
    self._results = trainer.run_train()
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 620, in run_train
    self.train_loop.reset_train_val_dataloaders(model)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 218, in reset_train_val_dataloaders
    self.trainer.reset_train_dataloader(model)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/trainer/data_loading.py", line 198, in reset_train_dataloader
    self.train_dataloader = self.request_dataloader(model.train_dataloader)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/trainer/data_loading.py", line 398, in request_dataloader
    dataloader = dataloader_fx()
  File "/home/conda/lama/saicinpainting/training/trainers/base.py", line 130, in train_dataloader
    dataloader = make_default_train_dataloader(**self.config.data.train)
  File "/home/conda/lama/saicinpainting/training/data/datasets.py", line 250, in make_default_train_dataloader
    dataloader = DataLoader(dataset, **dataloader_kwargs)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 268, in __init__
    sampler = RandomSampler(dataset, generator=generator)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 104, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

The other folders look fine in the log:

...
[2022-04-20 02:47:47,487][saicinpainting.training.trainers.base][INFO] - BaseInpaintingTrainingModule init done
[2022-04-20 02:47:47,500][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
[2022-04-20 02:47:47,500][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
[2022-04-20 02:47:47,959][saicinpainting.training.data.datasets][INFO] - Make val dataloader default from /home/conda/lama/my_dataset/val
[2022-04-20 02:47:47,960][saicinpainting.evaluation.data][INFO] - debug stars
[2022-04-20 02:47:47,997][saicinpainting.training.data.datasets][INFO] - Make val dataloader default from /home/conda/lama/my_dataset/visual_test
[2022-04-20 02:47:47,998][saicinpainting.evaluation.data][INFO] - debug stars
[2022-04-20 02:48:08,425][saicinpainting.evaluation.evaluator][INFO] - <class 'saicinpainting.evaluation.evaluator.InpaintingEvaluatorOnline'>: evaluation_end called
[2022-04-20 02:48:08,425][saicinpainting.evaluation.evaluator][INFO] - Getting value of ssim
[2022-04-20 02:48:08,426][saicinpainting.evaluation.evaluator][INFO] - Getting value of ssim done
...

So I also tried putting some images in the train folder with the same structure as val, but it still fails with the same error.

windj007 commented 2 years ago

Hi!

In order for the training pipeline to work, you'll need 3 datasets:

* Training data. This is just a folder with JPGs; no extra action is needed. Masks are generated on the fly.
* Validation data. It is used to evaluate the model after each epoch and to automatically choose the best one. The creation process is described in the Create your data section.
* "Visual test" data. It is just like validation, but small. It is useful for assessing the generator's performance by eye: the pipeline visualizes every sample from this dataset (unlike training and validation). You can put the most interesting and difficult samples (image+mask pairs) here. Metrics like FID are not informative when calculated on a small dataset, so although metrics are calculated for visual test as well, they are not worth paying attention to.

Does this answer your question?
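
As a quick sanity check before launching training, the train folder can be scanned for images. This is a minimal sketch, not LaMa's actual loader: it assumes training images are discovered by a recursive `*.jpg` search, in line with the "folder with jpg's" description above. The helper name `count_images` and the `my_dataset/train` path are illustrative.

```python
import glob
import os


def count_images(indir, pattern="*.jpg"):
    """Count files under `indir` (searched recursively) that match `pattern`.

    Assumption: the training dataset only picks up files matching *.jpg.
    Change `pattern` if your setup differs.
    """
    return len(glob.glob(os.path.join(indir, "**", pattern), recursive=True))


if __name__ == "__main__":
    train_dir = os.path.join("my_dataset", "train")  # illustrative path
    n = count_images(train_dir)
    if n == 0:
        # An empty result here would be consistent with the num_samples=0 error.
        print(f"No *.jpg files found under {train_dir}")
    else:
        print(f"Found {n} *.jpg file(s) under {train_dir}")
```

If the count is 0 even though the folder is not empty, the images may be stored with a different extension (for example `.png`), in which case converting or renaming them would be the first thing to try.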

TriDvaRas commented 2 years ago

Yes, thank you!

CodeMadUser commented 2 years ago

Hello, I have some questions about the dataset: are the training data clean photos? Are the validation photos the same as the training photos, or are they photos with a mask? Thank you!

leslieburke commented 1 year ago

> Hi!
>
> In order for the training pipeline to work, you'll need 3 datasets:
>
> * Training data. This is just a folder with JPGs; no extra action is needed. Masks are generated on the fly.
>
> * Validation data. It is used to evaluate the model after each epoch and to automatically choose the best one. The creation process is described in the [Create your data](https://github.com/saic-mdal/lama#create-your-data) section.
>
> * "Visual test" data. It is just like validation, but small. It is useful for assessing the generator's performance by eye: the pipeline visualizes every sample from this dataset (unlike training and validation). You can put the most interesting and difficult samples (image+mask pairs) here. Metrics like FID are not informative when calculated on a small dataset, so although metrics are calculated for visual test as well, they are not worth paying attention to.
>
> Does this answer your question?

@windj007 Hi, if I want to train a model on a single type of scene, such as grassland, at least how many pictures should my training set have to get good results?

advimman / lama

Question about training on custom data #109