advimman / lama

🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
https://advimman.github.io/lama-project/
Apache License 2.0

Question about training on custom data #109

Closed TriDvaRas closed 2 years ago

TriDvaRas commented 2 years ago

Hi, I'm trying to run the train script on my custom data. I was following the 'Create your data' part of the README, and it's a bit unclear what the purpose of the my_dataset/train folder is. From this part of the README I assumed that this folder is auto-populated:

# LaMa generates random masks for the train data on the flight,
# but needs fixed masks for test and visual_test for consistency of evaluation.

I left it empty, and the run fails with 'num_samples should be a positive integer value, but got num_samples=0'. Looking at the logs, it seems like it's trying to find files there:

[2022-04-20 02:49:17,229][saicinpainting.training.data.datasets][INFO] - Make train dataloader default from /home/conda/lama/my_dataset/train. Using mask generator=mixed
[2022-04-20 02:49:17,256][__main__][CRITICAL] - Training failed due to num_samples should be a positive integer value, but got num_samples=0:
Traceback (most recent call last):
  File "bin/train.py", line 63, in main
    trainer.fit(training_model)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 499, in fit
    self.dispatch()
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 546, in dispatch
    self.accelerator.start_training(self)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/accelerators/accelerator.py", line 73, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 114, in start_training
    self._results = trainer.run_train()
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/trainer/trainer.py", line 620, in run_train
    self.train_loop.reset_train_val_dataloaders(model)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/trainer/training_loop.py", line 218, in reset_train_val_dataloaders
    self.trainer.reset_train_dataloader(model)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/trainer/data_loading.py", line 198, in reset_train_dataloader
    self.train_dataloader = self.request_dataloader(model.train_dataloader)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/pytorch_lightning/trainer/data_loading.py", line 398, in request_dataloader
    dataloader = dataloader_fx()
  File "/home/conda/lama/saicinpainting/training/trainers/base.py", line 130, in train_dataloader
    dataloader = make_default_train_dataloader(**self.config.data.train)
  File "/home/conda/lama/saicinpainting/training/data/datasets.py", line 250, in make_default_train_dataloader
    dataloader = DataLoader(dataset, **dataloader_kwargs)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 268, in __init__
    sampler = RandomSampler(dataset, generator=generator)
  File "/home/conda/miniconda3/envs/lama/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 104, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

The other folders seem to look fine in the log:

...
[2022-04-20 02:47:47,487][saicinpainting.training.trainers.base][INFO] - BaseInpaintingTrainingModule init done
[2022-04-20 02:47:47,500][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
[2022-04-20 02:47:47,500][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
[2022-04-20 02:47:47,959][saicinpainting.training.data.datasets][INFO] - Make val dataloader default from /home/conda/lama/my_dataset/val
[2022-04-20 02:47:47,960][saicinpainting.evaluation.data][INFO] - debug stars
[2022-04-20 02:47:47,997][saicinpainting.training.data.datasets][INFO] - Make val dataloader default from /home/conda/lama/my_dataset/visual_test
[2022-04-20 02:47:47,998][saicinpainting.evaluation.data][INFO] - debug stars
[2022-04-20 02:48:08,425][saicinpainting.evaluation.evaluator][INFO] - <class 'saicinpainting.evaluation.evaluator.InpaintingEvaluatorOnline'>: evaluation_end called
[2022-04-20 02:48:08,425][saicinpainting.evaluation.evaluator][INFO] - Getting value of ssim
[2022-04-20 02:48:08,426][saicinpainting.evaluation.evaluator][INFO] - Getting value of ssim done
...

So I also tried putting some images in the train folder with the same structure as val, but it still fails with the same error.
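One quick way to narrow down a `num_samples=0` error is to list what the train folder actually contains, grouped by file extension. This is only a minimal sketch: it assumes the train loader picks up `.jpg` files only (the reply below describes the train set as "just a folder with jpg's"), and the helper name `count_files_by_extension` is made up for illustration and is not part of LaMa.

```python
from collections import Counter
from pathlib import Path


def count_files_by_extension(root):
    """Count files under `root` (recursively), keyed by their exact suffix.

    Hypothetical helper for illustration; not part of the LaMa code base.
    Suffixes are kept case-sensitive so that '.JPG' and '.jpg' show up
    as separate entries.
    """
    counts = Counter()
    root = Path(root)
    if not root.is_dir():
        # A missing folder simply yields an empty count.
        return counts
    for path in root.rglob("*"):
        if path.is_file():
            counts[path.suffix] += 1
    return counts
```

For example, `count_files_by_extension("my_dataset/train")` returning only `.png` entries, or nothing at all, would be consistent with the loader finding zero samples under the assumption above.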

windj007 commented 2 years ago

Hi!

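In order for the training pipeline to work, you'll need 3 datasets:

* Training data. This is just a folder with JPGs; no extra action is needed. Masks are generated on the fly.

* Validation data. It is used to evaluate the model after each epoch and to automatically choose the best one. The creation process is described in the [Create your data](https://github.com/saic-mdal/lama#create-your-data) section.

* "Visual test" data. It is just like validation, but small. It is useful for assessing the generator's performance by eye: the pipeline visualizes every sample from this dataset (unlike training and validation). You can put the most interesting and difficult samples (image+mask pairs) here. Metrics like FID are not informative when calculated on a small dataset, so although metrics are calculated for the visual test set as well, they are not worth paying attention to.

Does this answer your question?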

TriDvaRas commented 2 years ago

Yes, thank you!

CodeMadUser commented 2 years ago

Hello, I have some questions about the dataset: are the training data clean photos? Are the validation photos the same as the training photos, or are they photos with a mask? Thank you!

leslieburke commented 1 year ago

Hi!

In order for the training pipeline to work, you'll need 3 datasets:

* Training data. This is just a folder with JPGs; no extra action is needed. Masks are generated on the fly.

* Validation data. It is used to evaluate the model after each epoch and to automatically choose the best one. The creation process is described in the [Create your data](https://github.com/saic-mdal/lama#create-your-data) section.

* "Visual test" data. It is just like validation, but small. It is useful for assessing the generator's performance by eye: the pipeline visualizes every sample from this dataset (unlike training and validation). You can put the most interesting and difficult samples (image+mask pairs) here. Metrics like FID are not informative when calculated on a small dataset, so although metrics are calculated for the visual test set as well, they are not worth paying attention to.
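Since validation and visual test data consist of image+mask pairs, it can help to confirm that every mask has its image before starting a run. The sketch below is an illustration only: the naming scheme (`name_mask000.png` sitting next to `name.png`) and the function `split_masks_by_pairing` are assumptions made for this example and are not confirmed anywhere in this thread, so adjust the pattern and `img_suffix` to whatever your files actually look like.

```python
from pathlib import Path


def split_masks_by_pairing(folder, img_suffix=".png"):
    """Split mask files under `folder` into (paired, orphaned) lists.

    Assumed naming (hypothetical): a mask 'name_mask000.png' belongs to
    the image 'name' + img_suffix in the same directory.
    """
    paired, orphaned = [], []
    for mask in sorted(Path(folder).rglob("*_mask*.png")):
        # Everything before the last '_mask' is taken as the image stem.
        stem = mask.name.rsplit("_mask", 1)[0]
        image = mask.with_name(stem + img_suffix)
        if image.is_file():
            paired.append(mask)
        else:
            orphaned.append(mask)
    return paired, orphaned
```

Running it on `my_dataset/val` and `my_dataset/visual_test` and checking that the second list is empty gives a cheap sanity check of the fixed-mask folders.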

Does this answer your question?

@windj007 Hi, if I want to train a model on a single type of scene, such as a grassland, how many pictures should my training set contain, at a minimum, to get good results?