cavalleria / cavaface

Face recognition training project (PyTorch)
MIT License

Error when attempting augmentation #6

Closed by xsacha 4 years ago

xsacha commented 4 years ago

I tried to use RandAugment as well as the RandErasing augmentation, but whenever I start training I get the error: "EOFError: Ran out of input"

Full output follows:

============================================================
/usr/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 11 leaked semaphores to clean up at shutdown
  len(cache))
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 113, in _main
    preparation_data = reduction.pickle.load(from_parent)
EOFError: Ran out of input
/usr/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 11 leaked semaphores to clean up at shutdown
  len(cache))
Traceback (most recent call last):
  File "train.py", line 333, in <module>
    main()
  File "train.py", line 58, in main
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, cfg))
  File "/home/imagus/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/imagus/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/imagus/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 108, in join
    (error_index, name)
Exception: process 2 terminated with signal SIGSEGV
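
Note on reading the output above: the "EOFError: Ran out of input" message is most likely a symptom rather than the root cause. A spawned child process unpickles its start-up data from a pipe written by its parent; if the parent goes away before writing anything (here, plausibly because a worker crashed with SIGSEGV), the child reads an empty stream. The sketch below reproduces only that message with the standard library. The helper name `load_preparation_data` and the sample payload are hypothetical and not taken from cavaface or PyTorch.

```python
import io
import pickle


def load_preparation_data(stream):
    """Unpickle one object from a binary stream, as a spawned child does."""
    return pickle.load(stream)


# Normal case: the parent wrote a complete pickle before the child read it.
payload = pickle.dumps({"name": "worker-0"})
ok = load_preparation_data(io.BytesIO(payload))

# Failure case: the stream is empty, as if the parent died before writing.
try:
    load_preparation_data(io.BytesIO(b""))
    error_message = None
except EOFError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
```

If this reading is right, the line to investigate is the SIGSEGV at the bottom of the log, not the EOFError printed first.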

cavalleria commented 4 years ago

I tested the RandErasing augmentation and it works fine, and I have fixed the RandAugment bugs.

xsacha commented 4 years ago

Thanks, I'll try RandAugment again :)

Not sure if my PyTorch version is an issue: I'm on 1.5. Edit: I'm still getting the same error.

cavalleria commented 4 years ago

My PyTorch version is 1.1.0 and torchvision is 0.3.0; you can try those versions.

xsacha commented 4 years ago

Yeah, the version is the difference. On 1.5.0 I can use the augmentations built into torchvision, which helps.
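
For anyone landing here, a minimal sketch of what using the built-in augmentations might look like. It assumes that `transforms.RandomErasing` ships with torchvision 0.4 and later (so not with 0.3.0); that cutoff is an assumption to check against the torchvision release notes. The helper names, the flip step, and the normalization values are hypothetical and not taken from cavaface.

```python
import re

# Assumption: RandomErasing is available from torchvision 0.4 onward.
MIN_BUILTIN_RANDOM_ERASING = (0, 4)


def parse_major_minor(version):
    """Return (major, minor) from strings such as '0.6.0' or '0.6.0+cu101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return int(match.group(1)), int(match.group(2))


def has_builtin_random_erasing(torchvision_version):
    """True if this torchvision version is assumed to include RandomErasing."""
    return parse_major_minor(torchvision_version) >= MIN_BUILTIN_RANDOM_ERASING


def build_train_transform():
    """Build a training transform, adding RandomErasing when it is available."""
    # Imported lazily so the version helpers above work without torchvision.
    import torchvision
    from torchvision import transforms

    steps = [transforms.RandomHorizontalFlip(), transforms.ToTensor()]
    if has_builtin_random_erasing(torchvision.__version__):
        # RandomErasing operates on tensors, so it must come after ToTensor().
        steps.append(transforms.RandomErasing(p=0.5))
    # Hypothetical normalization values, for illustration only.
    steps.append(transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]))
    return transforms.Compose(steps)
```

Calling `build_train_transform()` requires torchvision to be installed; the two version helpers do not.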