An Error has occurred in self-supervised pre-training

ghost commented 2 years ago

@Sara-Ahmed Thank you for sharing your wonderful achievements!

When I ran self-supervised pre-training as described, the following subprocess.CalledProcessError was raised. Could you please help me solve this problem?

Typed command:

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch-size 72 --epochs 501 --min-lr 5e-6 --lr 1e-3 --training-mode 'SSL' --data-set 'STL10' --output 'checkpoints/SSL/STL10' --validate-every 10

Error encountered:

subprocess.CalledProcessError: Command '['/usr/bin/python', '-u', 'main.py', '--batch-size', '72', '--epochs', '501', '--min-lr', '5e-6', '--lr', '1e-3', '--training-mode', 'SSL', '--data-set', 'STL10', '--output', 'checkpoints/SSL/STL10', '--validate-every', '10']' returned non-zero exit status 2.

Sara-Ahmed commented 2 years ago

@mtakamat Thanks for your interest in our work. Are you able to run it with one GPU?

python main.py --batch-size 72 --epochs 501 --min-lr 5e-6 --lr 1e-3 --training-mode 'SSL' --data-set 'STL10' --output 'checkpoints/SSL/STL10' --validate-every 10
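
Running the script directly, as in the command above, is useful because the launcher's CalledProcessError only reports that the child process exited with a non-zero status; the child's own error message is what explains the failure. The sketch below illustrates this with a hypothetical stand-in script (not SiT's main.py) whose only known flag is --batch-size. It shows one common source of exit status 2 in Python scripts, namely argparse rejecting the command line. Whether that was the cause in this issue is not confirmed in the thread.

```python
import subprocess
import sys

# Hypothetical stand-in for a training script (NOT SiT's main.py):
# it only accepts --batch-size.
CHILD = (
    "import argparse\n"
    "p = argparse.ArgumentParser()\n"
    "p.add_argument('--batch-size', type=int)\n"
    "p.parse_args()\n"
)


def run_child(args):
    """Run the stand-in script and return (exit code, stderr text)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, *args],
        capture_output=True,  # keep the child's stderr so it can be inspected
        text=True,
    )
    return proc.returncode, proc.stderr


# A command line the stand-in script accepts.
ok_code, _ = run_child(["--batch-size", "72"])

# A command line with a flag the stand-in script does not define.
bad_code, bad_err = run_child(["--batch-size", "72", "--no-such-flag", "1"])

print("accepted arguments, exit code:", ok_code)
print("rejected arguments, exit code:", bad_code)
print("child stderr:")
print(bad_err)
```

In the failing case the exit code alone says little, while the captured stderr names the offending argument. The same reasoning applies to a single-GPU run: the message printed by main.py itself is the part worth posting in the issue.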

ghost commented 2 years ago

@Sara-Ahmed Thank you for your help!!

I closed this issue.

Sara-Ahmed / SiT, issue #18