db141 opened this issue 1 year ago
I found the link to 'beit_large_patch16_224_pt22k_ft22k.pth' in the config, downloaded it, and placed it in the pretrained folder. Now retraining starts, but CUDA runs out of memory... which I expected. :)
According to your paper you used A100 GPUs, and in your repo you mention using 8 GPUs on one node. Does that mean you trained with 8*A100?
Yes, we used 8 * A100 GPUs to train the model.
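For anyone reproducing this on fewer or smaller GPUs, a common workaround in mmseg-style configs is to lower the per-GPU batch size and enable gradient checkpointing in the backbone. Below is a minimal sketch, not taken from this repo: the file name is illustrative, and the exact keys (`with_cp`, `samples_per_gpu`) are mmsegmentation 0.x conventions that should be checked against this repo's configs:

```python
# low_mem_cityscapes.py -- hypothetical override config (name is illustrative)
_base_ = [
    'configs/cityscapes/mask2former_beit_adapter_large_896_80k_cityscapes_ss.py'
]

model = dict(
    backbone=dict(
        # Gradient checkpointing, if the backbone supports it: recomputes
        # activations in the backward pass, trading compute for memory.
        with_cp=True,
    ),
)

data = dict(
    # Per-GPU batch size; assumed field name (mmseg 0.x convention).
    samples_per_gpu=1,
)
```

Note that the 896x896 crop size is the other big memory driver, but shrinking it means editing the train pipeline in the base config, not just overriding a single key.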
Hi, thanks for sharing your great work! I tried to run a retraining on the Cityscapes dataset. Unfortunately, it gets stuck after throwing several exceptions and does not use any GPU at all. It doesn't really raise an error; it just produces no output and hangs. What can I do?
Thanks and best regards, daboh
```bash
CONFIG=configs/cityscapes/mask2former_beit_adapter_large_896_80k_cityscapes_ss.py
GPUS=2
PORT=${PORT:-29300}

PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT \
    train.py $CONFIG --launcher pytorch --deterministic ${@:3}
```
nvidia-smi:
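A hang with zero GPU utilization right after launch often means the distributed rendezvous never completes. A quick way to isolate that from the training code is a bare torch.distributed all-reduce run with the same launcher and port; if this also hangs, the problem is in the NCCL/network setup, not this repo (setting `NCCL_DEBUG=INFO` in the environment gives more detail). A minimal sketch; the script name is illustrative:

```python
# ddp_sanity.py -- minimal rendezvous/all-reduce check (hypothetical script)
import argparse
import os

import torch
import torch.distributed as dist


def main():
    # torch.distributed.launch passes the local rank either as --local_rank
    # (older PyTorch) or via the LOCAL_RANK env var (newer); accept both.
    parser = argparse.ArgumentParser()
    parser.add_argument('--local_rank', type=int,
                        default=int(os.environ.get('LOCAL_RANK', 0)))
    args = parser.parse_args()
    torch.cuda.set_device(args.local_rank)

    # Same backend the training script uses; a hang here points at NCCL
    # or the network setup rather than the model code.
    dist.init_process_group(backend='nccl')

    # One all-reduce across all processes; prints only if rendezvous worked.
    t = torch.ones(1, device='cuda') * dist.get_rank()
    dist.all_reduce(t)
    print(f'rank {dist.get_rank()}/{dist.get_world_size()} ok, sum={t.item()}')

    dist.destroy_process_group()


if __name__ == '__main__':
    main()
```

Run it with the same launcher and port as the failing job, e.g. `python -m torch.distributed.launch --nproc_per_node=2 --master_port=29300 ddp_sanity.py`.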