leon-costa closed this issue 1 year ago.
Hi, @leon-costa, sorry for the late reply. I tried running
bash tools/dist_train.sh configs/classification/imagenet/resnet/resnet50_rsb_a3_sz160_8xb256_ep100.py 1 --auto_resume
and did not encounter the error "error: unrecognized arguments: --local-rank=0". I suggest running OpenMixup with PyTorch<=1.13.1 and checking that you are using the latest source code of OpenMixup. With that setup, I have not found any errors in installation or DDP training. Currently, OpenMixup has some errors when running with PyTorch==2.0.1. You can try the following commands:
conda create -n openmixup python=3.8 pytorch=1.13 cudatoolkit=11.6 torchvision -c pytorch -y
conda activate openmixup
pip install openmim
mim install mmcv-full
pip install opencv-python
git clone https://github.com/Westlake-AI/openmixup.git
cd openmixup
python setup.py develop
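For context on the "unrecognized arguments: --local-rank=0" error: starting with PyTorch 2.0, torch.distributed.launch passes the dashed --local-rank=<rank> to the training script, while older releases passed --local_rank. A script whose argparse parser only declares --local_rank rejects the new spelling. The sketch below shows one way an argparse-based entry point can accept both. It assumes a parser shaped like the command above (a positional config plus --auto_resume); build_parser and parse_args are hypothetical names, not OpenMixup's actual tools/train.py.

```python
import argparse
import os


def build_parser():
    """Hypothetical training-script parser that mirrors the command in this thread."""
    parser = argparse.ArgumentParser(description="training entry point (sketch)")
    parser.add_argument("config", help="path to the config file")
    parser.add_argument("--auto_resume", action="store_true",
                        help="resume from the latest checkpoint if one exists")
    # Register both spellings: PyTorch >= 2.0 launchers pass --local-rank,
    # older ones pass --local_rank. argparse stores either under dest="local_rank".
    parser.add_argument("--local_rank", "--local-rank", type=int, default=0)
    return parser


def parse_args(argv=None):
    args = build_parser().parse_args(argv)
    # torchrun sets LOCAL_RANK itself; copy the CLI value only if it is missing.
    if "LOCAL_RANK" not in os.environ:
        os.environ["LOCAL_RANK"] = str(args.local_rank)
    return args


if __name__ == "__main__":
    print(parse_args(["cfg.py", "--local-rank=0", "--auto_resume"]))
```

Registering both option strings on one argument keeps a single attribute (args.local_rank) for the rest of the script, so nothing downstream has to know which launcher was used.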
Hi. Thank you for your reply.
Yes, I'm on the latest commit on the main branch.
I tried your commands:
conda create -n openmixup python=3.8 pytorch=1.13 cudatoolkit=11.6 torchvision -c pytorch -y
failed with PackagesNotFoundError: The following packages are not available from current channels: - cudatoolkit=11.6
So I used cudatoolkit=10.1 instead (as in install.md), and the installation worked. Then I ran
bash tools/dist_train.sh configs/classification/imagenet/resnet/resnet50_rsb_a3_sz160_8xb256_ep100.py 1 --auto_resume
and it failed again, this time with a new error: AttributeError: module 'cv2' has no attribute 'COLOR_BGR2RGB'
With resnet50_rsb_a3_sz160_8xb256_ep100.py, I got a ZeroDivisionError: integer division or modulo by zero, which is caused by torch.cuda.device_count() returning 0. Calling torch.zeros(1).cuda() raised AssertionError: Torch not compiled with CUDA enabled, so the installed PyTorch build had no CUDA support.
I then reinstalled everything with the following commands:
conda create -n openmixup python=3.8 pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia -y
conda activate openmixup
mim install mmcv-full
pip install opencv-python==4.5.4.60
git clone https://github.com/Westlake-AI/openmixup.git
cd openmixup
python setup.py develop
And it worked: I was able to start training.
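The ZeroDivisionError and the "Torch not compiled with CUDA enabled" assertion described above both point to a CPU-only PyTorch build. A quick check like the following can confirm the build before launching tools/dist_train.sh. It is a sketch: the function name cuda_report is hypothetical and not part of OpenMixup, and it only reads public torch attributes (torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count()). It returns None if torch is not installed.

```python
import importlib.util


def cuda_report():
    """Summarize the installed PyTorch build, or return None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "torch_version": str(torch.__version__),
        # CUDA version the package was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        # In this thread, a value of 0 here led to the ZeroDivisionError.
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    report = cuda_report()
    if report is None:
        print("torch is not installed")
    else:
        for key, value in report.items():
            print(f"{key}: {value}")
        if report["device_count"] == 0:
            print("No usable GPU: install a CUDA-enabled PyTorch build.")
```

If built_with_cuda is None, reinstalling PyTorch with a CUDA package (for example via pytorch-cuda=11.6 as in the commands above) is the fix; if it is set but device_count is still 0, the problem is more likely the driver or CUDA_VISIBLE_DEVICES.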
Thanks for your detailed solutions, @leon-costa! 👍 We will add a reference to this issue in install.md. To summarize, the main problems came from the PyTorch installation and the opencv-python version.
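Since pinning opencv-python==4.5.4.60 resolved the cv2 AttributeError in this thread, a small check after installation can confirm that cv2 imports and exposes the color constants OpenMixup's data pipeline needs. The sketch below is illustrative only: check_cv2 is a hypothetical helper, and it reports what it finds rather than diagnosing why a particular OpenCV version misbehaves.

```python
import importlib.util


def check_cv2():
    """Report whether cv2 imports cleanly and exposes COLOR_BGR2RGB.

    Returns None if cv2 is not installed at all, otherwise a dict.
    """
    if importlib.util.find_spec("cv2") is None:
        return None
    try:
        import cv2
    except ImportError as exc:
        return {"imported": False, "error": str(exc)}
    return {
        "imported": True,
        # A cv2 module missing __version__ or color constants suggests a broken install.
        "version": getattr(cv2, "__version__", None),
        "has_COLOR_BGR2RGB": hasattr(cv2, "COLOR_BGR2RGB"),
    }


if __name__ == "__main__":
    print(check_cv2())
```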
Describe the bug
I followed the installation instructions in https://github.com/Westlake-AI/openmixup/blob/main/docs/en/install.md#install-openmixup and everything went well (except Apex, which is optional).
When I run the first Getting Started example command, I get the following error:
To Reproduce
Follow the installation instructions and execute the example command as described above.
Post related information
pip list | grep "openmixup\|^torch"
No modified config. I only changed the number of GPUs from 8 to 1 in the example command.
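The pip list | grep output requested here can also be gathered from Python, which makes it easy to flag the PyTorch version problem discussed in this thread. In the sketch below, the helper names are hypothetical, the package list is an assumption based on the packages installed in this thread, and the 1.13.1 threshold comes from the maintainer's advice to use PyTorch<=1.13.1. It uses importlib.metadata, which is available from Python 3.8.

```python
import importlib.metadata
import re

PACKAGES = ("openmixup", "torch", "torchvision", "mmcv-full", "opencv-python")
# Newest PyTorch release recommended by the maintainers in this thread.
MAX_TESTED_TORCH = (1, 13, 1)


def parse_version(text):
    """Turn '1.13.1+cu116' or '2.0.1' into a tuple of ints such as (1, 13, 1)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # Missing minor/patch parts become 0, so '2.1' -> (2, 1, 0).
    return tuple(int(part or 0) for part in match.groups())


def installed_versions(names=PACKAGES):
    """Return {name: version or None} for each distribution, like pip list | grep."""
    versions = {}
    for name in names:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def torch_too_new(version_text):
    """True if the given torch version is newer than the recommended maximum."""
    return parse_version(version_text) > MAX_TESTED_TORCH


if __name__ == "__main__":
    versions = installed_versions()
    for name, version in versions.items():
        print(f"{name:15} {version or 'not installed'}")
    torch_version = versions.get("torch")
    if torch_version and torch_too_new(torch_version):
        print("PyTorch is newer than 1.13.1; consider downgrading.")
```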
Additional context
I initially tried to install everything by following the instructions here, but the last command
python setup.py develop
failed with this error: