numpy issue - error while training

bbbts commented 3 months ago

Here are the exact steps I followed while setting up the environment from scratch -

After this, I tried to run the following command to start the training -

For reference, my "upernet_internimage_b_512_160k_ade20k.py" file looks like this -

After running the above training command, I am getting the following error - "ImportError: numpy.core.multiarray failed to import" as seen here -

Please let me know what should be done. Thanks in advance!!

chenller commented 3 months ago

I don't think this bug is caused by mmseg-ext; it looks more like a problem with cv2 (OpenCV). I have not encountered this error myself, so I can't give a definitive solution, but in my experience you can try the following:
Option 1: Use opencv-python==4.10.0 and numpy==1.26.4 (this bug does not appear in my environment).
Option 2: Downgrade opencv-python; you can try several lower versions.
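"numpy.core.multiarray failed to import" often means that a compiled extension (here, presumably cv2) was built against a different NumPy major version than the one installed. The minimal sketch below is not part of mmseg-ext. It reads installed versions through `importlib.metadata` and lists which packages differ from the pins in Option 1. The helper names `installed_versions` and `pin_mismatches` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: installed version or None}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed in the active environment
    return result


def pin_mismatches(installed, pins):
    """List (name, installed, wanted) for packages that do not match a pin.

    A pin matches the exact version or any longer version with that prefix,
    so the pin "4.10.0" accepts "4.10.0.84" but not "4.100.0".
    """
    mismatches = []
    for name, wanted in pins.items():
        have = installed.get(name)
        if have is None or not (have == wanted or have.startswith(wanted + ".")):
            mismatches.append((name, have, wanted))
    return mismatches


# Hypothetical constant: the versions suggested in Option 1.
SUGGESTED_PINS = {"opencv-python": "4.10.0", "numpy": "1.26.4"}
```

Usage: `print(pin_mismatches(installed_versions(SUGGESTED_PINS), SUGGESTED_PINS))`. An empty list means the environment matches the suggested pins.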

bbbts commented 3 months ago

Thank you for the quick reply. I installed opencv-python==4.9.0.80 and numpy==1.26.3. I am getting a different error now -

Could you export the whole environment you used, so that I can replicate it? (For reference, I am trying to use this project for semantic segmentation.)

chenller commented 3 months ago

In config.py, change the config referenced on the "mmsegext" line (line 2) of the _base_ variable to an absolute path. I will fix this bug in the next version.
See line 24 to the end of requirements.txt for my environment; that part was generated with the 'pip freeze' command.
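The simplest way to share an environment in the `name==version` format used in requirements.txt is `pip freeze > requirements.txt`. The sketch below approximates the same idea from Python using `importlib.metadata`. Unlike pip, it does not special-case editable or VCS installs. The function names `freeze_lines` and `write_requirements` are hypothetical.

```python
from importlib.metadata import distributions


def freeze_lines():
    """Return sorted 'name==version' lines for installed distributions.

    Roughly similar to `pip freeze`; editable and VCS installs are not handled specially.
    """
    pins = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken or missing metadata
            pins[name] = dist.version
    return sorted((f"{n}=={v}" for n, v in pins.items()), key=str.lower)


def write_requirements(path):
    """Write the pinned lines to a requirements-style text file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(freeze_lines()) + "\n")
```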

chenller commented 3 months ago

Also, add the line "import mmsegext" at the beginning of "tool/train.py".

bbbts commented 3 months ago

Thank you very much for the help. So, the requirements.txt file looks like this -

Should I uncomment line 24 (the "pip freeze" line), or every commented-out line after line 24?

Also, should I put the absolute path (in my case "/home/bhattb3/MMSEG/configs/_base_", as you can see in the picture below) in config.py (at line 37, where it says "BASE_KEY")?

chenller commented 3 months ago

There is no need to uncomment anything; I just wanted to show you the environment I am using. You don't need to do anything with requirements.txt.
mmengine does not support the custom "mmsegext::" prefix (see url), so I modified the values of imported library variables (see lines 7 and 9) to support it. That is why "import mmsegext" must be executed after "import mmengine.config" (see url). Referencing the config by an absolute path avoids this issue permanently.
Solution 1: Use an absolute path. Change mmsegext::_base_/datasets/ade20k_512_tta_without_ratio.py to /absolute path/configs/_base_/datasets/ade20k_512_tta_without_ratio.py.
Solution 2: Download the latest mmsegext code, in which I have fixed this error.
About the user manual of mmengine.config

bbbts commented 3 months ago

Thank you so much for the help. I did everything as instructed. However, I am getting this error now -

The error is - "ModuleNotFoundError: No module named 'mmsegextlib_msda'" as seen below -

chenller commented 3 months ago

I think the dependent library may not be installed. Open install.sh and run its commands one by one in the terminal; you will then see the compilation process and any errors it produces.
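After running install.sh, you can check whether the compiled module named in the traceback (`mmsegextlib_msda`) can be found from the active environment. `importlib.util.find_spec` locates a module without importing it. The helper `missing_modules` is a hypothetical sketch, not part of mmsegext.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that the current interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # ImportError: the parent package of a dotted name is missing.
            # ValueError: the module is in sys.modules but has no __spec__.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Usage: `print(missing_modules(["torch", "mmengine", "mmsegext", "mmsegextlib_msda"]))`. An empty list means every listed module was found.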

bbbts commented 3 months ago

Thank you so much for the reply. I tried to execute the first command of the install.sh file "python setup.py build install" as seen below -

However, I am getting the following error "NotImplementedError: Cuda is not availabel"-

I tried the following to install CUDA 10.0, but I am getting the error shown below -

Thanks in advance!

chenller commented 3 months ago

If compilation is required, the CUDA version must correspond to the PyTorch version. There are two solutions: install the PyTorch build that matches your CUDA version, or install the CUDA version that matches your PyTorch build. My CUDA environment: My PyTorch environment:
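One way to check this correspondence before compiling is to compare the local toolkit version from `nvcc --version` with `torch.version.cuda`, which reports the CUDA version PyTorch was built with and is `None` for CPU-only builds. The sketch below does that comparison on the two strings. The function names are hypothetical. PyTorch's own check in `torch.utils.cpp_extension` may differ in detail. In recent versions it rejects a major-version difference and only warns about a minor one.

```python
import re


def nvcc_release(nvcc_output):
    """Extract 'major.minor' from `nvcc --version` output (e.g. 'release 12.1,')."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a CUDA release in the nvcc output")
    return f"{match.group(1)}.{match.group(2)}"


def compare_cuda(toolkit, torch_cuda):
    """Classify how the local CUDA toolkit relates to torch.version.cuda."""
    if torch_cuda is None:
        return "cpu-only torch"  # nothing to compile CUDA extensions against
    t_major, t_minor = toolkit.split(".")[:2]
    p_major, p_minor = torch_cuda.split(".")[:2]
    if t_major != p_major:
        return "major mismatch"
    if t_minor != p_minor:
        return "minor mismatch"
    return "match"
```

Usage, assuming `nvcc` is on the PATH and PyTorch is installed: `compare_cuda(nvcc_release(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout), torch.version.cuda)`.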

bbbts commented 3 months ago

Thank you so much for your help, and sorry to ask again. I tried to create the environment from scratch once more, following exactly these steps -

As you can see, in step 10 I ran "bash install.sh". However, I am getting many errors like this one: "The detected CUDA version (10.0) mismatches the version that was used to compile PyTorch (12.1). Please make sure to use the same CUDA versions."

Following your instructions, I tried to install a PyTorch build matching the detected CUDA version (10.0) from https://pytorch.org/get-started/previous-versions/, but I cannot find any build for CUDA 10.0. The lowest CUDA version I can find there is 11.8 -

Sorry for the bother, and thanks for any help. Should I change my steps starting from step 1? Should I start from scratch?

chenller commented 3 months ago

You are about to succeed.

In this situation, you must upgrade your CUDA version. CUDA 10.2, 11.6, and 11.7 are good choices, and you can also choose a higher version. You also need to install the cuDNN version that corresponds to your CUDA version. Then create an environment and install PyTorch.

The exact PyTorch version is not very important; first make sure that your PyTorch build corresponds to your CUDA version.

Here are the commands for installing PyTorch for different CUDA versions:


```shell
# CUDA 10.2
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=10.2 -c pytorch

# CUDA 11.6
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.6 -c pytorch -c nvidia

# CUDA 11.7
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.7 -c pytorch -c nvidia
```
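If you want a script to pick the right command automatically, the three conda commands for CUDA 10.2, 11.6 and 11.7 in this comment can also be kept as a lookup keyed by CUDA version. Once you know your toolkit version (for example from `nvcc --version`), you can select the matching command. This is only a restatement of the listed commands, not a complete compatibility table. `PYTORCH_FOR_CUDA` and `install_command` are hypothetical names.

```python
# The conda commands for CUDA 10.2, 11.6 and 11.7, keyed by toolkit version (not exhaustive).
PYTORCH_FOR_CUDA = {
    "10.2": "conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 "
            "cudatoolkit=10.2 -c pytorch",
    "11.6": "conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 "
            "pytorch-cuda=11.6 -c pytorch -c nvidia",
    "11.7": "conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 "
            "pytorch-cuda=11.7 -c pytorch -c nvidia",
}


def install_command(cuda_version):
    """Return the conda command for a CUDA version such as '11.7' or '11.7.1'."""
    key = ".".join(cuda_version.split(".")[:2])  # keep only major.minor
    try:
        return PYTORCH_FOR_CUDA[key]
    except KeyError:
        raise ValueError(
            f"no command listed for CUDA {key}; see pytorch.org previous versions"
        ) from None
```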

chenller commented 3 months ago

You don't need to run 'pip install -r requirements.txt'; just follow the steps in the documentation to install and use it.

chenller/mmseg-extension, issue #3: numpy issue - error while training