Luo-Z13 / pointobb

[CVPR2024] PointOBB: Learning Oriented Object Detection via Single Point Supervision
MIT License
44 stars 3 forks source link

mmcv version issue #15

Closed zzhhzz666 closed 1 month ago

zzhhzz666 commented 2 months ago

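Hello author, sorry to bother you again. Last time I mentioned that training runs out of memory every few epochs, and you suggested resuming from a checkpoint. My first run completed 10 epochs; after resuming from it, training reached epoch 22 and crashed again, and now it will not run at all. At the start of epoch 23 it fails immediately: I tried several times, and it only processes the first batch before hitting out of memory. Regarding the mmcv-full memory leak you mentioned: following the command you provided, the mmcv-full version matching pytorch 1.9.0 should be 1.7.2, and that is what I installed. I then downloaded 1.7.1 from the official site, but that is not mmcv-full, only mmcv 1.7.1, and running with it keeps reporting ModuleNotFoundError: No module named 'mmcv._ext'. So I would like to ask which version of mmcv-full you use, or whether there is another solution. Best wishes!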

[Image: 微信图片_20240715110536]
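The `No module named 'mmcv._ext'` error usually indicates that the installed package does not ship the compiled ops, which is what happens when the lite `mmcv` build is installed instead of `mmcv-full`. A minimal sketch for checking this from Python follows; the helper names `has_submodule` and `describe_install` are hypothetical and not part of mmcv.

```python
import importlib.util


def has_submodule(package: str, submodule: str) -> bool:
    """Return True if `package.submodule` can be located (the submodule itself is not imported)."""
    try:
        return importlib.util.find_spec(f"{package}.{submodule}") is not None
    except ModuleNotFoundError:
        # Raised when the parent package itself is missing.
        return False


def describe_install(package: str = "mmcv", ext: str = "_ext") -> str:
    """Summarize whether `package` is installed and whether it has the compiled extension."""
    if importlib.util.find_spec(package) is None:
        return f"{package} is not installed"
    if has_submodule(package, ext):
        return f"{package} provides {package}.{ext} (looks like the full build)"
    return f"{package} lacks {package}.{ext} (looks like the lite build)"


if __name__ == "__main__":
    print(describe_install())
```

Running this inside the training environment shows whether the full build is actually the one being imported.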

zzhhzz666 commented 2 months ago

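I changed batch_size from 2 to 1, and training now completes all 24 epochs, but the results are not ideal: the best mAP is 0.386. So I would like to run the full 24 epochs with your settings, but then memory usage keeps rising as I described earlier. My first run crashed after 10 epochs; resuming from the checkpoint as you suggested, the second run reached epoch 22, and epoch 23 will not run at all. I saw that in 2023 you told someone else that your mmcv version was 1.4.5, but doesn't the README here use mmcv-full? I wanted to create another environment, but the disk on this server is full, so I cannot. I am asking just to be safe. Sorry for the trouble, and best wishes!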
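When a run crashes repeatedly and has to be resumed, it helps to pick the newest checkpoint programmatically. The sketch below assumes checkpoints are saved as `epoch_N.pth` in the work directory, which is the usual naming for epoch-based training in mmcv; the function name `latest_epoch_checkpoint` is hypothetical. It compares epoch numbers numerically, since a plain string sort would rank `epoch_9.pth` above `epoch_22.pth`.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme: epoch_<number>.pth
_EPOCH_RE = re.compile(r"^epoch_(\d+)\.pth$")


def latest_epoch_checkpoint(work_dir: str) -> Optional[Path]:
    """Return the epoch_N.pth file with the largest N, or None if none exists."""
    best_path = None
    best_epoch = -1
    for path in Path(work_dir).glob("epoch_*.pth"):
        match = _EPOCH_RE.match(path.name)
        if match is None:
            continue  # skip files such as epoch_backup.pth
        epoch = int(match.group(1))
        if epoch > best_epoch:
            best_epoch = epoch
            best_path = path
    return best_path
```

The returned path can then be passed to whatever resume option the training script exposes.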

zzhhzz666 commented 2 months ago

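For training I use mmcv-full==1.7.2; although there is a memory leak, it runs with a smaller batch size. At inference, however, I am told to install mmcv>=1.3.2, <=1.4.0, which is strange. I uninstalled 1.7.2 and installed mmcv-full 1.4.0, but that causes other problems.

(open-mmlab) zhuzhenhua@ubuntu-SYS-4028GR-TR:~/pointobb-main/PointOBB$ sh test_p.sh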

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'. If your shell is Bash or a Bourne variant, enable conda for the current user with

$ echo ". /home/zhaoxiaoyao/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc

or, for all users, enable conda with

$ sudo ln -s /home/zhaoxiaoyao/anaconda3/etc/profile.d/conda.sh /etc/profile.d/conda.sh

The options above will permanently enable the 'conda' command, but they do NOT put conda's base (root) environment on PATH. To do so, run

$ conda activate

in your terminal, or to put the base environment on PATH permanently, run

$ echo "conda activate" >> ~/.bashrc

Previous to conda 4.4, the recommended way to activate conda was to modify PATH in your ~/.bashrc file. You should manually remove the line that looks like

export PATH="/home/zhaoxiaoyao/anaconda3/bin:$PATH"

^^^ The above line should NO LONGER be in your ~/.bashrc file! ^^^

/home/zhuzhenhua/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmcv/__init__.py:21: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  'On January 1, 2023, MMCV will release v2.0.0, in which it will remove '
Traceback (most recent call last):
  File "tools/train.py", line 24, in <module>
    from mmdet import __version__
  File "/home/zhuzhenhua/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmdet/__init__.py", line 25, in <module>
    f'MMCV=={mmcv.__version__} is used but incompatible. ' \
AssertionError: MMCV==1.7.2 is used but incompatible. Please install mmcv>=1.3.2, <=1.4.0.
/home/zhuzhenhua/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmcv/__init__.py:21: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  'On January 1, 2023, MMCV will release v2.0.0, in which it will remove '
Traceback (most recent call last):
  File "exp/tools/result2ann_obb.py", line 4, in <module>
    from mmdet.core.bbox import bbox_overlaps
  File "/home/zhuzhenhua/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmdet/__init__.py", line 25, in <module>
    f'MMCV=={mmcv.__version__} is used but incompatible. ' \
AssertionError: MMCV==1.7.2 is used but incompatible. Please install mmcv>=1.3.2, <=1.4.0.

zzhhzz666 commented 2 months ago

Traceback (most recent call last):
  File "tools/train.py", line 273, in <module>
    main()
  File "tools/train.py", line 234, in main
    test_cfg=cfg.get('test_cfg'))
  File "/home/zhuzhenhua/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmdet/models/builder.py", line 58, in build_detector
    cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
  File "/home/zhuzhenhua/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
    return self.build_func(*args, **kwargs, registry=self)
  File "/home/zhuzhenhua/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
    return build_from_cfg(cfg, registry, default_args)
  File "/home/zhuzhenhua/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 45, in build_from_cfg
    f'{obj_type} is not in the {registry.name} registry')
KeyError: 'PointOBB is not in the models registry'
loading annotations into memory...
Done (t=0.98s)
creating index...
index created!
Loading and preparing results...
Traceback (most recent call last):
  File "exp/tools/result2ann_obb.py", line 30, in <module>
    res = coco.loadRes(args.det_file)
  File "/home/zhuzhenhua/.local/lib/python3.7/site-packages/pycocotools/coco.py", line 319, in loadRes
    with open(resFile) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/zhuzhenhua/pointobb-main/PointOBB/work_dir/test_pointobb_r50_fpn_2x_dior/pseudo_obb_result.json'

Luo-Z13 commented 2 months ago

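Hi, it looks like the corresponding pseudo-label file was not generated successfully. You can run inference once more with the trained weights (e.g. ep22).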
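Since the conversion step only fails because the earlier inference step produced no output, a small guard before loading the results makes the cause easier to spot. This is a minimal sketch, not code from the repository: `load_detections` is a hypothetical helper, and it assumes the result file (such as `pseudo_obb_result.json`) is a JSON list of detections.

```python
import json
from pathlib import Path


def load_detections(det_file: str) -> list:
    """Load a JSON results file, failing early with a hint if it was never written."""
    path = Path(det_file)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} does not exist; the inference step probably failed, "
            "so rerun it before converting the results."
        )
    with path.open() as f:
        return json.load(f)
```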

Luo-Z13 commented 2 months ago

Yes, the README installs mmcv-full. As for this error:

  File "/home/zhuzhenhua/.conda/envs/open-mmlab/lib/python3.7/site-packages/mmdet/__init__.py", line 25, in <module>
    f'MMCV=={mmcv.__version__} is used but incompatible. '
AssertionError: MMCV==1.7.2 is used but incompatible. Please install mmcv>=1.3.2, <=1.4.0.

you can comment out the corresponding mmcv version restriction in the repository. My mmcv-full version is 1.4.5.
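To illustrate why the assertion fires, here is a simplified sketch of an inclusive version-range check using the bounds from the error message. It is not the repository's actual code; `digit_version` and `is_compatible` are written here for illustration only.

```python
def digit_version(version: str) -> tuple:
    """Convert a string like '1.4.0' into (1, 4, 0); parsing stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_compatible(installed: str, minimum: str = "1.3.2", maximum: str = "1.4.0") -> bool:
    """Return True if minimum <= installed <= maximum (both bounds inclusive)."""
    return digit_version(minimum) <= digit_version(installed) <= digit_version(maximum)


if __name__ == "__main__":
    for candidate in ("1.4.0", "1.4.5", "1.7.2"):
        print(candidate, is_compatible(candidate))
```

Under these bounds both 1.7.2 and 1.4.5 fall outside the range, which is why the check has to be commented out before 1.4.5 can be used.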

[Image: 33ff7a4ecc667e8ff72a576be9786ad]

Luo-Z13 commented 2 months ago

In my currently working environment, mmcv itself is not installed; you may need to adjust your setup.

zzhhzz666 commented 2 months ago

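Hello! The mmcv-full version issue is resolved for now: 1.4.0 worked, and I then installed your version, 1.4.5, and commented out the version check. But now running the inference command sh test_p.sh produces the problem shown in the attached image. I keep asking questions here and feel I am imposing on you. Your inbox must be busy, and my questions popping up every so often must be a nuisance. If possible, could I have a way to contact you directly? My WeChat is hua15255859221. Thank you, and good luck with your research!

[Image: 微信图片_20240718154114]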