Hello, author, why is the training time so long, even several days? But there are only a few hundred pictures in my dataset. Do you have a solution?

weijinmeng commented 2 years ago

chenhang98 commented 2 years ago

You can check whether it is because there are too many instances in the coarse segmentation result. It takes about 10 hours for training on 4x 2080Ti GPUs on the cityscapes dataset (2000+ images), for your reference.
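One way to act on this suggestion is to count how many instances the coarse result contains per image. The sketch below assumes the coarse segmentation is stored as a COCO-style results list (one dict per instance with an `image_id` and an optional `score`); the function names and the `score_thr` parameter are hypothetical helpers for illustration, not part of BPR.

```python
import json
from collections import Counter


def load_results(path):
    """Load a COCO-style results file (assumed to be a JSON list of dicts)."""
    with open(path) as f:
        return json.load(f)


def count_instances(results, score_thr=0.0):
    """Count instances per image, keeping detections with score >= score_thr.

    Entries without a "score" key are always kept.
    """
    counts = Counter()
    for det in results:
        if det.get("score", 1.0) >= score_thr:
            counts[det["image_id"]] += 1
    return counts


def summarize(counts):
    """Return (number of images, total instances, max instances in one image)."""
    if not counts:
        return 0, 0, 0
    return len(counts), sum(counts.values()), max(counts.values())
```

Comparing the totals at a few score thresholds gives a rough idea of how much low-confidence instances inflate the data; whether filtering them is appropriate for a given dataset is a judgment call.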

weijinmeng commented 2 years ago

Thank you for your reply. My coarse segmentation contains many instances, which may cause the slow training.

weijinmeng commented 2 years ago

Dear author, I ran into the following problems during inference. I can't find the hrnet18s_coco-c172955f.pth file, and the mask_rcnn_r50.val.refined.json/refined.pkl file is not generated.

subprocess.CalledProcessError: Command '['/icislab/volume4/wjm/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/test_float.py', '--local_rank=3', 'configs/bpr/hrnet18s_128.py', 'hrnet18s_coco-c172955f.pth', '--launcher', 'pytorch', '--out', 'mask_rcnn_r50.val.refined.json/refined.pkl']' returned non-zero exit status 1.

chenhang98 commented 2 years ago

You can find the pre-trained model here.

weijinmeng commented 2 years ago

Thank you for your reply!

weijinmeng commented 2 years ago

I'm sorry to disturb you so many times! I applied it to a COCO-like dataset. After training, the following errors occurred in the inference stage. I look forward to your reply, which is very important to me. Thank you!

Traceback (most recent call last):
  File "./tools/test_float.py", line 148, in <module>
    main()
  File "./tools/test_float.py", line 117, in main
    checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
  File "/icislab/volume4/wjm/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 247, in load_checkpoint
    checkpoint = _load_checkpoint(filename, map_location)
  File "/icislab/volume4/wjm/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/checkpoint.py", line 222, in _load_checkpoint
    raise IOError(f'{filename} is not a checkpoint file')
OSError: hrnet18s_coco-c172955f.pth is not a checkpoint file

loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
Traceback (most recent call last):
  File "./tools/merge_patches.py", line 104, in <module>
    start()
  File "./tools/merge_patches.py", line 63, in start
    with open(args.res_pkl, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'mask_rcnn_r50.val.refined.json/refined.pkl'
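The two tracebacks look related: merge_patches.py cannot open refined.pkl, presumably because test_float.py stopped at the checkpoint load and never wrote it. The "is not a checkpoint file" message appears to be raised when the given path does not resolve to an existing file, and here the checkpoint is passed as a bare relative file name, so the working directory matters. A minimal pre-flight check, assuming only the two paths from the command above, could look like this (`check_inference_inputs` is a hypothetical helper, not part of BPR or mmcv):

```python
import os


def check_inference_inputs(checkpoint, out_pkl):
    """Return a list of human-readable problems; the list is empty if none are found."""
    problems = []
    if not os.path.isfile(checkpoint):
        # Relative paths are resolved against the current working directory.
        problems.append(f"checkpoint not found: {os.path.abspath(checkpoint)}")
    elif os.path.getsize(checkpoint) == 0:
        problems.append(f"checkpoint is empty: {os.path.abspath(checkpoint)}")
    out_dir = os.path.dirname(out_pkl)
    if out_dir and not os.path.isdir(out_dir):
        problems.append(f"output directory missing: {os.path.abspath(out_dir)}")
    return problems
```

Running it from the same directory as the inference script, with the same relative paths, shows which absolute locations are actually being used.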

chenhang98 commented 2 years ago

You can check whether the file was corrupted during download. The md5sum should be af557618d1f1b2e861e79d9d01e2f180.
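If the md5sum command is not available, the same check can be done with Python's standard library. This is a small sketch; `file_md5` is a hypothetical helper name, and the expected value is the one quoted in the reply above.

```python
import hashlib

# Checksum quoted in the reply above for hrnet18s_coco-c172955f.pth.
EXPECTED_MD5 = "af557618d1f1b2e861e79d9d01e2f180"


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return file_md5(path) == expected.lower()
```

A mismatch suggests the download was truncated or replaced by something else (for example an HTML error page), in which case downloading the file again is the first thing to try.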

SchernHe commented 2 years ago

I get the same error. Any news on the issue? Moreover, can somebody explain to me why I need the ground truth for the refinement?

weijinmeng commented 2 years ago

I tried to train with a single GPU, and the training time was reduced to 10 hours.

chenhang98 / BPR

Hello, author, why is the training time so long, even several days? But there are only a few hundred pictures in my dataset. Do you have a solution? #23