CoinCheung / BiSeNet

Add bisenetv2. My implementation of BiSeNet
MIT License

Error during training: tools/train_amp.py FAILED #313

Closed: yangaiping closed this issue 1 year ago

yangaiping commented 1 year ago

I followed the training command you provided:

export CUDA_VISIBLE_DEVICES=0
NGPUS=1
cfg_file=configs/bisenetv2_coco.py
torchrun --nproc_per_node=$NGPUS tools/train_amp.py --config $cfg_file

and got the following error: [screenshot]

CoinCheung commented 1 year ago

Hi,

Would you please show me the full error message?

yangaiping commented 1 year ago

OK, thank you. Here is the full error message: [screenshots]

CoinCheung commented 1 year ago

Have you specified the dataset correctly?

yangaiping commented 1 year ago

Hello, here are my dataset files: [screenshots]

CoinCheung commented 1 year ago

Did you generate train.txt using the method described in the README?

yangaiping commented 1 year ago

I just noticed that running this script generates train.txt and val.txt, but both files are empty: [screenshot]
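One plausible reason for empty annotation files is that the pairing step finds no image/label matches. The sketch below is a hypothetical illustration of such a step, not the repo's tools/gen_dataset_annos.py. It assumes each line has the form image_path,label_path and that every .jpg image needs a .png mask with the same name. If the label folder holds only .txt files, it returns an empty list.

```python
from pathlib import Path


def build_annos(im_dir: Path, lb_dir: Path, root: Path) -> list[str]:
    """Pair each .jpg image with a same-named .png label mask.

    Hypothetical sketch: the 'image_path,label_path' line format and the
    .jpg/.png naming are assumptions, not taken from gen_dataset_annos.py.
    """
    lines = []
    for im in sorted(im_dir.glob("*.jpg")):
        lb = lb_dir / (im.stem + ".png")
        if lb.exists():  # images without a matching mask are skipped
            lines.append(
                f"{im.relative_to(root).as_posix()},{lb.relative_to(root).as_posix()}"
            )
    return lines
```

If this returns an empty list for your folders, no image has a matching mask. The label folder is the first place to look.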

yangaiping commented 1 year ago

[screenshot]

CoinCheung commented 1 year ago

What is in your images and labels folders?

yangaiping commented 1 year ago

[screenshots]

CoinCheung commented 1 year ago

Why are these label files in .txt format? [screenshot]

Are you using the coco-stuff dataset?

yangaiping commented 1 year ago

I'm using the coco2017labels-segments.zip dataset.
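For reference, coco2017labels-segments.zip appears to be the label archive distributed with Ultralytics YOLO. That archive stores one .txt file of polygon coordinates per image rather than a per-pixel mask image.

A quick way to see which kind of labels a folder holds is to count file extensions. The helpers below are hypothetical and assume that valid segmentation masks are stored as .png.

```python
from collections import Counter
from pathlib import Path


def label_format_summary(lb_dir: Path) -> Counter:
    """Count the files in a label folder by extension, e.g. {'.txt': 118287}."""
    return Counter(p.suffix.lower() for p in lb_dir.iterdir() if p.is_file())


def looks_like_masks(lb_dir: Path) -> bool:
    # Assumption: per-pixel masks are .png; polygon text files (.txt) are not masks.
    counts = label_format_summary(lb_dir)
    return counts.get(".png", 0) > 0 and counts.get(".txt", 0) == 0
```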

yangaiping commented 1 year ago

I think I may have found my problem. I'll try again with the coco-stuff dataset.

yangaiping commented 1 year ago

I'm now using the correct dataset and have successfully generated the train/val split, but training still fails. Why is that? [screenshots]

CoinCheung commented 1 year ago

It seems you have hidden files in your image/train2017 folder, and likely in your train.txt file as well. Could you check this?
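Jupyter creates .ipynb_checkpoints folders, and they are easy to miss because their names start with a dot. The sketch below uses hypothetical helper names and the same assumed image_path,label_path line format. It flags annotation lines whose paths pass through a hidden component, so leftover entries can be found before training.

```python
from pathlib import Path


def is_hidden(path: str) -> bool:
    """True if any path component starts with '.', e.g. '.ipynb_checkpoints'."""
    return any(
        part.startswith(".") and part not in (".", "..")
        for part in Path(path).parts
    )


def hidden_entries_in_annos(anno_file: Path) -> list[str]:
    """Return annotation lines whose image or label path is hidden."""
    bad = []
    for line in anno_file.read_text().splitlines():
        line = line.strip()
        if line and any(is_hidden(p) for p in line.split(",")):
            bad.append(line)
    return bad
```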

yangaiping commented 1 year ago

I deleted the hidden .ipynb_checkpoints files and re-ran python tools/gen_dataset_annos.py --dataset coco. Then I ran the following commands and got a new error: [screenshots]

CoinCheung commented 1 year ago

python tools/check_dataset_info.py --im_root datasets/coco --im_anns datasets/coco/train.txt

What is the output of this?
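The thread does not show exactly what check_dataset_info.py reports. As an assumed stand-in, the sketch below collects the distinct class ids found in a set of label masks and skips an ignore value (255 is an assumption). It returns the number of distinct ids and the largest id. The masks could be loaded with, for example, np.array(PIL.Image.open(path)).

```python
import numpy as np


def label_stats(masks, ignore_label=255):
    """Return (number of distinct class ids, largest class id) over all masks.

    masks: iterable of integer numpy arrays (decoded label images).
    ignore_label: value excluded from the statistics (assumed to be 255).
    """
    ids = set()
    for m in masks:
        ids.update(np.unique(m).tolist())
    ids.discard(ignore_label)
    max_id = max(ids) if ids else -1
    return len(ids), max_id
```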

yangaiping commented 1 year ago

[screenshots]

CoinCheung commented 1 year ago

Why does your coco-stuff have 201 categories? The coco-stuff I used has only 171 classes.

If you would like to use your dataset, you can change n_cat in the config file to 202.
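Class ids start at 0, so the classifier needs one output for every id up to the largest one: n_cat = max_id + 1. For example, PyTorch's nn.CrossEntropyLoss expects every target to be in [0, C-1] or equal to ignore_index. One reading consistent with the suggested 202 is that the label ids here run up to 201. The check below uses hypothetical names.

```python
def required_n_cats(max_label_id: int) -> int:
    # Class ids are 0-based, so the classifier needs max_id + 1 outputs.
    return max_label_id + 1


def out_of_range_ids(label_ids, n_cats, ignore_label=255):
    """Return the label ids that an n_cats-way classifier could not accept."""
    return sorted(
        i for i in set(label_ids)
        if i != ignore_label and not 0 <= i < n_cats
    )
```

With label ids up to 201, n_cat = 171 leaves id 201 out of range, while 202 covers it.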

yangaiping commented 1 year ago

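Thank you, it was probably a problem with my dataset. After changing it to 202 and re-running the training command, training seems to run successfully. I'd also like to ask how to change the iteration count shown as iter: 400/180000. 180000 seems very large; how can I reduce it? [screenshot]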

CoinCheung commented 1 year ago

You should change it in the config file: https://github.com/CoinCheung/BiSeNet/blob/f2b901599752ce50656d2e50908acecd06f7eb47/configs/bisenetv2_coco.py#L10
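Whether 180000 is too large depends on how many passes over the training set it represents. The sketch below converts between iterations and epochs. The batch size per GPU, the GPU count and the image count are inputs you take from your own config and dataset; the values in the usage note are assumptions.

```python
import math


def epochs_for(max_iter: int, ims_per_gpu: int, n_gpus: int, n_train_images: int) -> float:
    """Approximate number of passes over the training set."""
    return max_iter * ims_per_gpu * n_gpus / n_train_images


def iters_for(epochs: float, ims_per_gpu: int, n_gpus: int, n_train_images: int) -> int:
    """Iterations needed for a target number of epochs, rounded up."""
    return math.ceil(epochs * n_train_images / (ims_per_gpu * n_gpus))
```

For example, assume 8 images per GPU on one GPU and the 118,287 images of COCO train2017. Under those assumptions, 180000 iterations is roughly 12 epochs, and iters_for(1, 8, 1, 118287) gives 14786 iterations per epoch.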

yangaiping commented 1 year ago

Thank you so much for patiently answering my questions. Thanks again!