Closed wsy588 closed 2 years ago
Are you still using the coco dataset, or your personal dataset with 10 categories?
I still use the coco dataset. I think there are some categories in it that I don't need, and 171 categories are hard for me to train.
So your label file has categories from 0-171, but your model has only 10 categories, which is the cause of your problem. You can change the label values of the categories you do not care about to 255, which will be ignored.
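If it helps, a quick standalone scan like the one below (just a sketch, assuming the labels are stored as single-channel PNGs; the directory path is hypothetical) can confirm whether your label files contain values outside the model's range. Any value >= n_cats that is not the ignore index sends the loss kernel out of bounds on the GPU, which matches the illegal memory access you saw:

import glob

import cv2
import numpy as np

N_CATS = 10
LB_IGNORE = 255

# scan a handful of label maps and report values that are neither
# a valid class id nor the ignore value
for pth in glob.glob('./datasets/coco/labels/*.png')[:20]:
    lb = cv2.imread(pth, cv2.IMREAD_GRAYSCALE)
    if lb is None:
        continue
    bad = np.unique(lb[(lb >= N_CATS) & (lb != LB_IGNORE)])
    if bad.size > 0:
        print(pth, 'has out-of-range labels:', bad.tolist())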
Thank you for your reply to my question. Should I change coco.py like this?
import numpy as np

import lib.data.transform_cv2 as T
from lib.data.base_dataset import BaseDataset


class CocoStuff(BaseDataset):

    def __init__(self, dataroot, annpath, trans_func=None, mode='train'):
        super(CocoStuff, self).__init__(
            dataroot, annpath, trans_func, mode)
        self.n_cats = 10  # was 171 (91 stuff + 91 thing, 11 of thing have no annos); keep only 0-9
        self.lb_ignore = 255

        ## label mapping: remove non-existing labels, keep categories 0-9, ignore the rest
        missing = [11, 25, 28, 29, 44, 65, 67, 68, 70, 82, 90]
        remain = [ind for ind in range(182) if ind not in missing]
        self.lb_map = np.arange(256)
        for ind in remain:
            if ind > 9:
                self.lb_map[ind] = 255
            else:
                self.lb_map[ind] = remain.index(ind)

        self.to_tensor = T.ToTensor(
            mean=(0.46962251, 0.4464104, 0.40718787),  # coco, rgb
            std=(0.27469736, 0.27012361, 0.28515933),
        )
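For reference, the mapping above can be sanity-checked on its own with plain numpy (the sample values below are just for illustration):

import numpy as np

missing = [11, 25, 28, 29, 44, 65, 67, 68, 70, 82, 90]
remain = [ind for ind in range(182) if ind not in missing]

# same construction as in __init__ above
lb_map = np.arange(256)
for ind in remain:
    lb_map[ind] = remain.index(ind) if ind <= 9 else 255

sample = np.array([0, 5, 9, 10, 171, 255])
print(lb_map[sample])  # expected: [  0   5   9 255 255 255]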
But when I train BiSeNetv2 on coco with this change, the loss is NaN, as shown below:
iter: 100/180000, lr: 0.003454, eta: 3:42:53, time: 7.51, loss: nan, loss_pre: nan, loss_aux0: nan, loss_aux1: nan, loss_aux2: nan, loss_aux3: nan
iter: 200/180000, lr: 0.004348, eta: 3:30:10, time: 6.59, loss: nan, loss_pre: nan, loss_aux0: nan, loss_aux1: nan, loss_aux2: nan, loss_aux3: nan
iter: 300/180000, lr: 0.005474, eta: 3:25:49, time: 6.59, loss: nan, loss_pre: nan, loss_aux0: nan, loss_aux1: nan, loss_aux2: nan, loss_aux3: nan
iter: 400/180000, lr: 0.006892, eta: 3:23:39, time: 6.60, loss: nan, loss_pre: nan, loss_aux0: nan, loss_aux1: nan, loss_aux2: nan, loss_aux3: nan
iter: 500/180000, lr: 0.008676, eta: 3:22:14, time: 6.59, loss: nan, loss_pre: nan, loss_aux0: nan, loss_aux1: nan, loss_aux2: nan, loss_aux3: nan
iter: 600/180000, lr: 0.010923, eta: 3:21:34, time: 6.65, loss: nan, loss_pre: nan, loss_aux0: nan, loss_aux1: nan, loss_aux2: nan, loss_aux3: nan
iter: 700/180000, lr: 0.013751, eta: 3:20:54, time: 6.61, loss: nan, loss_pre: nan, loss_aux0: nan, loss_aux1: nan, loss_aux2: nan, loss_aux3: nan
iter: 800/180000, lr: 0.017311, eta: 3:20:28, time: 6.63, loss: nan, loss_pre: nan, loss_aux0: nan, loss_aux1: nan, loss_aux2: nan, loss_aux3: nan
iter: 900/180000, lr: 0.021794, eta: 3:20:03, time: 6.62, loss: nan, loss_pre: nan, loss_aux0: nan, loss_aux1: nan, loss_aux2: nan, loss_aux3: nan
iter: 1000/180000, lr: 0.027437, eta: 3:19:47, time: 6.65, loss: nan, loss_pre: nan, loss_aux0: nan, loss_aux1: nan, loss_aux2: nan, loss_aux3: nan
Maybe you have too many ignored labels. There are 171 categories, but you ignored 161 of them. If you do not really care about the category meanings, you can merge them rather than ignore them.
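For example (a sketch, not the repo's code), you could keep the 10 categories you want as classes 0-9 and collapse every other valid label into one extra "other" class, so n_cats becomes 11. With 161 of 171 categories mapped to 255 as above, a random crop can easily contain no valid pixels at all, and the mean loss over zero valid pixels comes out as NaN:

import numpy as np

missing = [11, 25, 28, 29, 44, 65, 67, 68, 70, 82, 90]
remain = [ind for ind in range(182) if ind not in missing]
OTHER = 10  # hypothetical merged class for everything you do not care about

lb_map = np.arange(256)
for ind in remain:
    lb_map[ind] = ind if ind <= 9 else OTHER  # merge instead of ignore
# 255 stays 255 and is still ignored; set n_cats = 11 (10 kept + 1 merged)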
I increased the number of categories and the loss became normal. Thanks a lot!
hi: Thanks for your great work. When I train BiSeNetv2 with COCOStuff there is no problem, but when I change the number of categories from 171 to 10, I get
RuntimeError: CUDA error: an illegal memory access was encountered
and the traceback is as follows:
I changed n_cats in config/bisenetv2_coco.py as follows:
And I changed self.n_cats and remain in lib/data/coco.py as follows:
My docker environment is: ubuntu18.04, RTX 3060, Driver Version: 510.73.05, pytorch 1.11.0, cuda 11.3, cudnn 8, python 3.8
Thanks a lot if anyone can help me.