cfzd / Ultra-Fast-Lane-Detection

Ultra Fast Structure-aware Deep Lane Detection (ECCV 2020)
MIT License
1.82k stars 493 forks

LinAlgError("SVD did not converge in Linear Least Squares") #136

Closed yushuntang closed 3 years ago

yushuntang commented 3 years ago

[2020/11/28 00:59:02] start training...
Config (path: H:\configs\tusimple.py): {'dataset': 'Tusimple', 'data_root': 'H:/$TUSIMPLEROOT', 'epoch': 100, 'batch_size': 32, 'optimizer': 'Adam', 'learning_rate': 0.0004, 'weight_decay': 0.0001, 'momentum': 0.9, 'scheduler': 'cos', 'gamma': 0.1, 'warmup': 'linear', 'warmup_iters': 100, 'backbone': '34', 'griding_num': 100, 'use_aux': True, 'sim_loss_w': 1.0, 'shp_loss_w': 0.0, 'note': '', 'log_path': 'H:log_path', 'finetune': None, 'resume': None, 'test_model': 'tusimple_18.pth', 'test_work_dir': None, 'num_lanes': 4}
114
0%| | 0/114 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "H:\Python\Anaconda\lib\site-packages\tqdm\std.py", line 1171, in __iter__
    for obj in iterable:
  File "H:\Python\Anaconda\lib\site-packages\torch\utils\data\dataloader.py", line 435, in __next__
    data = self._next_data()
  File "H:\Python\Anaconda\lib\site-packages\torch\utils\data\dataloader.py", line 475, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "H:\Python\Anaconda\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "H:\Python\Anaconda\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "H:\data\dataset.py", line 75, in __getitem__
    lane_pts = self._get_index(label)
  File "H:\data\dataset.py", line 156, in _get_index
    p = np.polyfit(valid_idx_half[:,0], valid_idx_half[:,1],deg = 1)
  File "<__array_function__ internals>", line 6, in polyfit
  File "C:\Users\Thomson\AppData\Roaming\Python\Python36\site-packages\numpy\lib\polynomial.py", line 629, in polyfit
    c, resids, rank, s = lstsq(lhs, rhs, rcond)
  File "<__array_function__ internals>", line 6, in lstsq
  File "C:\Users\Thomson\AppData\Roaming\Python\Python36\site-packages\numpy\linalg\linalg.py", line 2306, in lstsq
    x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
  File "C:\Users\Thomson\AppData\Roaming\Python\Python36\site-packages\numpy\linalg\linalg.py", line 100, in _raise_linalgerror_lstsq
    raise LinAlgError("SVD did not converge in Linear Least Squares")
numpy.linalg.LinAlgError: SVD did not converge in Linear Least Squares

Hello! What causes this error? From what I found online, it happens when the arguments passed to np.polyfit contain NaN values, but there are actually no NaNs in my data. Thanks!

cfzd commented 3 years ago

@ThomsonTang6 You might want to take a look at this issue: #134. I have also seen reports that the numpy and scipy versions can cause this problem, so perhaps try upgrading numpy or scipy?

pip install numpy --upgrade
pip install scipy --upgrade
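
Before or after upgrading, it may be worth confirming which copies are actually imported: in the traceback above, tqdm and torch are loaded from H:\Python\Anaconda\lib\site-packages, while numpy is loaded from a per-user Python36 site-packages directory. A minimal sketch for checking this (the helper name describe_install is made up for illustration and is not part of this repository):

```python
import importlib
import sys


def describe_install(name):
    """Return (name, version, file path) for a package, or Nones if it is missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return name, None, None
    return (
        name,
        getattr(module, "__version__", None),
        getattr(module, "__file__", None),
    )


# Show which interpreter is running and where each package comes from.
print("interpreter:", sys.executable)
for pkg in ("numpy", "scipy"):
    print(describe_install(pkg))
```

If the printed path is not inside the environment that pip just upgraded, the upgrade did not affect the copy that training actually uses.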

yushuntang commented 3 years ago

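Thanks! I am already on the latest numpy and scipy, and I have tried several different numpy versions with the same error. While debugging I found that not every valid_idx_half triggers the error, and when I take a failing valid_idx_half out and run np.polyfit on it separately, it works fine and raises nothing.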

cfzd commented 3 years ago

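@ThomsonTang6 If it is hard to reproduce and you only want to see training results, you can temporarily remove this part. The code that fails is responsible for extending the lanes to the image boundary (after data augmentation the image has black borders, and we want the lanes to keep extending, so this part has to be handled as well). You can simply return all_idx at line 132. https://github.com/cfzd/Ultra-Fast-Lane-Detection/blob/60f477c7358bbe177e1117b9f229a4a4b0db0e73/data/dataset.py#L132-L134 Return all_idx directly and skip the extension that is computed afterwards: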

return all_idx
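
For context, the extension step described above can be illustrated roughly as follows. This is a simplified sketch under my own assumptions, not the code in data/dataset.py: the function extend_lane, its arguments, and the min_points threshold are illustrative names, and it handles a single lane given as per-row x positions with -1 marking rows where the lane is absent.

```python
import numpy as np


def extend_lane(rows, cols, width, min_points=6):
    """Extrapolate one lane toward the last rows with a straight-line fit.

    rows: 1-D array of row positions (y), increasing.
    cols: 1-D array of lane x positions per row, -1 where the lane is absent.
    width: image width, used to discard extrapolated points outside the image.
    """
    rows = np.asarray(rows, dtype=float)
    out = np.asarray(cols, dtype=float).copy()
    valid = np.flatnonzero(out != -1)
    # Nothing to do if the lane is too short or already reaches the last row.
    if len(valid) < min_points or valid[-1] == len(rows) - 1:
        return out
    # Fit x = a * y + b on the second half of the valid points only.
    half = valid[len(valid) // 2:]
    a, b = np.polyfit(rows[half], out[half], deg=1)
    # Fill every row after the last valid one with the fitted position.
    tail = np.arange(valid[-1] + 1, len(rows))
    fitted = a * rows[tail] + b
    # Points that fall outside the image stay marked as missing.
    fitted[(fitted < 0) | (fitted > width - 1)] = -1
    out[tail] = fitted
    return out


rows = np.arange(10) * 10.0
cols = np.where(np.arange(10) < 7, 2.0 * rows + 5.0, -1.0)
print(extend_lane(rows, cols, width=800))
```

Returning all_idx early skips exactly this kind of fit, which is why the np.polyfit error disappears, at the cost of lanes that stop where the augmented label stops.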

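If you want to actually fix the problem, use the code snippet from before

isf = np.isfinite(valid_idx_half)
if not np.all(isf):
    print("nan or inf")

to catch the data that triggers the error, and then analyze what went wrong. If you have the failing data, you are welcome to report it.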
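
One possible way to capture the offending array for later analysis is to wrap the fit and save its input whenever the check or the fit itself fails. This is only a sketch: checked_polyfit and the default dump_path are hypothetical names, not part of the repository, and the wrapper would have to be called in place of the np.polyfit line in _get_index.

```python
import numpy as np


def checked_polyfit(points, dump_path="bad_valid_idx_half.npy"):
    """Fit a line to an (N, 2) array, saving the input if anything goes wrong."""
    points = np.asarray(points, dtype=float)
    # Same finiteness check as the snippet above, but keep the data on disk.
    if not np.all(np.isfinite(points)):
        np.save(dump_path, points)
        raise ValueError(f"nan or inf in input, saved to {dump_path}")
    try:
        return np.polyfit(points[:, 0], points[:, 1], deg=1)
    except np.linalg.LinAlgError:
        # Inputs look finite but the solver still failed: save them and re-raise.
        np.save(dump_path, points)
        raise


print(checked_polyfit(np.array([[0.0, 5.0], [1.0, 7.0], [2.0, 9.0]])))
```

The saved file can then be loaded with np.load in a separate session, or attached to a report, to check whether the failure depends on the data or on the environment.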

yushuntang commented 3 years ago

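Thank you for your reply! I solved this problem by switching to a different environment. I have two more questions:
1. I changed backbone = '34' in tusimple.py. Why is the trained model exactly the same as the tusimple_18 you provide? Is there anything else that needs to be modified for res-34?
2. With the coefficients set to sim_loss_w = 1.0 and shp_loss_w = 0.0, the straight-line difference loss (shp_loss) from the paper is ignored, while the other coefficients are all 1. Have you tried adding shp_loss or other coefficient combinations, and did they give better results?
Thanks!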