cfzd / Ultra-Fast-Lane-Detection-v2

Ultra Fast Deep Lane Detection With Hybrid Anchor Driven Ordinal Classification (TPAMI 2022)
MIT License

Training Error #192

Closed by supportman007 1 week ago

supportman007 commented 1 week ago

(lane-det) root@DESKTOP-F744S5D:/home/project1/Ultra-Fast-Lane-Detection-V2# python train.py configs/tusimple_res34.py --log_path /home/project1/
/root/miniconda3/envs/lane-det/lib/python3.7/site-packages/nvidia/dali/backend.py:57: Warning: DALI 1.31 is the last release to support Python 3.7. Please update your environment to use Python 3.8, 3.9, 3.10, or (experimentally) 3.11.
  deprecation_warning("DALI 1.31 is the last release to support Python 3.7. "
merge log_path config
rm: cannot remove '.work_dir_tmp_file.txt': No such file or directory
[2024/11/01 22:04:01] start training...
Config (path: configs/tusimple_res34.py): {'dataset': 'Tusimple', 'data_root': '/home/project1/TUSimple/train_set/', 'epoch': 100, 'batch_size': 32, 'optimizer': 'SGD', 'learning_rate': 0.05, 'weight_decay': 0.0001, 'momentum': 0.9, 'scheduler': 'multi', 'steps': [50, 75], 'gamma': 0.1, 'warmup': 'linear', 'warmup_iters': 100, 'backbone': '34', 'griding_num': 100, 'use_aux': False, 'sim_loss_w': 0.0, 'shp_loss_w': 0.0, 'note': '', 'log_path': '/home/project1/', 'finetune': None, 'resume': None, 'test_model': '', 'test_work_dir': '/home/project1/20241101_220401_lr_5e-02_b_32', 'num_lanes': 4, 'var_loss_power': 2.0, 'auto_backup': True, 'num_row': 56, 'num_col': 41, 'train_width': 800, 'train_height': 320, 'num_cell_row': 100, 'num_cell_col': 100, 'mean_loss_w': 0.05, 'fc_norm': False, 'soft_loss': True, 'cls_loss_col_w': 1.0, 'cls_ext_col_w': 1.0, 'mean_loss_col_w': 0.05, 'eval_mode': 'normal', 'crop_ratio': 0.8, 'row_anchor': array([0.22222222, 0.23611111, 0.25 , 0.26388889, 0.27777778, 0.29166667, 0.30555556, 0.31944444, 0.33333333, 0.34722222, 0.36111111, 0.375 , 0.38888889, 0.40277778, 0.41666667, 0.43055556, 0.44444444, 0.45833333, 0.47222222, 0.48611111, 0.5 , 0.51388889, 0.52777778, 0.54166667, 0.55555556, 0.56944444, 0.58333333, 0.59722222, 0.61111111, 0.625 , 0.63888889, 0.65277778, 0.66666667, 0.68055556, 0.69444444, 0.70833333, 0.72222222, 0.73611111, 0.75 , 0.76388889, 0.77777778, 0.79166667, 0.80555556, 0.81944444, 0.83333333, 0.84722222, 0.86111111, 0.875 , 0.88888889, 0.90277778, 0.91666667, 0.93055556, 0.94444444, 0.95833333, 0.97222222, 0.98611111]), 'col_anchor': array([0. , 0.025, 0.05 , 0.075, 0.1 , 0.125, 0.15 , 0.175, 0.2 , 0.225, 0.25 , 0.275, 0.3 , 0.325, 0.35 , 0.375, 0.4 , 0.425, 0.45 , 0.475, 0.5 , 0.525, 0.55 , 0.575, 0.6 , 0.625, 0.65 , 0.675, 0.7 , 0.725, 0.75 , 0.775, 0.8 , 0.825, 0.85 , 0.875, 0.9 , 0.925, 0.95 , 0.975, 1. ]), 'distributed': False}
loading cached data
cached data loaded
Traceback (most recent call last):
  File "train.py", line 82, in <module>
    train_loader = get_train_loader(cfg)
  File "/home/project1/Ultra-Fast-Lane-Detection-V2/utils/common.py", line 189, in get_train_loader
    cfg.row_anchor, cfg.col_anchor, cfg.train_width, cfg.train_height, cfg.num_cell_row, cfg.num_cell_col, cfg.dataset, cfg.crop_ratio)
  File "/home/project1/Ultra-Fast-Lane-Detection-V2/data/dali_data.py", line 229, in __init__
    self.pii = DALIGenericIterator(pipe, output_map = ['images', 'seg_images', 'points'], last_batch_padded=True, last_batch_policy=LastBatchPolicy.PARTIAL)
  File "/root/miniconda3/envs/lane-det/lib/python3.7/site-packages/nvidia/dali/plugin/pytorch.py", line 189, in __init__
    prepare_first_batch=prepare_first_batch)
  File "/root/miniconda3/envs/lane-det/lib/python3.7/site-packages/nvidia/dali/plugin/base_iterator.py", line 198, in __init__
    p.build()
  File "/root/miniconda3/envs/lane-det/lib/python3.7/site-packages/nvidia/dali/pipeline.py", line 852, in build
    self._pipe.Build(self._generate_build_args())
RuntimeError: Critical error when building pipeline:
Error when constructing operator: decoders__Image, instance name: "__Image_4", encountered:
Error in thread 0: nvml error (3): The nvml requested operation is not available on target device
Current pipeline object is no longer valid.

supportman007 commented 1 week ago

I ran export DALI_DISABLE_NVML=1 in the shell before starting train.py and training worked. The error comes from NVML functionality that is not available under WSL2.
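For anyone who would rather not depend on the shell export, here is a minimal sketch of doing the same thing from Python. The helper names (`running_under_wsl`, `disable_dali_nvml_if_wsl`, `read_kernel_version`) are made up for illustration and are not part of this repository or of DALI. Checking `/proc/version` for "microsoft" is only a common heuristic for detecting WSL. I have not checked at which point DALI reads the variable, so setting it before `nvidia.dali` is imported is the cautious choice.

```python
import os


def running_under_wsl(version_text):
    """Heuristic (assumption): WSL kernels mention 'microsoft' in /proc/version."""
    return "microsoft" in version_text.lower()


def disable_dali_nvml_if_wsl(env, version_text):
    """Set DALI_DISABLE_NVML=1 in `env` if the kernel string looks like WSL.

    Returns True when the variable was set, False otherwise.
    """
    if running_under_wsl(version_text):
        env["DALI_DISABLE_NVML"] = "1"
        return True
    return False


def read_kernel_version(path="/proc/version"):
    """Return the kernel version string, or '' if the file cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    # Run this before anything imports nvidia.dali (e.g. at the top of train.py).
    applied = disable_dali_nvml_if_wsl(os.environ, read_kernel_version())
    print("DALI_DISABLE_NVML set:", applied)
```

If you use this, the call would have to sit above the imports in train.py that pull in the DALI data loader. On a native Linux machine it leaves the environment unchanged.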