Closed DheerajYarlagadda closed 2 years ago
You should run this code on a PC that has more VRAM.
At least 12 GiB of VRAM is needed for training; 256 MiB is far too little to train any SOTA-performance model.
Yes, it is clear now. At first I used a GPU with a total capacity of 10.91 GiB of VRAM, which caused this error. Now, running on a higher-capacity GPU, it works fine!
Has anyone faced the same issue as below?
```
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/-/BtcDet/btcdet/datasets/multifindbestfit.py", line 494, in <module>
    find_best_match_boxpnts(all_db_infos_lst, box_dims_lst, sorted_iou, pnt_thresh_best_iou_indices, mirrored_pnts_lst, pnts_lst, coords_num, occ_map, bm_dir_save_path, allrange, nx, ny, voxel_size, max_num_bm=max_num_bm_lst[i], num_extra_coords=num_extra_coords_lst[i], iou_thresh=iou_thresh_lst[i], ex_coords_ratio=ex_coords_ratio_lst[i], nearest_dist=nearest_dist_lst[i], vis=vis, save=save)
  File "/home/-/BtcDet/btcdet/datasets/multifindbestfit.py", line 330, in find_best_match_boxpnts
    bm_pnts, bm_coords_num = find_multi_best_match_boxpnts(selected_sorted_iou, cur_box, cur_mirrored_pnts_lst, cur_pnts_lst, selected_mirrored_pnts_lst, selected_pnts_lst, selected_pnt_thresh_best_iou_indices, cur_occ_map, selected_occ_map, max_num_bm=max_num_bm, num_extra_coords=num_extra_coords, iou_thresh=iou_thresh, ex_coords_ratio=ex_coords_ratio, nearest_dist=nearest_dist, vis=vis)
  File "/home/-/BtcDet/btcdet/datasets/multifindbestfit.py", line 375, in find_multi_best_match_boxpnts
    mean_instance, min_instance, max_instance = get_batch_stats(dist_l1, box_num_pnts_tensor, box_mask_tensor, box_reversemask_tensor)
  File "/home/-/BtcDet/btcdet/datasets/multifindbestfit.py", line 351, in get_batch_stats
    addmax_dist = masked_dist - reversemask_arry
RuntimeError: CUDA out of memory. Tried to allocate 198.00 MiB (GPU 0; 10.92 GiB total capacity; 8.94 GiB already allocated; 128.69 MiB free; 9.86 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
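Before switching to a bigger GPU, you can try the fragmentation workaround the error message itself suggests. A minimal sketch is below; the split size of 128 MiB is an illustrative guess, not a value taken from this thread, and the variable must be set before the first CUDA allocation (ideally before `torch` is imported):

```python
import os

# Cap the PyTorch caching allocator's block-split size to reduce fragmentation,
# as suggested by the OOM message. 128 (MiB) is an assumed example value;
# tune it for your workload. Set this BEFORE importing torch / allocating on GPU.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# ...then import torch and launch training as usual.
```

Alternatively, export it in the shell before launching the script, e.g. `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python -m ...`. Note this only mitigates fragmentation-related failures; it cannot recover memory a ~11 GiB card simply does not have, so reducing batch size or using a larger GPU may still be necessary.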