@YuHengsss Hoping you can take a look at this, thanks!
Hello, an image in that iteration probably failed to load; check what it is receiving as input. I suspect some images are missing from the dataset.
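If it helps, here is a minimal sketch for locating such a frame; `data_dir` and `frame_paths` are placeholders for the dataset root and the relative paths fed to the failing iteration:

```python
import os
import cv2

# Minimal sketch: find frames that cv2.imread cannot load. cv2.imread returns
# None (rather than raising) for missing or corrupt files, which only surfaces
# later as a crash deeper in the pipeline.
data_dir = "path/to/ILSVRC2015"  # placeholder dataset root
frame_paths = ["val/ILSVRC2015_val_00000000/000000.JPEG"]  # placeholder list

for rel_path in frame_paths:
    full_path = os.path.join(data_dir, rel_path)
    if not os.path.exists(full_path):
        print("missing file:", full_path)
    elif cv2.imread(full_path) is None:
        print("unreadable image:", full_path)
```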
Then may I ask: when running python tools/val_to_imdb.py for validation, why does the machine's memory usage keep growing during the run? Every time, the process runs out of memory and is killed before it finishes all 555 loop iterations.
Because the total number of predictions is very large; even 32 GB of RAM sometimes blows up in our test environment.
You can mitigate it by raising the confidence threshold or adding more RAM, though the former costs a slight drop in accuracy.
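As a rough illustration of the first option: filtering predictions by a higher confidence threshold before accumulating them shrinks peak memory, at a small cost in recall. The `(N, 6)` layout `[x1, y1, x2, y2, conf, cls]` below is an assumption for illustration, not necessarily the repository's exact format:

```python
import torch

def filter_by_confidence(pred: torch.Tensor, conf_thre: float) -> torch.Tensor:
    # Keep only rows whose confidence (column 4 here) clears the threshold,
    # so far fewer predictions are held in memory per image.
    return pred[pred[:, 4] >= conf_thre]

pred = torch.rand(1000, 6)                     # fake per-image predictions
print(filter_by_confidence(pred, 0.25).shape)  # roughly 75% of rows survive
```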
I originally tried writing the result with pickle.dump((res[0], res[1]), file_writter) on every loop iteration, instead of appending everything and writing once at the end, but the resulting pickle file ended up almost 10x larger than before. Logically I can't find the problem; is it because pickle.dump compresses the data?
Did you forget to convert the format?
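One plausible cause worth checking (an assumption, not a confirmed diagnosis): pickling a torch tensor that is a view, such as a sliced prediction, serializes its entire underlying storage, while converting to numpy (or calling `.clone()`) first keeps only the rows you want:

```python
import io
import pickle
import torch

def pickled_size(obj) -> int:
    # Serialize into an in-memory buffer and report the byte count.
    buf = io.BytesIO()
    pickle.dump(obj, buf)
    return buf.getbuffer().nbytes

big = torch.rand(100000, 6)   # large prediction tensor
view = big[:10]               # a 10-row slice, still backed by big's storage

print(pickled_size(view))                 # ~2.4 MB: the whole storage is dumped
print(pickled_size(view.clone()))         # a few hundred bytes: just the 10 rows
print(pickled_size(view.numpy().copy()))  # likewise small after conversion
```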
Nothing else was changed; I only modified the write order at the end:

```python
for ele in res:
    cur_iter += 1
    if cur_iter % 10 == 0:
        print(str(cur_iter) + '/' + str(len(res)))
    first_frame = ele[0][0]
    video_name = first_frame[first_frame.find('val'):first_frame.rfind('/')]
    preds_video = {}
    repp_res = []  # changed
    for frames in ele:
        # frames
        if frames == []: continue
        tmp_imgs = []
        for img in frames:
            img = cv2.imread(os.path.join(exp.data_dir, img))
            height, width = img.shape[:2]
            ratio = min(predictor.test_size[0] / img.shape[0],
                        predictor.test_size[1] / img.shape[1])
            img, _ = predictor.preproc(img, None, predictor.test_size)
            img = torch.from_numpy(img)
            tmp_imgs.append(img)
        imgs = torch.stack(tmp_imgs)
        pred_res = predictor.inference(imgs)
        del imgs
        for pred, img_name in zip(pred_res, frames):
            point_idx = img_name.rfind('.')
            image_id = img_name[img_name.find('val'):point_idx]
            img_idx = img_name[img_name.rfind('/') + 1:point_idx]
            det_repp = predictor.to_repp_heavy(pred, ratio, [height, width], image_id)
            preds_video[img_idx] = det_repp
    # changed: dump one (video_name, preds_video) record per video
    # instead of collecting everything and writing once at the end
    repp_res = [video_name, preds_video]
    pickle.dump((repp_res[0], repp_res[1]), file_writter)
file_writter.close()  # changed
```
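A note on reading such a file back: repeated pickle.dump calls append independent records, so loading must loop with pickle.load until EOFError ("predictions.pkl" stands in for whatever path file_writter points at):

```python
import pickle

results = []
with open("predictions.pkl", "rb") as f:
    while True:
        try:
            # Each record is one (video_name, preds_video) pair per dump call.
            video_name, preds_video = pickle.load(f)
        except EOFError:
            break
        results.append((video_name, preds_video))
```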
Hello, may I ask why training reports the following error and aborts in every epoch, and always at exactly iteration 7020?
```
2023-03-04 06:39:27.342 | INFO | yolox.core.vid_trainer:after_iter:279 - epoch: 1/7, iter: 7020/9366, mem: 8055Mb, iter_time: 4.363s, data_time: 3.605s, total_loss: 1.1, iou_loss: 0.7, l1_loss: 0.0, conf_loss: 0.2, cls_loss: 0.1, lr: 2.247e-03, size: 480, ETA: 2 days, 21:48:48
2023-03-04 06:39:38.261 | INFO | yolox.core.vid_trainer:after_train:198 - Training of experiment is done and the best AP is 0.00
2023-03-04 06:39:38.262 | ERROR | yolox.core.launch:launch:98 - An error has been caught in function 'launch', process 'MainProcess' (267170), thread 'MainThread' (140410779206464):
Traceback (most recent call last):

  File "tools/vid_train.py", line 151, in <module>
    args=(exp, args),
          │    └ Namespace(batch_size=128, cache=False, ckpt='/media/user/A0F260D9F260B566/qsy/YOLOV/weights/yoloxsvid.pth', devices=1, dist...
          └ ╒═══════════════════╤════════════════════════════════════════...

  File "tools/vid_train.py", line 128, in main
    trainer.train()
    │       └ <function Trainer.train at 0x7fb305ae15f0>
    └ <yolox.core.vid_trainer.Trainer object at 0x7fb3ed55a990>

  File "./yolox/core/vid_trainer.py", line 85, in train
    self.train_in_epoch()
    │    └ <function Trainer.train_in_epoch at 0x7fb305ae1b90>
    └ <yolox.core.vid_trainer.Trainer object at 0x7fb3ed55a990>

  File "./yolox/core/vid_trainer.py", line 94, in train_in_epoch
    self.train_in_iter()
    │    └ <function Trainer.train_in_iter at 0x7fb305ae1dd0>
    └ <yolox.core.vid_trainer.Trainer object at 0x7fb3ed55a990>

  File "./yolox/core/vid_trainer.py", line 100, in train_in_iter
    self.train_one_iter()
    │    └ <function Trainer.train_one_iter at 0x7fb305ae4d40>
    └ <yolox.core.vid_trainer.Trainer object at 0x7fb3ed55a990>

  File "./yolox/core/vid_trainer.py", line 107, in train_one_iter
    inps = inps.to(self.data_type)
           │       │    └ torch.float16
           │       └ <yolox.core.vid_trainer.Trainer object at 0x7fb3ed55a990>
           └ None

AttributeError: 'NoneType' object has no attribute 'to'
```
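For what it's worth, the traceback shows inps arriving as None at the `.to()` call, which typically means the data loader was exhausted before the expected number of iterations. A hypothetical guard (assuming the trainer follows YOLOX's usual pattern of pulling batches from a prefetcher; this surfaces the mismatch rather than fixing its root cause) might look like:

```python
def train_one_iter(self):
    # Hypothetical fragment: self.prefetcher, self.iter, self.max_iter and
    # self.data_type are assumed trainer attributes, per the YOLOX pattern.
    inps, targets = self.prefetcher.next()
    if inps is None:  # loader ran dry earlier than max_iter expects
        raise RuntimeError(
            f"data loader returned None at iter {self.iter}/{self.max_iter}; "
            "check that the dataset length matches iters_per_epoch * batch_size"
        )
    inps = inps.to(self.data_type)
    targets = targets.to(self.data_type)
```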