aim-uofa / AdelaiDet

AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
https://git.io/AdelaiDet

ValueError: cannot convert float NaN to integer #272

Open Wang-Qinyu opened 3 years ago

Wang-Qinyu commented 3 years ago
Traceback (most recent call last):
  File "tools/train_net.py", line 243, in <module>
    args=(args,),
  File "/home/wangqinyu/detectron2/detectron2/engine/launch.py", line 59, in launch
    daemon=False,
  File "/home/wangqinyu/anaconda3/envs/Bezier/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/wangqinyu/anaconda3/envs/Bezier/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
    while not context.join():
  File "/home/wangqinyu/anaconda3/envs/Bezier/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 119, in join
    raise Exception(msg)
Exception: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/wangqinyu/anaconda3/envs/Bezier/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
    fn(i, *args)
  File "/home/wangqinyu/detectron2/detectron2/engine/launch.py", line 94, in _distributed_worker
    main_func(*args)
  File "/home/wangqinyu/AdelaiDet/tools/train_net.py", line 231, in main
    return trainer.train()
  File "/home/wangqinyu/AdelaiDet/tools/train_net.py", line 113, in train
    self.train_loop(self.start_iter, self.max_iter)
  File "/home/wangqinyu/AdelaiDet/tools/train_net.py", line 102, in train_loop
    self.run_step()
  File "/home/wangqinyu/detectron2/detectron2/engine/train_loop.py", line 229, in run_step
    data = next(self._data_loader_iter)
  File "/home/wangqinyu/detectron2/detectron2/data/common.py", line 142, in __iter__
    for d in self.dataset:
  File "/home/wangqinyu/anaconda3/envs/Bezier/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/wangqinyu/anaconda3/envs/Bezier/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/home/wangqinyu/anaconda3/envs/Bezier/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/home/wangqinyu/anaconda3/envs/Bezier/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/home/wangqinyu/anaconda3/envs/Bezier/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/wangqinyu/anaconda3/envs/Bezier/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/wangqinyu/anaconda3/envs/Bezier/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/wangqinyu/detectron2/detectron2/data/common.py", line 41, in __getitem__
    data = self._map_func(self._dataset[cur_idx])
  File "/home/wangqinyu/AdelaiDet/adet/data/dataset_mapper.py", line 130, in __call__
    transforms = aug_input.apply_augmentations(self.augmentation)
  File "/home/wangqinyu/detectron2/detectron2/data/transforms/augmentation.py", line 347, in apply_augmentations
    return AugmentationList(augmentations)(self)
  File "/home/wangqinyu/detectron2/detectron2/data/transforms/augmentation.py", line 264, in __call__
    tfm = x(aug_input)
  File "/home/wangqinyu/detectron2/detectron2/data/transforms/augmentation.py", line 165, in __call__
    tfm = self.get_transform(*args)
  File "/home/wangqinyu/detectron2/detectron2/data/transforms/augmentation_impl.py", line 183, in get_transform
    neww = int(neww + 0.5)
ValueError: cannot convert float NaN to integer

This error is thrown after I run 60 rounds. I then changed the learning rate to 0.01, 0.001, and 0.000001, but the same error still occurs.
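A note on reading the traceback: the exception is raised inside a DataLoader worker, in the resize augmentation (`augmentation_impl.py`, `neww = int(neww + 0.5)`), before the batch reaches the model, which would be consistent with the learning rate having no effect. `int()` only raises this message when its argument is NaN, so the resize step probably received an unusual image shape. One possible cause, which is an assumption and not confirmed anywhere in this thread, is an image or crop with a zero-length side. Below is a minimal diagnostic sketch in plain Python. The helper names `shortest_edge_size` and `nan_to_int_message` are hypothetical and are not part of detectron2 or AdelaiDet; the arithmetic only approximates a shortest-edge resize so that bad shapes fail early with a readable message.

```python
import math


def shortest_edge_size(h, w, size, max_size):
    """Compute (newh, neww) for a shortest-edge resize, failing early on bad input.

    Hypothetical helper for illustration only; not detectron2 or AdelaiDet code.
    """
    # Assumption: a zero-length side is one way the target size can become NaN.
    if h <= 0 or w <= 0:
        raise ValueError(f"degenerate image shape: h={h}, w={w}")

    # Scale so that the shorter side becomes `size`.
    scale = size / min(h, w)
    if h < w:
        newh, neww = size, scale * w
    else:
        newh, neww = scale * h, size

    # Shrink again if the longer side would exceed `max_size`.
    if max(newh, neww) > max_size:
        shrink = max_size / max(newh, neww)
        newh, neww = newh * shrink, neww * shrink

    # Catch inf/NaN before the int() conversion that fails in the traceback.
    if not (math.isfinite(newh) and math.isfinite(neww)):
        raise ValueError(f"non-finite target size for h={h}, w={w}")
    return int(newh + 0.5), int(neww + 0.5)


def nan_to_int_message():
    """Return the message Python raises when converting NaN to int."""
    try:
        int(float("nan") + 0.5)
    except ValueError as exc:
        return str(exc)
    return None
```

Running a check like this over the image shapes that reach the resize step (for example, by logging `image.shape` in the dataset mapper just before the augmentations are applied) may help identify the sample that triggers the failure.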

xxAna commented 3 years ago

Same problem. Waiting for a reply.

anruirui commented 3 years ago

@xxAna Same problem. Can you give me some help? Thank you.

blue-zircon commented 2 years ago

@Wang-Qinyu, @xxAna, @anruirui Were you able to solve this issue?

Gorgerbin commented 1 year ago

@blue-zircon @xxAna @Wang-Qinyu @anruirui Did you solve this problem? Can you give me a hand, please?