cvlab-stonybrook / Scanpath_Prediction

Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning (CVPR2020)
MIT License

KeyError: '__getstate__' when running train.py #20

Closed quangdaist01 closed 2 years ago

quangdaist01 commented 2 years ago

Hello, I'm very interested in your work. However, when I try to run train.py, I get an error. I don't know why it happened or how to fix it. Any suggestions? Thank you. Here is the full traceback:

target fixation prob (valid).: [0.         0.00019841 0.00019841 0.00019841 0.00019841 0.00039683
 0.00039683]
 {
  Data {
    im_w: 512
    im_h: 320
    patch_num: [32, 20]
    patch_size: [2, 2]
    patch_count: 640
    fovea_radius: 2
    IOR_size: 1
    max_traj_length: 6
  }
  Train {
    gamma: 0.9
    adv_est: GAE
    exclude_wrong_trials: False
    tau: 0.96
    batch_size: 128
    stop_criteria: SOT
    log_root: ./assets
    num_epoch: 30
    num_step: 4
    checkpoint_every: 100
    max_checkpoints: 5
    evaluate_every: 20
    num_critic: 1
    gail_milestones: [10000]
    gail_lr: 5e-05
    adam_betas: [0.9, 0.999]
  }
  PPO {
    lr: 1e-05
    clip_param: 0.2
    num_epoch: 1
    batch_size: 64
    value_coef: 1.0
    entropy_coef: 0.01
  }
}
Traceback (most recent call last):
  File "C:\Users\MSI I5\PycharmProjects\Scanpath_Prediction\train.py", line 60, in <module>
Traceback (most recent call last):
  File "<string>", line 1, in <module>
    trainer.train()
  File "C:\Users\MSI I5\PycharmProjects\Scanpath_Prediction\irl_dcb\trainer.py", line 89, in train
    for i_batch, batch in enumerate(self.train_img_loader):
  File "C:\Users\MSI I5\PycharmProjects\Scanpath\lib\site-packages\torch\utils\data\dataloader.py", line 368, in __iter__
  File "C:\Users\MSI I5\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
    return self._get_iterator()
  File "C:\Users\MSI I5\PycharmProjects\Scanpath\lib\site-packages\torch\utils\data\dataloader.py", line 314, in _get_iterator
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\MSI I5\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 126, in _main
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\MSI I5\PycharmProjects\Scanpath\lib\site-packages\torch\utils\data\dataloader.py", line 927, in __init__
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
    w.start()
  File "C:\Users\MSI I5\AppData\Local\Programs\Python\Python310\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "C:\Users\MSI I5\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\MSI I5\AppData\Local\Programs\Python\Python310\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)
  File "C:\Users\MSI I5\AppData\Local\Programs\Python\Python310\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\MSI I5\AppData\Local\Programs\Python\Python310\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Users\MSI I5\PycharmProjects\Scanpath_Prediction\irl_dcb\config.py", line 53, in __getattr__
    return super().__getitem__(attr)
KeyError: '__getstate__'
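The last frame of the traceback points at `__getattr__` in `irl_dcb/config.py`, which forwards attribute lookups to `__getitem__`. The failure can be reproduced in miniature with a plain dict subclass (a sketch, not the repo's actual config class): spawned DataLoader workers must pickle the dataset, pickle probes the object for optional dunders such as `__getstate__`, and the forwarding turns that probe into a `KeyError` instead of the `AttributeError` pickle expects. (On Python >= 3.11, `object` gained a default `__getstate__`, so the probe never reaches `__getattr__`.)

```python
import pickle

class BrokenConfig(dict):
    def __getattr__(self, attr):
        # Forwards every missing attribute to __getitem__; on Python <= 3.10
        # pickle's probe for '__getstate__' lands here and raises KeyError.
        return super().__getitem__(attr)

class FixedConfig(dict):
    def __getattr__(self, attr):
        try:
            return super().__getitem__(attr)
        except KeyError:
            # Re-raise as AttributeError so callers like pickle treat the
            # attribute as simply missing rather than crashing.
            raise AttributeError(attr) from None

# The fixed variant survives the pickling round trip a spawned worker needs.
restored = pickle.loads(pickle.dumps(FixedConfig(im_w=512, im_h=320)))
```

This also explains why `num_workers=0` works around the error: with no worker processes, nothing gets pickled.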
ouyangzhibo commented 2 years ago

Hi,

This might be caused by a Python version mismatch. You can try upgrading your Python version to >=3.7 and see if that solves the problem. Please also double-check that the input <hparam> file is correct.

In addition, I think you directly fed the model with the original COCO-Search18 gaze data which was collected on a 1680x1050 display. In the paper we rescaled the images to 512x320 as well as the fixation locations. So you need to rescale the fixation coordinates first. Please find the rescaled fixations here.
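The rescaling described above is a uniform scale on each axis. A minimal sketch, assuming fixations are plain pixel coordinates in the 1680x1050 display space (the helper name and argument layout are illustrative, not code from the repo):

```python
# Map a fixation from the 1680x1050 display space to the 512x320 model
# input space used by the paper.
DISPLAY_W, DISPLAY_H = 1680, 1050
TARGET_W, TARGET_H = 512, 320

def rescale_fixation(x, y):
    """Scale one fixation point; returns floats in the 512x320 space."""
    return x * TARGET_W / DISPLAY_W, y * TARGET_H / DISPLAY_H
```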

quangdaist01 commented 2 years ago

Thank you for your suggestions. The problem is solved by changing num_workers from 16 to 0 (maybe because I'm running on CPU instead of GPU?). The code continued to run and then stumbled on another error. Is this error related to the scaling problem?

cat_name:  cup
img_name:  000000546494.jpg
Traceback (most recent call last):
  File "C:\Users\MSI I5\PycharmProjects\Scanpath_Prediction\train.py", line 60, in <module>
    trainer.train()
  File "C:\Users\MSI I5\PycharmProjects\Scanpath_Prediction\irl_dcb\trainer.py", line 89, in train
    for i_batch, batch in enumerate(self.train_img_loader):
  File "C:\Users\MSI I5\PycharmProjects\Scanpath\lib\site-packages\torch\utils\data\dataloader.py", line 530, in __next__
    data = self._next_data()
  File "C:\Users\MSI I5\PycharmProjects\Scanpath\lib\site-packages\torch\utils\data\dataloader.py", line 570, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "C:\Users\MSI I5\PycharmProjects\Scanpath\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\MSI I5\PycharmProjects\Scanpath\lib\site-packages\torch\utils\data\_utils\fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\MSI I5\PycharmProjects\Scanpath_Prediction\irl_dcb\data.py", line 98, in __getitem__
    coding = utils.multi_hot_coding(self.annos[imgId], self.pa.patch_size,
  File "C:\Users\MSI I5\PycharmProjects\Scanpath_Prediction\irl_dcb\utils.py", line 328, in multi_hot_coding
    aoi_ratio = calc_overlap_ratio(bbox, patch_size, patch_num)
  File "C:\Users\MSI I5\PycharmProjects\Scanpath_Prediction\irl_dcb\utils.py", line 311, in calc_overlap_ratio
    aoi_ratio[0, y, x] = max((aoi_brx - aoi_tlx), 0) * max((aoi_bry - aoi_tly), 0) / float(patch_area)
IndexError: index 20 is out of bounds for axis 1 with size 20
quangdaist01 commented 2 years ago

Here is a screenshot of what was actually running until the error occurred: [image]

Doch88 commented 2 years ago

The problem is solved by changing the num_worker from 16 to 0

Are you running it on docker?

ouyangzhibo commented 2 years ago

> Thank you for your suggestions. The problem is solved by changing num_workers from 16 to 0 (maybe because I'm running on CPU instead of GPU?). The code continued to run and then stumbled on another error. Is this error related to the scaling problem?
> [...]
> IndexError: index 20 is out of bounds for axis 1 with size 20

Very likely. I am not sure whether you are running the code on another dataset. The code assumes the input image size is 320x512 and the action space is 20x32, so you would need to rescale all coordinates, including the fixations and bounding boxes, to 320x512; otherwise you will probably get the out-of-range error. If you are using the COCO-Search18 dataset, you can simply download the rescaled json files via this link.
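The bounding-box rescaling works the same way as for fixations: scale each coordinate by the ratio between the target size and the source size. A sketch, assuming boxes are stored as `[x, y, w, h]` in the source image's pixel space (the exact json layout is not shown in this thread, so treat the field order as an assumption):

```python
# Rescale one bounding box from (orig_w x orig_h) pixel space to the
# 320x512 space the code assumes. Both position and size are scaled.
def rescale_bbox(bbox, orig_w, orig_h, target_w=512, target_h=320):
    sx, sy = target_w / orig_w, target_h / orig_h
    x, y, w, h = bbox
    return [x * sx, y * sy, w * sx, h * sy]
```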

quangdaist01 commented 2 years ago

The problem is solved by changing the num_worker from 16 to 0

Are you running it on docker?

I am trying to train the model on my local computer (and I don't have CUDA set up yet)

quangdaist01 commented 2 years ago

> Very likely. I am not sure whether you are running the code on another dataset. The code assumes the input image size is 320x512 and the action space is 20x32. [...] If you are using the COCO-Search18 dataset, you can simply download the rescaled json files via this link.

I was running on that dataset and got the above error, which is strange :(.

ouyangzhibo commented 2 years ago

Please double-check that the coordinates of the input fixations and bounding boxes do not exceed 320x512. Note that the rescaled json files have different names from what is specified in train.py. After you've downloaded the rescaled json files, you need to change the file names in the code.

[image]
quangdaist01 commented 2 years ago

Thank you for your super-quick support. I'll update when the problem is fixed. Thank you! Have a good day and a good lunch!! ^^

quangdaist01 commented 2 years ago

Hello, I had to scale the bbox coordinates in the calc_overlap_ratio() function in utils.py. I am still not sure whether it is correct, but the bug is gone for now. [image] I am closing the issue. Thank you very much! Have a good day!
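For reference, the out-of-range failure boils down to an index arithmetic edge case, and the fix amounts to a clamp. A simplified stand-in (not the repo's calc_overlap_ratio): a box touching the bottom or right image edge produces a patch index equal to patch_num, one past the last valid index of a grid with patch_num cells, hence "index 20 is out of bounds for axis 1 with size 20".

```python
# Convert a pixel coordinate to a patch-grid index, clamped so that a
# coordinate exactly on the far image edge maps to the last valid cell.
def patch_index(coord, cell_size, patch_num):
    idx = int(coord // cell_size)
    return min(idx, patch_num - 1)
```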