It seems that the dimensions of your data and the placeholder are mismatched. You could modify line 476 in pose_dataset.py and run it to test your data loader and check whether the dimensions are correct.
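For example, something along these lines can iterate the dataloader and print the shapes coming out of it, so they can be compared against the placeholder shapes (5, 368, 368, 3), (5, 46, 46, 11) and (5, 46, 46, 22). This is only a sketch: `get_dataflow` and its arguments are placeholders, use whatever builder line 476 actually calls.

```python
import numpy as np
from pose_dataset import get_dataflow  # hypothetical entry point; use the real builder

# build the dataflow the same way train.py does
df = get_dataflow(batch_size=5, is_train=True)
df.reset_state()

for i, batch in enumerate(df.get_data()):
    # print one shape per component: image batch, heatmap batch, PAF batch
    print([np.asarray(x).shape for x in batch])
    if i >= 2:  # a few batches are enough to spot a mismatch
        break
```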
Thank you for your response, it's solved. Now my code is running on Google Colab and it progresses up to 43%; after that it doesn't progress, but I don't get any errors. Could this be because of a lack of Colab memory?
[2019-07-16 02:29:53,248] [train] [INFO] start training...
I0716 02:29:53.248205 139761707845504
0%| | 0/21 [00:00<?, ?it/s]coord: dict_keys(['_clean_stop_exception_types', '_lock', '_stop_event', '_exc_info_to_raise', '_joined', '_registered_threads'])
5%|▍ | 1/21 [00:11<03:53, 11.66s/it]
10%|▉ | 2/21 [00:15<02:58, 9.38s/it]
19%|█▉ | 4/21 [00:17<01:26, 5.08s/it]
[2019-07-16 02:30:20,601] [train] [INFO] echos=0.238095, setp=5, total_loss=2900.365723, lr=0.000100
I0716 02:30:20.601175 139761707845504
0it [00:00, ?it/s]
Sorry, I changed the batch size to 20 and the number of epochs to 5; now I get this result:
0%| | 0/5 [00:00<?, ?it/s]coord: dict_keys(['_clean_stop_exception_types', '_lock', '_stop_event', '_exc_info_to_raise', '_joined', '_registered_threads'])
80%|████████ | 4/5 [00:46<00:12, 12.38s/it][2019-07-16 10:30:06,625] [train] [INFO] echos=1.000000, setp=5, total_loss=2706.424561, lr=0.000050
I0716 10:30:06.625352 140120312465280
0it [00:00, ?it/s]
That might be because of a lack of memory. It runs a test on the test data before saving checkpoints to make sure that only the best checkpoint is saved, and all of the test data is loaded into memory. You can just comment out lines 173-190 (but keep lines 186 and 187) in train.py to skip the testing phase.
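The rough idea (this is only an illustration, not the actual train.py code; the functions below are stand-ins) is to put the evaluation behind a flag so it can be skipped on a low-memory machine, while the checkpoint-saving part of lines 186-187 still runs:

```python
RUN_VALIDATION = False  # flip back to True once memory allows

def train_step(step):
    # stand-in for the real training step
    return 1.0 / (step + 1)

def save_checkpoint(step):
    # stand-in for saver.save(sess, checkpoint_path, global_step=step)
    print("saved checkpoint at step", step)

for step in range(5):
    train_step(step)
    if RUN_VALIDATION:
        # lines ~173-190 would evaluate on the whole test set here,
        # which loads all test images into memory at once
        pass
    save_checkpoint(step)  # the part worth keeping (lines 186-187)
```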
By the way, I met the same problem when I used a small dataset. That's because the buffer size is larger than the dataset's number of batches. I fixed it by turning the 'buffer_size' at line 369 in pose_dataset.py down to a small number.
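If the dataflow at that line is a tensorpack-style local shuffle buffer (I'm assuming tensorpack here; the actual call in pose_dataset.py may look different), the effect of the change is roughly this: a buffer larger than a small dataset never fills up, so the iterator stalls and you see `0it [00:00, ?it/s]`, while a small buffer lets it flow.

```python
from tensorpack.dataflow import BatchData, DataFromList, LocallyShuffleData

# toy stand-in for a ~105-image dataset (each "image" is just an index here)
ds = DataFromList([[i] for i in range(105)])

# before (illustrative): LocallyShuffleData(ds, buffer_size=2000) can stall on a tiny dataset
ds = LocallyShuffleData(ds, buffer_size=2)      # small buffer, as suggested
ds = BatchData(ds, batch_size=20, remainder=True)

ds.reset_state()
print(sum(1 for _ in ds.get_data()))  # should iterate through the batches instead of hanging
```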
Thank you for your help. My dataset is small, only 105 images in total, but for this number of images I don't know what size I should use for 'buffer_size'. My second question is: how can I train on my own small dataset and reach an acceptable result? Thank you very much.
1 or 2 is OK. You can train the network on a big dataset first and then fine-tune it with your small dataset.
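If it helps, fine-tuning in TF 1.x usually just means restoring the checkpoint trained on the big dataset and continuing training with a lower learning rate on the small one. A minimal sketch (the graph, paths, and learning rate below are placeholders, not this repo's code):

```python
import tensorflow as tf

# toy graph standing in for the pose network
x = tf.placeholder(tf.float32, [None, 4], name="inputs")
y = tf.layers.dense(x, 2, name="head")
loss = tf.reduce_mean(tf.square(y))
# smaller learning rate than the original training run
train_op = tf.train.AdamOptimizer(learning_rate=1e-5).minimize(loss)

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # saver.restore(sess, "checkpoints/big_dataset_model.ckpt")  # hypothetical pretrained weights
    # ...then run train_op over your small dataset for a few epochs...
```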
@YangZeyu95 Thank you for your response, I got it. :)
Hi, there was an OutOfRangeError during training. The error message is as follows: [2019-07-15 05:18:54,340] [pose_dataset] [ERROR] err type2, err=Cannot feed value of shape (5, 46, 46, 11) for Tensor 'inputs/Placeholder:0', which has shape '(5, 368, 368, 3)', placeholders=[<tf.Tensor 'inputs/Placeholder:0' shape=(5, 368, 368, 3) dtype=float32>, <tf.Tensor 'inputs/Placeholder_1:0' shape=(5, 46, 46, 11) dtype=float32>, <tf.Tensor 'inputs/Placeholder_2:0' shape=(5, 46, 46, 22) dtype=float32>]
I checked the attributes of self.ds at line 438 in pose_dataset.py, but I got these: dict_keys(['ds', '_size', 'nr_proc', 'nr_prefetch', 'queue', 'procs'])
Why aren't reset_state and get_data among them?
I need to train on my own dataset. Can someone help me? Thank you.