It seems that the dimensions of your data and the placeholder are mismatched. You could modify line 476 in pose_dataset.py and run it to test your data loader and check whether the dimensions are correct.
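For example, something along these lines can iterate the dataloader and print the shapes coming out of it, so they can be compared against the placeholder shapes (5, 368, 368, 3), (5, 46, 46, 11) and (5, 46, 46, 22). This is only a sketch: `get_dataflow` and its arguments are placeholders, use whatever builder line 476 actually calls.

```python
import numpy as np
from pose_dataset import get_dataflow  # hypothetical entry point; use the real builder

# build the dataflow the same way train.py does
df = get_dataflow(batch_size=5, is_train=True)
df.reset_state()

for i, batch in enumerate(df.get_data()):
    # print one shape per component: image batch, heatmap batch, PAF batch
    print([np.asarray(x).shape for x in batch])
    if i >= 2:  # a few batches are enough to spot a mismatch
        break
```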
Thank you for your response, it's solved. Now my code is running on Google Colab and it progresses up to 43%; after that it doesn't progress, but I don't get any errors. Could this be because of a lack of Colab memory?
[2019-07-16 02:29:53,248] [train] [INFO] start training...
I0716 02:29:53.248205 139761707845504
0%| | 0/21 [00:00<?, ?it/s]coord: dict_keys(['_clean_stop_exception_types', '_lock', '_stop_event', '_exc_info_to_raise', '_joined', '_registered_threads'])
5%|▍ | 1/21 [00:11<03:53, 11.66s/it]
10%|▉ | 2/21 [00:15<02:58, 9.38s/it]
19%|█▉ | 4/21 [00:17<01:26, 5.08s/it]
[2019-07-16 02:30:20,601] [train] [INFO] echos=0.238095, setp=5, total_loss=2900.365723, lr=0.000100
I0716 02:30:20.601175 139761707845504
0it [00:00, ?it/s]
Sorry, I changed the batch size to 20 and the number of epochs to 5; now I get this result:
0%| | 0/5 [00:00<?, ?it/s]coord: dict_keys(['_clean_stop_exception_types', '_lock', '_stop_event', '_exc_info_to_raise', '_joined', '_registered_threads'])
80%|████████ | 4/5 [00:46<00:12, 12.38s/it][2019-07-16 10:30:06,625] [train] [INFO] echos=1.000000, setp=5, total_loss=2706.424561, lr=0.000050
I0716 10:30:06.625352 140120312465280
0it [00:00, ?it/s]
That might be because of a lack of memory. It runs a test on the test data before saving checkpoints to make sure that only the best checkpoint is saved, and all of the test data is loaded into memory. You can just comment out lines 173-190 (but keep lines 186 and 187) in train.py to skip the testing phase.
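The rough idea (this is only an illustration, not the actual train.py code; the functions below are stand-ins) is to put the evaluation behind a flag so it can be skipped on a low-memory machine, while the checkpoint-saving part of lines 186-187 still runs:

```python
RUN_VALIDATION = False  # flip back to True once memory allows

def train_step(step):
    # stand-in for the real training step
    return 1.0 / (step + 1)

def save_checkpoint(step):
    # stand-in for saver.save(sess, checkpoint_path, global_step=step)
    print("saved checkpoint at step", step)

for step in range(5):
    train_step(step)
    if RUN_VALIDATION:
        # lines ~173-190 would evaluate on the whole test set here,
        # which loads all test images into memory at once
        pass
    save_checkpoint(step)  # the part worth keeping (lines 186-187)
```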
By the way, I met the same problem when I used a small dataset. That's because the buffer size is larger than the dataset's number of batches. I fixed it by turning the 'buffer_size' at line 369 in pose_dataset.py down to a small number.
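If the dataflow at that line is a tensorpack-style local shuffle buffer (I'm assuming tensorpack here; the actual call in pose_dataset.py may look different), the effect of the change is roughly this: a buffer larger than a small dataset never fills up, so the iterator stalls and you see `0it [00:00, ?it/s]`, while a small buffer lets it flow.

```python
from tensorpack.dataflow import BatchData, DataFromList, LocallyShuffleData

# toy stand-in for a ~105-image dataset (each "image" is just an index here)
ds = DataFromList([[i] for i in range(105)])

# before (illustrative): LocallyShuffleData(ds, buffer_size=2000) can stall on a tiny dataset
ds = LocallyShuffleData(ds, buffer_size=2)      # small buffer, as suggested
ds = BatchData(ds, batch_size=20, remainder=True)

ds.reset_state()
print(sum(1 for _ in ds.get_data()))  # should iterate through the batches instead of hanging
```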
Thank you for your help. My dataset is small, only 105 images in total, but for this number of images I don't know what size I should use for 'buffer_size'. My second question is: how can I train on my own small dataset and reach an acceptable result? Thank you very much.
1 or 2 is OK. You can train the network on a big dataset first and then fine-tune it with your small dataset.
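If it helps, fine-tuning in TF 1.x usually just means restoring the checkpoint trained on the big dataset and continuing training with a lower learning rate on the small one. A minimal sketch (the graph, paths, and learning rate below are placeholders, not this repo's code):

```python
import tensorflow as tf

# toy graph standing in for the pose network
x = tf.placeholder(tf.float32, [None, 4], name="inputs")
y = tf.layers.dense(x, 2, name="head")
loss = tf.reduce_mean(tf.square(y))
# smaller learning rate than the original training run
train_op = tf.train.AdamOptimizer(learning_rate=1e-5).minimize(loss)

saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # saver.restore(sess, "checkpoints/big_dataset_model.ckpt")  # hypothetical pretrained weights
    # ...then run train_op over your small dataset for a few epochs...
```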
@YangZeyu95 Thank you for your response, I got it. :)
Hi, there was an OutOfRangeError during training. The error message is as follows: [2019-07-15 05:18:54,340] [pose_dataset] [ERROR] err type2, err=Cannot feed value of shape (5, 46, 46, 11) for Tensor 'inputs/Placeholder:0', which has shape '(5, 368, 368, 3)', placeholders=[<tf.Tensor 'inputs/Placeholder:0' shape=(5, 368, 368, 3) dtype=float32>, <tf.Tensor 'inputs/Placeholder_1:0' shape=(5, 46, 46, 11) dtype=float32>, <tf.Tensor 'inputs/Placeholder_2:0' shape=(5, 46, 46, 22) dtype=float32>]
I checked the attributes of self.ds at line 438 in pose_dataset.py, but I got these: dict_keys(['ds', '_size', 'nr_proc', 'nr_prefetch', 'queue', 'procs'])
Why aren't reset_state and get_data among them?
I need to train on my own dataset. Can someone help me? Thank you.