DonsetPG / narya

The Narya API allows you to track soccer players from camera inputs and evaluate them with an Expected Discounted Goal (EDG) agent. This repository contains the implementation of the following paper: https://arxiv.org/abs/2101.05388. We also make available all of our pretrained agents and the datasets we used.
MIT License
166 stars, 48 forks

Training KeypointDetectorModel on Google Colab #11

Closed by karlosos 3 years ago

karlosos commented 3 years ago

I have a problem with training KeypointDetectorModel on Google Colab. The training cell stops without any error message, and its output simply ends with ^C.

... [omitted logs for readability]
Total params: 13,945,158
Trainable params: 13,855,558
Non-trainable params: 89,600
__________________________________________________________________________________________________
----------
Building dataset
----------
----------
Launching the training
----------
Epoch 1/100
2021-02-11 19:22:11.500514: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
^C

I have no idea where the problem is, as there are no logs. Here is a Colab notebook with minimal code to reproduce this error.

DonsetPG commented 3 years ago

Hey,

I was able to reproduce your error. It comes from the large batch size you are using (32). Technically it is my fault; I will reduce the default batch size in this script.

I was able to train a model with batch sizes of 4 and 8. This also showed that I made a mistake while zipping the dataset, since a .DS_Store file is in there... Remove it and the training will be fine.

I'll take care of both these modifications this weekend. Let me know if you are now able to run the script.

karlosos commented 3 years ago

Yeah. Thank you! Removing .DS_Store files with

!rm ./data_keypoints/data_keypoints/.DS_Store
!rm ./data_keypoints/data_keypoints/test/.DS_Store
!rm ./data_keypoints/data_keypoints/test/Annotations/.DS_Store
!rm ./data_keypoints/data_keypoints/train/.DS_Store
!rm ./data_keypoints/data_keypoints/train/Annotations/.DS_Store
!rm ./data_keypoints/data_keypoints/train/JPEGImages/.DS_Store

and changing batch_size fixed the problem! 🎉
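
For anyone who unzipped the dataset somewhere else, the six rm commands above can be replaced by one recursive sweep. This is only a minimal sketch: the helper name remove_ds_store and its dry_run flag are made up for illustration and are not part of narya.

```python
from pathlib import Path


def remove_ds_store(root, dry_run=False):
    """Find every .DS_Store file under root, delete it, and return the paths.

    Hypothetical helper, not part of narya. With dry_run=True the files are
    only listed, nothing is deleted.
    """
    # rglob walks root and all of its subdirectories looking for the exact name.
    found = sorted(p for p in Path(root).rglob(".DS_Store") if p.is_file())
    if not dry_run:
        for path in found:
            path.unlink()
    return found


# Example call, assuming the archive was unzipped to the path used above:
# for path in remove_ds_store("./data_keypoints"):
#     print(f"removed {path}")
```

Calling it once with dry_run=True first shows which files would be deleted before anything is touched.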

DonsetPG commented 3 years ago

Solved in the 2 following commits:

I hope this helps.