AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Question and recommendation: YOLOv4-tiny transfer learning for persons indoor detection #6639

Closed: KyryloAntoshyn closed this issue 4 years ago

KyryloAntoshyn commented 4 years ago

Hi @AlexeyAB, I have some questions about my task: person detection on a Jetson Nano. The camera will be mounted ~3 meters above the floor, and the system needs to detect people (I will then feed this data into another system). The system will be deployed indoors, in a mall. I've read the documentation on how to fine-tune an already pre-trained model (transfer learning) to detect custom objects, and I have some questions:

  1. YOLOv4-tiny has already been trained on COCO, and I know the COCO dataset contains a Person class. Will person detection quality be better if I perform transfer learning of this pre-trained model on another dataset with people, for example the Caltech Pedestrian Dataset, or is it better to use the model as it is?

  2. If the answer to the first question is "yes", which datasets would you recommend for person detection that could suit my task (detecting people in a mall)? I also found the Google Open Images dataset.

  3. Is it better to train on ready-made datasets, or should I take pictures at the location where my system will run and annotate them myself (perhaps using pseudo-labeling)?

Thank you in advance!

AlexeyAB commented 4 years ago
  1. It is usually better to train on your own dataset, starting from the pre-trained MSCOCO weights.
  2. You can try to combine MSCOCO + BDD + OpenImages (keeping only the person class; a filtering sketch follows below), or try to find something more suitable here: https://academictorrents.com/browse.php or here: https://www.kaggle.com/datasets?search=market
  3. It's better to collect your own 2,000 - 10,000 labeled images from the same cameras where the system will be used.
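
To make the "keep only the person class" step in point 2 concrete, here is a minimal sketch that filters a COCO-style annotation JSON down to person boxes and writes darknet-format label files (one `.txt` per image, normalized coordinates). The file names and output directory are placeholders, not anything from this thread; adapt them to your own dataset layout.

```python
# Sketch: keep only the "person" class from a COCO-style annotation file and
# write darknet-format labels. Paths below are placeholder assumptions.
import json
import os

COCO_JSON = "instances_train2017.json"   # assumed input annotation file
OUT_DIR = "labels"                       # assumed output directory for .txt labels
os.makedirs(OUT_DIR, exist_ok=True)

with open(COCO_JSON) as f:
    coco = json.load(f)

# Look up the category id for "person" rather than hard-coding it.
person_id = next(c["id"] for c in coco["categories"] if c["name"] == "person")
images = {img["id"]: img for img in coco["images"]}

labels = {}  # image id -> list of darknet label lines
for ann in coco["annotations"]:
    if ann["category_id"] != person_id:
        continue  # drop every class except person
    img = images[ann["image_id"]]
    x, y, w, h = ann["bbox"]  # COCO boxes are [top-left x, top-left y, width, height]
    # Darknet format: "class x_center y_center width height", all normalized to [0, 1].
    cx = (x + w / 2) / img["width"]
    cy = (y + h / 2) / img["height"]
    line = f"0 {cx:.6f} {cy:.6f} {w / img['width']:.6f} {h / img['height']:.6f}"
    labels.setdefault(ann["image_id"], []).append(line)

for image_id, lines in labels.items():
    stem = os.path.splitext(images[image_id]["file_name"])[0]
    with open(os.path.join(OUT_DIR, stem + ".txt"), "w") as f:
        f.write("\n".join(lines) + "\n")
```

Training would then typically continue from the pre-trained partial weights as described in the repo README (e.g. yolov4-tiny.conv.29 for YOLOv4-tiny), which is what point 1 refers to.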
KyryloAntoshyn commented 4 years ago

Thank you so much for your help!

stephanecharette commented 4 years ago

@KyryloAntoshyn you'll probably need to go with AlexeyAB's 3rd option. See this example I did for a customer earlier this year: https://www.ccoderun.ca/programming/ml/people_counter.html

Will your camera be pointing outward, or 90 degrees straight down? I found no dataset with cameras pointing 90 degrees down. Of course, it will always be better when trained with images captured at the actual location. Remember that summer and winter will produce different images due to hats, coats, etc., so be ready to re-train in 6 months with a batch of new pictures to top up the neural network.

And remember to include pictures without people in them, i.e. "negative" images, so the network doesn't start finding faces in the swirl patterns on the floor, etc.
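
In darknet, a negative image is simply an image with an empty .txt label file that is still listed in the training list. A minimal sketch, assuming a placeholder folder of negative images and a placeholder train.txt:

```python
# Sketch: add "negative" images (no people) to a darknet training set by giving
# each one an empty label file. Directory and list names are placeholders.
from pathlib import Path

NEGATIVE_DIR = Path("data/negatives")  # assumed folder of images containing no people
TRAIN_LIST = Path("data/train.txt")    # assumed darknet training image list

with TRAIN_LIST.open("a") as train_list:
    for img in sorted(NEGATIVE_DIR.glob("*.jpg")):
        img.with_suffix(".txt").touch()    # empty label file = no objects in this image
        train_list.write(str(img) + "\n")  # the image still has to be listed for training
```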

KyryloAntoshyn commented 4 years ago

@stephanecharette thanks for your recommendation! The idea is to find each person's coordinates within a confined space. The system must also work outdoors, so we will set up the cameras at an angle to the floor. I want to compare two approaches for locating a person on the floor from the camera's perspective: using the bottom-middle point of the bounding box (object detection) or the midpoint between the two legs (pose estimation). This will be used in another system.
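
For the first approach (bottom-middle point of the bounding box), a common way to turn that image point into a floor coordinate is a planar homography calibrated per camera. The sketch below assumes YOLO-style normalized box coordinates and a placeholder homography matrix H; in practice H would be estimated once per camera from a few known floor points (e.g. with cv2.findHomography).

```python
# Sketch: project the bottom-centre of a YOLO detection onto the floor plane
# with a planar homography. H below is a placeholder, not a real calibration.
import numpy as np

def bottom_center_px(cx, cy, w, h, img_w, img_h):
    """YOLO gives normalized (cx, cy, w, h); return the box's bottom-centre in pixels."""
    return np.array([cx * img_w, (cy + h / 2) * img_h])

def to_floor(point_px, H):
    """Map an image point to floor coordinates using a 3x3 homography H."""
    p = np.array([point_px[0], point_px[1], 1.0])
    q = H @ p
    return q[:2] / q[2]  # homogeneous -> floor-plane coordinates

# Placeholder homography; a real one comes from per-camera calibration.
H = np.eye(3)
foot_px = bottom_center_px(cx=0.48, cy=0.55, w=0.12, h=0.60, img_w=1280, img_h=720)
print(to_floor(foot_px, H))
```

The same projection applies to the pose-estimation variant: only the image point changes (the midpoint of the two ankle keypoints instead of the box's bottom centre), so the two approaches can be compared with identical calibration.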