duanzhiihao / RAPiD

RAPiD: Rotation-Aware People Detection in Overhead Fisheye Images (CVPR 2020 Workshops)
http://vip.bu.edu/rapid/

Validation and evaluation label format information? #4

Closed iampartho closed 3 years ago

iampartho commented 4 years ago

Hello, I have been using your model for a project of mine. There are two crucial questions I need answered. First, during validation, what should the ground-truth bounding-box format be: {top_x, top_y, width, height}, {normalised_top_x, normalised_top_y, normalised_width, normalised_height}, {center_x, center_y, width, height}, or {normalised_center_x, normalised_center_y, normalised_width, normalised_height}? Second, during evaluation it seems the model returns the bounding box as {normalised_center_x, normalised_center_y, absolute_width, absolute_height, absolute_angle, confidence_score}. Am I correct? Thanks.

duanzhiihao commented 4 years ago

Thank you for your interest in our project.

  1. The ground truth bbox format should be unnormalized center_x, center_y, width, height, angle (degrees, clockwise). See https://github.com/duanzhiihao/CEPDOF_tools/blob/master/CEPDOF_sample/annotations/video_0.json for an example.

  2. The bboxes that the model returns are absolute_center_x, absolute_center_y, absolute_width, absolute_height, absolute_angle, confidence_score. Center x and y are unnormalized in the following line: https://github.com/duanzhiihao/RAPiD/blob/df9a52948c60b1a8cabb5bdae68b70dab11968e4/models/rapid.py#L199 A small sketch of both formats follows this list.
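
For concreteness, here is a minimal sketch of the two formats (the variable names are illustrative, not taken from the CEPDOF schema or the RAPiD code):

```python
# Ground-truth annotation: unnormalized cx, cy, w, h, angle in degrees (clockwise).
gt_bbox = [512.0, 384.0, 90.0, 150.0, 30.0]   # [center_x, center_y, width, height, degree]

# One detection returned by the model: absolute values plus a confidence score.
detection = [512.3, 380.9, 88.1, 147.6, 28.4, 0.93]
cx, cy, w, h, angle, conf = detection
```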

Feel free to ask me if you have further questions.

iampartho commented 4 years ago

Thank you very much for your quick response. This clears up a lot of the issues I was facing in my pipeline. I will be working on this project for the next month or so; if I have further questions I will ask them here, and I will close the issue at the end of my project.

By the way, great work by your team; loved it.

iampartho commented 4 years ago

Hello, I have another query you could perhaps help me with. I have been training the network on my data. First, I used a batch size of 4; each iteration took around 15 seconds, but data loading was quite slow, and most of the time I got a "Broken pipe" error after 200+ iterations. So I changed the batch size to 1; since then I no longer get the broken-pipe error and the data loading time has dropped a lot, but each iteration takes about 24 seconds. I have a training set of 15k images, so at this speed it would take 4+ days to complete 15k iterations (one epoch). Could you please suggest what I can do, or which parameters I can change, to speed up training? PS: I am using a Titan X GPU; batch size 8 gives me a "CUDA out of memory" error.

duanzhiihao commented 4 years ago
  1. The first thing to notice is that each iteration contains 128 images, not a single image: https://github.com/duanzhiihao/RAPiD/blob/df9a52948c60b1a8cabb5bdae68b70dab11968e4/train.py#L52
  2. "Broken pipe" errors are mostly caused by other errors. Please try to set num_cpu = 0 in train.py to see the original error. https://github.com/duanzhiihao/RAPiD/blob/df9a52948c60b1a8cabb5bdae68b70dab11968e4/train.py#L51

Please tell me if you have other questions.

iampartho commented 4 years ago

But at line 166 of train.py, where you create the dataloader variable, the batch_size parameter is set to the "batch_size" variable, which is 1 by default (set at line 50 of train.py). Then we use the dataloader to iterate through each training iteration. Could you please explain how each iteration contains 128 images? It would be a big help, thanks.

duanzhiihao commented 4 years ago

Sure. If batch_size=1, we have subdivision = 128 // batch_size = 128. At each iteration we fetch subdivision image-label pairs; see line 244: https://github.com/duanzhiihao/RAPiD/blob/df9a52948c60b1a8cabb5bdae68b70dab11968e4/train.py#L244

batch_size should be set as large as possible. I set it to 1 by default because that is convenient for debugging.
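
To make the pattern explicit, here is a minimal, self-contained sketch of the accumulation loop described above (toy model and data; RAPiD's actual train.py differs in the details):

```python
import torch
import torch.nn as nn

# Toy stand-ins so the pattern runs end to end.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
batch_size = 1
subdivision = 128 // batch_size            # 128 mini-batches per iteration

def next_minibatch():
    # placeholder for one mini-batch of `batch_size` image-label pairs
    return torch.randn(batch_size, 10), torch.randn(batch_size, 1)

optimizer.zero_grad()
for _ in range(subdivision):
    imgs, labels = next_minibatch()
    loss = nn.functional.mse_loss(model(imgs), labels)
    loss.backward()                        # gradients accumulate across the subdivision loop
optimizer.step()                           # a single parameter update per 128 images
```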

iampartho commented 4 years ago

Ohh, I see, you are doing a sort of gradient accumulation. That's great, thank you. But shouldn't you divide the loss by the subdivision number before calling loss.backward()? [I am saying this based on this.] Have you tried it? Could you please share your thoughts about it?

duanzhiihao commented 4 years ago

Sorry for the late reply! You have a great point. I divide the learning rate (instead of the loss) by the subdivision number and the batch_size. https://github.com/duanzhiihao/RAPiD/blob/df9a52948c60b1a8cabb5bdae68b70dab11968e4/train.py#L66

Alternatively, one could divide the loss or the gradient by subdivision * batch_size; the two are equivalent.
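
A quick numeric check of that equivalence for plain SGD (no momentum or adaptive scaling; with optimizers like Adam the two choices are not identical):

```python
import torch

k = 128  # subdivision * batch_size

def sgd_step(w, grad, lr):
    return w - lr * grad

w0, g = torch.tensor(1.0), torch.tensor(0.5)

w_scaled_lr   = sgd_step(w0, g,     lr=0.01 / k)   # divide the learning rate by k
w_scaled_grad = sgd_step(w0, g / k, lr=0.01)       # divide the loss/gradient by k

assert torch.allclose(w_scaled_lr, w_scaled_grad)
```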