AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

question about yolov3 tiny occlusion track? #2553

Open derekwong66 opened 5 years ago

derekwong66 commented 5 years ago

Hi @AlexeyAB ,

Thanks for your contribution. Can you explain more about how to train the tiny-occlusion-track cfg? Which pre-trained weights can we use, and what about the [crnn] layer?

thanks in advance.

AlexeyAB commented 5 years ago

@derekwong66 Hi,

It is just an experimental detector (it isn't well tested yet)

  1. it should have higher accuracy for detection on a video stream

  2. it should solve the blinking issue and two related problems: track_id migration from one object to another, and false object counting on video when a detection disappears and appears again with a new track_id (if you use yolo_console_dll.cpp, i.e. ./uselib or yolo_console_dll.exe, for counting objects)

  3. it can solve the occlusion issue (for two objects with different class_id, or object & background) for at least 20 frames: time_steps=20 in the cfg-file.


(I have not yet tested it with random=1 in the cfg-file.)
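A minimal, hypothetical sketch of how the recurrent part might look in a cfg-file, assuming only the [crnn] layer and time_steps=20 mentioned in this thread; the real yolov3-tiny_occlusion_track.cfg in the repo is the authoritative reference, and the parameter values below are illustrative:

```ini
# Hypothetical sketch, NOT the real cfg-file -- see
# yolov3-tiny_occlusion_track.cfg in the repo for the actual layers.

[net]
# 20 sequential frames form one training sample:
time_steps=20
...

# a recurrent convolutional layer added on top of the usual
# tiny-YOLO backbone, so the detector can use temporal context:
[crnn]
batch_normalize=1
output=256
hidden=256
activation=leaky
```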

You should train it on sequential frames from one or several videos:

The only condition: the frames from the video must appear sequentially in the train.txt file. And you should validate results on a separate validation dataset; for example, divide your dataset in two:

  1. train.txt - first 80% of frames (80% from video1 + 80% from video2, if you use frames from 2 videos)
  2. valid.txt - last 20% of frames (20% from video1 + 20% from video2, if you use frames from 2 videos)
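The split above can be sketched as a small script; the file names are hypothetical, and the point is that each video's frames stay in temporal order, with the first 80% going to train.txt and the last 20% to valid.txt:

```python
# Sketch: build train.txt / valid.txt from sequential frames of several
# videos. The frame paths here are made-up placeholders.

def split_sequential(frame_lists, train_frac=0.8):
    """frame_lists: one list of frame paths per video, each in temporal order."""
    train, valid = [], []
    for frames in frame_lists:
        cut = int(len(frames) * train_frac)
        train.extend(frames[:cut])   # first 80% of this video, order preserved
        valid.extend(frames[cut:])   # last 20% of this video
    return train, valid

video1 = [f"video1/frame_{i:05d}.jpg" for i in range(100)]
video2 = [f"video2/frame_{i:05d}.jpg" for i in range(50)]
train, valid = split_sequential([video1, video2])

with open("train.txt", "w") as f:
    f.write("\n".join(train))
with open("valid.txt", "w") as f:
    f.write("\n".join(valid))
```

Shuffling must not be applied here, since the recurrent layer depends on consecutive frames arriving in order.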

Idea in simple words:

Example of a state-of-the-art super-resolution network: https://arxiv.org/abs/1704.02738
[image: super-resolution example]

Example of object detection and tracking:
[image: detection-and-tracking example]

Your training process can look like this:
[image: training chart with full occlusion]

diennv commented 5 years ago

Dear AlexeyAB,

I also have a problem with track_id when using YOLOv3 for counting objects on video.

I used yolov3.cfg for my own dataset. I saw that yolov3-tiny_occlusion_track.cfg has some modifications compared with yolov3-tiny.cfg; for example, a [crnn] layer is added. I have a question: for yolov3.cfg, could I do the same to solve the track_id issue?

AlexeyAB commented 5 years ago

@diennv Hi,

For yolov3.cfg, could I do the same to solve the track_id issue?

Yes, you can. It just may require a lot of GPU-RAM for YOLOv3 + CRNN, so it can be trained only on high-end GPUs.

Also, later I will add a conv-LSTM layer, which can achieve higher accuracy.

PythonImageDeveloper commented 5 years ago

@AlexeyAB Hi, I have one GTX 1080 GPU. Is this device enough for training yolov3-tiny_occlusion_track? And yolo3-occlusion-track too? Is the yolov3-tiny_occlusion_track algorithm better than the existing trackers in OpenCV, such as KCF, MOSSE, ...?

AlexeyAB commented 5 years ago

@zeynali Hi,

A GTX 1080 is enough for training yolov3-tiny_occlusion_track.cfg. But if you want to create your custom yolo3-occlusion-track.cfg, then you should use: Quadro RTX 8000 (48 GB GPU-RAM), Quadro GV100 (32 GB GPU-RAM), Nvidia TITAN V CEO Edition (32 GB GPU-RAM), Tesla V100 32GB-Edition, or DGX-2 with V100 32GB-edition.

yolov3-tiny_occlusion_track.cfg isn't well tested yet and will be modified to use conv-LSTM instead of conv-RNN later, so I can't compare it with other algorithms.

kamarulhairi commented 5 years ago

Hi,

For the 3rd step, which is to train the detector and tracker, where do I get the data/occlusion.data file?

Thank You

jacklin602 commented 5 years ago

Hi @AlexeyAB ,

Can you explain more about the rules for labeling bboxes for occlusion tracking? If the object is totally invisible (occluded by another type of object or by background), should I estimate and mark the object no matter how long it is occluded? If the object is partially occluded, should I mark only the visible area of the object or the estimated total extent of the object?

Thank you

AlexeyAB commented 5 years ago

@jacklin602 Hi,

If the object is totally invisible (occluded by another type of object or by background), should I estimate and mark the object no matter how long it is occluded?

Yes, you should mark totally invisible objects in Training and Validation datasets.


If the object is partially occluded, should I mark only the visible area of the object or the estimated total extent of the object?

Mark it however you want it to be detected. Usually I mark the estimated total extent of the object.
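For concreteness, Darknet labels are plain-text files, one per image, with one normalized line per object; the helper name and the box values below are made-up for illustration:

```python
# Darknet label format: one line per object,
#   <class_id> <x_center> <y_center> <width> <height>
# with all coordinates normalized to [0, 1] by the image size.
# Per the advice above, an occluded object still gets a line, marked
# at its estimated total extent.

def make_label_line(class_id, box, img_w, img_h):
    x, y, w, h = box  # pixel box: center x, center y, width, height
    return (f"{class_id} {x / img_w:.6f} {y / img_h:.6f} "
            f"{w / img_w:.6f} {h / img_h:.6f}")

# A person (class 0) fully hidden behind a car, estimated at the same
# position and size as in the last frame where it was visible:
line = make_label_line(0, (320, 240, 64, 128), img_w=640, img_h=480)
print(line)  # 0 0.500000 0.500000 0.100000 0.266667
```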

Also, in several days I will commit a new version of yolov3-tiny_occlusion_track.cfg with convolutional-LSTM, which works much better for occlusions.

alexanderfrey commented 5 years ago

@AlexeyAB Hi, I'm also very much interested in your LSTM implementation, to compare it to the Deep SORT tracker. Best, Alexander

Tangotrax commented 5 years ago

@AlexeyAB I'm also very much interested in your LSTM implementation

AlexeyAB commented 5 years ago

@alexanderfrey @Tangotrax @diennv @derekwong66 @PythonImageDeveloper @kamarulhairi

You can try to use LSTM-models with the latest version of Darknet: https://github.com/AlexeyAB/darknet/issues/3114#issuecomment-494148968

For example, this model: https://github.com/AlexeyAB/darknet/files/3199631/yolo_v3_tiny_pan_lstm.cfg.txt

How to train: https://github.com/AlexeyAB/darknet/issues/3114#issuecomment-494154586

alexanderfrey commented 5 years ago

@AlexeyAB Thanks, I appreciate your work very much! I am very excited to see how it performs. Any recommendations on how many sequences it should be trained with?

AlexeyAB commented 5 years ago

@alexanderfrey There are no exact recommendations yet. Just make sure each sequence has more than 200 frames.

alexanderfrey commented 5 years ago

@AlexeyAB Hi Alexey, thanks for your great work! I trained yolo_v3_tiny_pan_lstm as you explained in #3114 and the results look promising for the first 1000 iterations. I train on 15 video sequences with approx. 180 frames per sequence. (I know this should be higher.)

Unfortunately the avg. loss becomes NaN after roughly 1200 iterations. I use the yolov3-tiny.conv.14 weights, and I already set state_constrain=16 for each [conv_lstm] layer, sequential_subdivisions=8 and sgdr_cycle=10000. I train on 3 classes and adjusted the filters accordingly to 24 (the ones before the yolo layers). Anchors are default and I set batch and subdivisions to 1. What else can I do to make the training run through?

Thanks for any help

AlexeyAB commented 5 years ago

@alexanderfrey Hi,

Anchors are default and I set batch and subdivisions to 1.

Did you set batch=1 subdivisions=1 for training? You must use:

batch=16
subdivisions=4

You can increase batch, but you shouldn't decrease it.
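The reason, as I understand Darknet's batch/subdivisions convention, is that each iteration processes `batch` images but loads them in chunks of `batch / subdivisions` so they fit in GPU-RAM; a tiny effective batch destabilizes training:

```python
# Darknet batch semantics (sketch): per iteration, `batch` images are
# processed in `subdivisions` chunks, and the weight update happens
# once per full batch.

batch = 16
subdivisions = 4
mini_batch = batch // subdivisions  # images per forward/backward pass
print(mini_batch)  # 4

# With batch=1 subdivisions=1 every update is based on a single image,
# which can lead to the NaN loss reported above; keep batch >= 16 and
# raise subdivisions instead if you run out of GPU-RAM.
```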

What GPU do you use?


If it doesn't help - then try to set

learning_rate=0.001
burn_in=1000
max_batches = 10000

policy=steps
steps=8000,9000
scales=.1,.1

If it doesn't help, then try to set learning_rate=0.0005 or learning_rate=0.0002

alexanderfrey commented 5 years ago

@AlexeyAB Thank you very much for the answer. I started a new run last night with batch=64 and subdivisions=64. So far it's running. Should steps have 2 or 4 values? In the default config file it has 4 values. Btw, is it possible to get the track_id of an object? Thanks!