sanic-the-hedgefond opened this issue 4 years ago
Have you looked into https://github.com/facebookresearch/detectron2? It implements Mask R-CNN (among other detection models) and is developed by Facebook AI Research (FAIR). I recently used their Fairseq library for language modeling and it was a pleasure to work with. A friend of mine strongly recommended Detectron2 to me, so maybe we should try it out?
As a matter of fact, I could just try it out myself and report back. Meanwhile, you could try the YOLOv3 approach.
I plan to train YOLOv4 (released just a couple of weeks ago: https://arxiv.org/abs/2004.10934) on the whole YCB dataset. To that end, I wrote a script to convert the YCB bounding-box labels to YOLO format.
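A label conversion along these lines could look like the sketch below. It is not the script mentioned above. It assumes each YCB box-file line reads `<class_name> x_min y_min x_max y_max` in pixels (my recollection of the YCB-Video `*-box.txt` files, worth double-checking), and that `class_ids` is a hypothetical dict mapping object names to 0-based YOLO class indices. YOLO expects one line per box: `class_id x_center y_center width height`, all normalized by the image size.

```python
from typing import Dict, Iterable, List, Tuple


def corners_to_yolo(
    x_min: float, y_min: float, x_max: float, y_max: float, img_w: int, img_h: int
) -> Tuple[float, float, float, float]:
    """Convert a pixel-space corner box to YOLO's normalized (x_center, y_center, w, h)."""
    # Clamp to the image so slightly out-of-bounds labels stay within [0, 1].
    x_min, x_max = max(0.0, x_min), min(float(img_w), x_max)
    y_min, y_max = max(0.0, y_min), min(float(img_h), y_max)
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    return x_c, y_c, (x_max - x_min) / img_w, (y_max - y_min) / img_h


def convert_box_lines(
    lines: Iterable[str], class_ids: Dict[str, int], img_w: int, img_h: int
) -> List[str]:
    """Turn assumed 'name x_min y_min x_max y_max' lines into YOLO label lines."""
    out = []
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        name = parts[0]
        x_min, y_min, x_max, y_max = map(float, parts[1:])
        x_c, y_c, w, h = corners_to_yolo(x_min, y_min, x_max, y_max, img_w, img_h)
        out.append(f"{class_ids[name]} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")
    return out
```

For a 640x480 frame, `convert_box_lines(["002_master_chef_can 100 50 300 250"], {"002_master_chef_can": 0}, 640, 480)` yields `["0 0.312500 0.312500 0.312500 0.416667"]`, which would be written to a `.txt` file next to the image.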
The next step is organizing the YCB data and getting some training done on cvpc8, hopefully.
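One part of organizing the data is producing train/validation image lists; as far as I know, Darknet's YOLOv4 reads these from text files referenced in its `.data` config. The sketch below is a hypothetical helper, assuming that all frames of one YCB video sequence share a parent directory. It splits by sequence rather than by frame, because consecutive frames are nearly identical and a random per-frame split would leak them into validation and inflate the scores.

```python
import random
from collections import defaultdict
from pathlib import Path, PurePath
from typing import Dict, List, Sequence, Tuple


def split_by_sequence(
    image_paths: Sequence[str], val_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Split frames into train/val so that no video sequence lands in both."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for p in image_paths:
        # Assumption: the parent directory identifies the video sequence.
        groups[str(PurePath(p).parent)].append(p)
    seq_ids = sorted(groups)
    random.Random(seed).shuffle(seq_ids)  # deterministic for a given seed
    n_val = round(len(seq_ids) * val_fraction)
    val_ids = set(seq_ids[:n_val])
    train = [p for s in sorted(groups) if s not in val_ids for p in sorted(groups[s])]
    val = [p for s in sorted(groups) if s in val_ids for p in sorted(groups[s])]
    return train, val


def write_list(paths: Sequence[str], out_file: str) -> None:
    """Write one image path per line (e.g. a train.txt or valid.txt list)."""
    Path(out_file).write_text("\n".join(paths) + "\n")
```

With 10 sequences and `val_fraction=0.2`, two whole sequences end up in the validation list.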
I found this Keras implementation: https://github.com/qqwweee/keras-yolo3
I got it running on a self-made video containing a banana, and it worked quite well with the pretrained model.
Next steps would be:
1) Build an interface that takes an image or video as input and outputs the bounding boxes of detected bananas.
2) The model is currently trained on 80 object classes, so fine-tune it for bananas only, using additional training images. That should improve banana detection.
3) Maybe also try to fine-tune the tiny model (which has not been trained on bananas yet).
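Steps 1 and 2 could be sketched roughly as follows. `banana_boxes` is a hypothetical post-processing helper: it assumes the detector output has already been unpacked into parallel lists of pixel boxes, scores, and class names (how to get those out of keras-yolo3 is left open here), and keeps only confident banana detections. `keras_yolo3_annotation_line` writes one row of the training annotation file in the format the keras-yolo3 README describes, as far as I can tell: `image_path x_min,y_min,x_max,y_max,class_id ...`, with integer coordinates and no spaces inside a box. Using class id `0` assumes a one-line classes file containing only `banana`.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def banana_boxes(
    boxes: Sequence[Box],
    scores: Sequence[float],
    class_names: Sequence[str],
    target: str = "banana",
    min_score: float = 0.5,
) -> List[Tuple[Box, float]]:
    """Keep only detections of the target class with a score >= min_score."""
    return [
        (box, score)
        for box, score, name in zip(boxes, scores, class_names)
        if name == target and score >= min_score
    ]


def keras_yolo3_annotation_line(image_path: str, boxes: Sequence[Box], class_id: int = 0) -> str:
    """Format one image's boxes as 'path x_min,y_min,x_max,y_max,class_id ...'."""
    parts = [image_path]
    for x_min, y_min, x_max, y_max in boxes:
        coords = (round(x_min), round(y_min), round(x_max), round(y_max))
        parts.append(",".join(str(int(c)) for c in coords) + f",{class_id}")
    return " ".join(parts)
```

Since rows are split on whitespace, image paths containing spaces would break this format, so it is safer to keep the fine-tuning images in space-free paths.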