AIWintermuteAI / aXeleRate

Keras-based framework for AI on the Edge
MIT License

Support for Yolov3 #39

Closed kodonnell closed 2 years ago

kodonnell commented 3 years ago

Is your feature request related to a problem? Please describe.
Feature request to support the YOLOv3 architecture and training.

Describe the solution you'd like

Describe alternatives you've considered
YOLOv2 is currently supported. YOLOv4/v5 were discounted in #38 (though it was indicated that YOLOv3 should be supported).

Additional context

AIWintermuteAI commented 3 years ago

Hello! Yes, the move to YOLOv3 was long overdue.

may be a good place to start, though I haven't had the model converge well with that.

Doesn't sound very encouraging, but I will try this repo first on some of my datasets and see if there is improvement in mAP.

And an example of how to deploy it to the K210 to recognize images/video.

Yes, about that. YOLOv3 requires a different result parser than YOLOv2 - this means adding a function to parse YOLOv3 output to the Micropython firmware, so it can be used with MaixPy. You can see that the author of https://github.com/zhen8838/K210_Yolo_framework didn't share the parsing function for K210 and mentions that

If you need standard yolov3 region layer code, you can buy with me.

-_-'

kodonnell commented 3 years ago

I will try this repo first on some of my datasets and see if there is improvement in mAP.

Any luck?

YOLOv3 requires different result parser than YOLOv2 ...

I might be wrong, but IIRC where I got to on this was that it was largely just a matter of using the YOLOv2 parser three times (once for each scale - though only two scales in this case), i.e. not a major change. I saw the comment about buying the code - @zhen8838, are you happy to open-source it now that it's a few years old?
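The "same parser, once per scale" idea can be sketched roughly as follows. This is a hypothetical helper, not aXeleRate's actual parser; the anchor convention (anchors in grid units) and tensor layout are assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_scale(feats, anchors, img_size):
    """Decode one YOLO output scale, YOLOv2-style.
    feats: (grid_h, grid_w, n_anchors, 5 + n_classes) raw network output.
    anchors: (w, h) pairs, assumed here to be in grid units."""
    grid_h, grid_w = feats.shape[:2]
    boxes = []
    for row in range(grid_h):
        for col in range(grid_w):
            for a, (aw, ah) in enumerate(anchors):
                tx, ty, tw, th, conf = feats[row, col, a, :5]
                # cell offset + sigmoid keeps the center inside its cell
                cx = (col + sigmoid(tx)) / grid_w * img_size
                cy = (row + sigmoid(ty)) / grid_h * img_size
                # anchors scale the exponentiated width/height predictions
                w = aw * np.exp(tw) / grid_w * img_size
                h = ah * np.exp(th) / grid_h * img_size
                boxes.append((cx, cy, w, h, sigmoid(conf)))
    return boxes

# YOLOv3 parsing is then the same decode applied once per output scale,
# each with its own anchor set, before a shared NMS pass:
# all_boxes = decode_scale(out_coarse, anchors_coarse, 416) \
#           + decode_scale(out_fine, anchors_fine, 416)
```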

AIWintermuteAI commented 3 years ago

Hi there. I've tested training the model on some of my datasets over the last two days and it does perform better, especially on smaller objects, e.g. my mask dataset. Not perfect, but I only trained for 10 epochs.

The next step would be to try it on the K210 - you're right, the changes in the parser between YOLOv2 and YOLOv3 are minor. I'll try running only the first scale to see if the model can be executed and what the inference speed is. As for porting it to aXeleRate, the original repo's code is a bit convoluted... Anyway, I only need the detection layer and loss functions from it, so I'm optimistic that I can make it work.

By the way, since I've tested the original repo on a few datasets and found that it converges nicely - if it didn't converge for your dataset, then perhaps there is a problem with the dataset?

kodonnell commented 3 years ago

Ooooh, great work! Glad to hear it's better than v2 for you - keen to see the impact on inference speed.

I'll try running only first scale to see if the model can be executed and what is the inference speed.

Ah, that might explain the comment in the above repo: "NOTE: I just use kendryte yolov2 demo code to prove the validity of the model." I.e. it'll probably work out of the box with the YOLOv2 code since, if I remember correctly, it just grabs the first model output. You might find you can drop the model into your existing YOLOv2 deployment flow and it'll just work.

Now for porting it to aXeleRate, the original repo's code is a bit convoluted... Anyways, I only need the detection layer and loss functions from it, so I'm optimistic that I can make it work.

Let me know if you need someone else to try it out!

By the way, since I've tested the original repo on a few datasets and found that it converges nicely - if it didn't converge for your dataset, then perhaps there is a problem with the dataset?

Possibly, though I've trained on the same dataset with other object detection libraries and it was fine. So it's more likely I just used the code wrong or something. I do recall it took many attempts to get any non-NaN anchors, so something weird may have been going on there, and my resulting anchors may have been so messed up that they caused the poor results.
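For what it's worth, NaN anchors are a classic symptom of naive k-means over box dimensions: an empty cluster averages zero members and yields 0/0. A minimal sketch (not any library's actual anchor generator) that reseeds empty clusters instead:

```python
import numpy as np

def kmeans_anchors(wh, k, iters=100, seed=0):
    """IoU-style k-means over (w, h) box dimensions.
    Reseeds empty clusters rather than averaging zero members,
    which is what produces NaN anchors in a naive implementation."""
    rng = np.random.default_rng(seed)
    wh = np.asarray(wh, dtype=float)
    centers = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        # IoU between every box and every center, both anchored at the origin
        inter = (np.minimum(wh[:, None, 0], centers[None, :, 0])
                 * np.minimum(wh[:, None, 1], centers[None, :, 1]))
        union = (wh[:, 0] * wh[:, 1])[:, None] + centers[:, 0] * centers[:, 1] - inter
        assign = np.argmin(1.0 - inter / union, axis=1)
        for c in range(k):
            members = wh[assign == c]
            if len(members) == 0:
                # empty cluster: reseed from a random box instead of 0/0 -> NaN
                centers[c] = wh[rng.integers(len(wh))]
            else:
                centers[c] = members.mean(axis=0)
    return centers
```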

AIWintermuteAI commented 3 years ago

Well, the model (MobileNetV1 with 0.5 and 0.75 alpha) can be converted to .tflite and .kmodel, but the size of the models basically prohibits using them with Micropython; pruning doesn't seem to reduce model size in a meaningful way. Running the models with C should be fine though. I don't really have time to work on this actively at the moment, so possibly in the future. I have plans for completely overhauling the aXeleRate backend system - replacing all the current backends with different versions of EfficientNet, from the smallest to the largest, in three sizes fitting a) MCUs, b) K210-like systems with limited memory, c) SBCs, where memory is not an issue.

When I have time, I'll do the backend overhaul first and then try to make sure that YOLOv3 with the medium-size backend is a reasonable size (around 1.5 MB ideally, 1.9 MB at maximum) to be run on the K210 with Micropython.

AIWintermuteAI commented 3 years ago

Okay, so I've ported the YOLOv3 loss and batch generator, together with precision and recall metrics, from https://github.com/zhen8838/K210_Yolo_framework to aXeleRate. You can check it out in the dev branch; an example training config is here: https://github.com/AIWintermuteAI/aXeleRate/blob/dev/configs/raccoon_detector.json Currently the implementation has only one branch - meaning that feature maps do not get rescaled. I'll implement that fairly soon, as soon as I finish testing the new loss and make sure there are no problems with the loss/batch generator by themselves.

kervel commented 3 years ago

Hello,

Training fails for me on dev branch.


self.out_hw: [[7 7]] and self.grid_wh: [[0.14285714 0.14285714]] - I guess grid_wh should have a row for every anchor (judging from the code).

AIWintermuteAI commented 3 years ago

Yes, the config format for anchors was changed to accommodate two branches - even if you only use one: "anchors": [[[0.76120044, 0.57155991], [0.6923348, 0.88535553], [0.47163042, 0.34163313]]]
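In other words, anchors are now a list of per-branch lists of (w, h) pairs, with an outer level of nesting even for a single branch. A quick way to sanity-check a config (standalone snippet, not part of aXeleRate):

```python
import json

# the single-branch anchors value from the comment above,
# wrapped in a minimal JSON document for illustration
config_snippet = """
{"anchors": [[[0.76120044, 0.57155991],
              [0.6923348,  0.88535553],
              [0.47163042, 0.34163313]]]}
"""

anchors = json.loads(config_snippet)["anchors"]
# expected nesting: branches -> anchors -> (w, h)
assert all(len(pair) == 2 for branch in anchors for pair in branch)
print(len(anchors), "branch(es),", len(anchors[0]), "anchors each")
```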

AIWintermuteAI commented 3 years ago

Okay, so as of now the YOLOv3 status is:

Once I have a bit more time, I'll do more testing and cleaning and publish the current dev branch as a "next" branch. If and when it becomes possible to run the YOLOv3 model on the K210, I'll push the changes from next to master.

kervel commented 3 years ago

Hello, I'm trying it out now. Any reason map_evaluation is gone? While it was a bit slow, it was useful for seeing how the model was performing during training. For now I just copied it from the master branch and integrated it back.

I assume the SHAPE op should not be a deal breaker: if we have a good model and we really don't have a SHAPE op, we could still try to modify the compute graph and replace the SHAPE ops with constants.
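The constant-replacement idea works because, for a fixed input resolution, the output grids never change. A tiny sketch of the arithmetic (illustrative helper, not nncase tooling; the strides are the usual YOLO downsampling factors):

```python
def grid_sizes(input_size=320, strides=(32, 16)):
    """For a fixed input size, each branch's output grid is a
    compile-time constant, so an exported graph need not contain a
    runtime SHAPE op at all - the values can be baked in."""
    return [(input_size // s, input_size // s) for s in strides]

print(grid_sizes(320))  # [(10, 10), (20, 20)]
```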

Thanks, Frank

AIWintermuteAI commented 3 years ago

I switched from the mAP evaluation callback to recall and precision metrics. There is a new callback called MergedMetrics, which takes the sum of either the validation recalls or the validation precisions of the two branches and then divides it by two. For final evaluation, evaluate.py can be used. Do you think the old mAP callback was more useful for judging the quality of the model?
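The merging as described boils down to averaging one metric across the branches. A standalone reimplementation sketch (the log-key names here are assumptions, not aXeleRate's actual keys):

```python
def merged_metric(logs, metric="recall", branches=("b1", "b2")):
    """Average one validation metric over the detection branches,
    mirroring what the MergedMetrics callback is described as doing."""
    values = [logs[f"val_{b}_{metric}"] for b in branches]
    return sum(values) / len(values)

logs = {"val_b1_recall": 0.40, "val_b2_recall": 0.48}
print(merged_metric(logs))  # 0.44
```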

Yes, but ideally I'd like the conversion process to be simple and automated... I'll do more tests and hopefully the SHAPE op will be supported by the end of June or in July.

kervel commented 3 years ago

Hello, when I try it, precision always starts very high (because the model predicts almost nothing, it also doesn't predict false positives) and then goes down (while recall goes up).

I now use the per-class mAP to judge my dataset quality: I see huge differences in mAP between classes, and this helps me to know where there is work to do.

kervel commented 3 years ago

Hello, I let a mobilenet_5_0 with 2 output layers train overnight. Looking at TensorBoard, I thought it was going nowhere (neither precision nor recall really showed any trend). However, when I tried the model, it was actually already performing much better than the best YOLOv2 version I had. In going from one layer to two layers, map_evaluation.py broke (mAP is always zero); I need to figure out what's wrong. So it's looking good, but I'm training blind now.

AIWintermuteAI commented 3 years ago

No trend on precision/recall? That's a bit strange... I use validation recall as the main metric, and this is what I had on my last training session:


REPORT
{'fscore': 0.525520604340701, 'precision': 0.6545852832818879, 'recall': 0.4389690170940171}
CONFIG
{
    "model": {
        "type": "Detector",
        "architecture": "MobileNet1_0",
        "input_size": 320,
        "anchors": [
            [
                [
                    0.76120044,
                    0.57155991
                ],
                [
                    0.6923348,
                    0.88535553
                ],
                [
                    0.47163042,
                    0.34163313
                ]
            ],
            [
                [
                    0.33340788,
                    0.70065861
                ],
                [
                    0.18124964,
                    0.38986752
                ],
                [
                    0.08497349,
                    0.1527057
                ]
            ]
        ],
        "labels": [
            "person",
            "bird",
            "cat",
            "cow",
            "dog",
            "horse",
            "sheep",
            "aeroplane",
            "bicycle",
            "boat",
            "bus",
            "car",
            "motorbike",
            "train",
            "bottle",
            "chair",
            "diningtable",
            "pottedplant",
            "sofa",
            "tvmonitor"
        ],
        "obj_thresh": 0.5,
        "iou_thresh": 0.5,
        "coord_scale": 1.0,
        "object_scale": 3.0,
        "no_object_scale": 1.0
    },
    "weights": {
        "full": "",
        "backend": "imagenet"
    },
    "train": {
        "actual_epoch": 100,
        "train_image_folder": "/home/ubuntu/datasets/pascal_20_detection/imgs",
        "train_annot_folder": "/home/ubuntu/datasets/pascal_20_detection/anns",
        "train_times": 1,
        "valid_image_folder": "/home/ubuntu/datasets/pascal_20_detection/imgs_validation",
        "valid_annot_folder": "/home/ubuntu/datasets/pascal_20_detection/anns_validation",
        "valid_times": 1,
        "valid_metric": "recall",
        "batch_size": 32,
        "learning_rate": 0.001,
        "saved_folder": "pascal",
        "first_trainable_layer": "",
        "augmentation": true,
        "is_only_detect": false
    },
    "converter": {
        "type": [
            "tflite"
        ]
    }
}

and inference results are fairly good for certain classes - such as car/person/chair - and hit-and-miss for others (cat/dog/sheep). I really should train the model from the original repo on PASCAL VOC for 100 epochs at the same resolution and use it as a baseline for determining how much accuracy was lost by slashing the additional Conv2D layers in the second branch...
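As a quick consistency check, the fscore in the report above is the harmonic mean of the listed precision and recall (standalone snippet):

```python
def f1(precision, recall):
    # F1 score = harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# values taken from the REPORT block above
print(f1(0.6545852832818879, 0.4389690170940171))  # ~0.5255206
```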

kodonnell commented 3 years ago

Good work!

The double-branch version is a cut-down version of the original YOLOv3 architecture: it has no additional convolutional layers in the second branch, just a simple concatenation.

Out of interest, why was this - performance? Model size? I'm interested in running them with C, so it'd be great to have both options available. I'd be interested in seeing the accuracy decrease too.
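For anyone following along, the "simple concatenation" can be sketched shape-wise in plain NumPy (the real head would be Keras layers; channel counts here are illustrative): the coarse map is upsampled 2x and concatenated with the finer one, with no extra Conv2D stack in between.

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return fmap.repeat(2, axis=0).repeat(2, axis=1)

coarse = np.zeros((10, 10, 256))   # first-branch feature map
fine = np.zeros((20, 20, 128))     # earlier, higher-resolution feature map
# cut-down second branch: upsample + concatenate along channels, nothing else
merged = np.concatenate([upsample2x(coarse), fine], axis=-1)
print(merged.shape)  # (20, 20, 384)
```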

nncase converter doesn't support SHAPE operator, see

Not optimal, but we could probably get rid of the dependency on runtime tf.shape since the shapes will never change (for a given input image size). Though yes, if nncase gets it fixed in the short term, great = )

Out of interest, it sounded like they were converting https://github.com/AIWintermuteAI/aXeleRate/issues/39#issuecomment-819393678, but not anymore?

Felix-fz commented 3 years ago

Hi,

Training fails for me on the dev branch. I do not know how to fix it - maybe you know? Please give me some advice in your free time. Thank you!

AIWintermuteAI commented 3 years ago

@CopleM hi there! I just checked - there was a little problem with the raccoon config file, but it is not related to your issue. However, I fixed that problem as well, so make sure to do a git pull. Perhaps you didn't update the pip package after pulling new code or switching to the dev branch?

E.g., for the development version, I recommend you a) clone the github repo, b) check out the dev branch, c) create a new conda environment specifically for the dev branch and activate it, d) do pip install -e . (https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-e). With the -e argument, pip installs the project in editable mode (i.e. setuptools "develop mode") from a local project path - as opposed to without the flag, where pip copies the files to the install directory.

AIWintermuteAI commented 2 years ago

@kodonnell @kervel I created a PR to MaixPy adding support for YOLOv3: https://github.com/sipeed/MaixPy/pull/451 The speed is quite a bit slower than YOLOv2, because of the number of bounding boxes that are processed every frame. Perhaps that can be optimized further. For now, though, once they accept the PR, I'll mark this feature request closed.
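The slowdown is easy to see from box counts alone. Illustrative numbers (assuming a 320x320 input, 5 anchors for YOLOv2 and 3 anchors per branch for YOLOv3, which may not match the exact configs used):

```python
def num_boxes(grids, anchors_per_cell):
    """Total candidate boxes the parser must score (and NMS) per frame."""
    return sum(h * w * anchors_per_cell for h, w in grids)

yolov2 = num_boxes([(10, 10)], anchors_per_cell=5)           # one coarse grid
yolov3 = num_boxes([(10, 10), (20, 20)], anchors_per_cell=3) # two branches
print(yolov2, yolov3)  # 500 1500
```

Three times the candidate boxes per frame on a Micropython parser plausibly accounts for much of the speed difference.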

AIWintermuteAI commented 2 years ago

It is now merged to the master branch. I'm updating the examples and notebooks for this version. If there are any issues or questions with YOLOv3, apart from those described in the release notes, feel free to create a new issue.