LibrePhotos / librephotos

A self-hosted open source photo management service. This is the repository of the backend.
MIT License

Enable object detection and classification #54

Open derneuere opened 3 years ago

derneuere commented 3 years ago

The project uses densecap, but the original author disabled it without explaining why. We should try to get it running and evaluate whether it is usable or whether we need a different machine learning model.

https://github.com/LibrePhotos/librephotos/blob/289f413c303bc06e04de2c8c8decb764ba86481c/api/models.py#L206-L221

We also need to change this function to add the results to the AlbumThings:

https://github.com/LibrePhotos/librephotos/blob/289f413c303bc06e04de2c8c8decb764ba86481c/api/models.py#L496-L515

derneuere commented 3 years ago

Well I tried it, and it just crashed :/ Maybe somebody else will have more luck with it. We could also switch to a different object detection and classification framework.

derneuere commented 3 years ago

Related to #36

derneuere commented 3 years ago

Maybe we should try https://github.com/OlafenwaMoses/ImageAI

derneuere commented 3 years ago

Or Yolov4: https://github.com/AlexeyAB/darknet

derneuere commented 3 years ago

We should maybe use this. Looks like the most used framework: https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/inference_tf2_colab.ipynb

derneuere commented 3 years ago

https://github.com/facebookresearch/detectron2 with LVIS Instance Segmentation Baselines with Mask R-CNN model https://github.com/facebookresearch/detectron2/blob/master/GETTING_STARTED.md

derneuere commented 3 years ago

@airfield20 Hey, I saw your post in the ownphotos thread. Are you still interested in implementing object detection for the project?

airfield20 commented 3 years ago

Yes, I'd like to contribute if I can. I did not know the project was active again.

derneuere commented 3 years ago

Could you write a Python script that implements YOLO object detection? The input would be a picture or an image path, and the output would be the list of detected objects as strings together with their confidence values.

Also, some install instructions for YOLO would be great so that I can add it to the Dockerfile 👍
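A minimal sketch of the output shape requested above, assuming the detector backend returns parallel lists of class ids and scores. The names `Detection` and `to_detections` are hypothetical and are not part of LibrePhotos.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Detection:
    """One detected object: a human-readable label and its confidence."""
    label: str
    confidence: float


def to_detections(class_ids: Sequence[int],
                  scores: Sequence[float],
                  class_names: Sequence[str],
                  min_confidence: float = 0.5) -> List[Detection]:
    """Map raw (class id, score) pairs to labelled detections, best first."""
    detections = [
        Detection(label=class_names[int(cid)], confidence=float(score))
        for cid, score in zip(class_ids, scores)
        if float(score) >= min_confidence  # drop low-confidence hits
    ]
    return sorted(detections, key=lambda d: d.confidence, reverse=True)
```

Keeping this conversion separate from the model call means the same output type can be reused whichever detection framework ends up being chosen.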

airfield20 commented 3 years ago

Sure. Should I submit a pull request to the dev branch with the file in the root directory, or just post it here?

airfield20 commented 3 years ago

I think it will be easiest to use the YOLO classifier that's built into opencv, to ensure maximum compatibility. If we test another better performing classifier later on that may depend on specific hardware, we could add a configuration option for the user to select which system they prefer.

derneuere commented 3 years ago

Sounds like a good idea!

Pull request, but put the file in api folder 👍

I don't know much about classifiers, but could you choose a model that is able to find a lot of different object classes?

airfield20 commented 3 years ago

@derneuere image

image

Testing YOLOv4 with OpenCV. How's this?

Also these are the class names that can be detected: https://raw.githubusercontent.com/hhk7734/tensorflow-yolov4/master/test/dataset/coco.names

airfield20 commented 3 years ago

Just basing my implementation on this gist https://gist.github.com/YashasSamaga/e2b19a6807a13046e399f4bc3cca3a49
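For readers following along, here is a rough sketch of YOLOv4 inference through OpenCV's DNN module, in the spirit of the linked gist. It is not the code from the pull request. The function names, the 416x416 input size, and the thresholds are assumptions, and the cfg, weights, and names files have to be downloaded separately.

```python
from typing import List, Tuple


def load_class_names(names_path: str) -> List[str]:
    """Read one class name per line (the coco.names format), skipping blanks."""
    with open(names_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def detect_objects(image_path: str, cfg_path: str, weights_path: str,
                   names_path: str, conf_threshold: float = 0.4,
                   nms_threshold: float = 0.4) -> List[Tuple[str, float]]:
    """Return (label, confidence) pairs for one image."""
    # Imported lazily so load_class_names works without OpenCV installed.
    import cv2
    import numpy as np

    class_names = load_class_names(names_path)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")

    net = cv2.dnn.readNet(weights_path, cfg_path)
    model = cv2.dnn_DetectionModel(net)
    # Assumed preprocessing: 416x416 input, pixels scaled to [0, 1], BGR -> RGB.
    model.setInputParams(size=(416, 416), scale=1 / 255, swapRB=True)

    class_ids, scores, _boxes = model.detect(image, conf_threshold, nms_threshold)
    # Older OpenCV releases return (N, 1) arrays, so flatten before iterating.
    class_ids = np.array(class_ids).flatten()
    scores = np.array(scores).flatten()
    return [(class_names[int(cid)], float(score))
            for cid, score in zip(class_ids, scores)]
```

The bounding boxes are discarded here because only labels and confidence values were asked for.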

airfield20 commented 3 years ago

image

image

derneuere commented 3 years ago

Looks good! 👍 But I would prefer more classes. This one seems to support up to 9000: https://github.com/philipperemy/yolo-9000 Could you check whether the cfg and weight files are compatible?

derneuere commented 3 years ago

cfg is here: https://github.com/pjreddie/darknet/tree/61c9d02ec461e30d55762ec7669d6a1d3c356fb2/cfg

You have to download this folder https://github.com/philipperemy/yolo-9000/tree/master/yolo9000-weights and do this:

cat yolo9000-weights/x* > yolo9000-weights/yolo9000.weights # it was generated from split -b 95m yolo9000.weights
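The `cat` command relies on the shell expanding `x*` in sorted order, which matches the order in which `split` wrote the chunks. Where no POSIX shell is available, the same reassembly could be sketched in Python. `join_chunks` is a hypothetical helper name and is not part of any of the linked repositories.

```python
from pathlib import Path
import shutil


def join_chunks(chunk_dir: str, output_name: str = "yolo9000.weights",
                prefix: str = "x") -> Path:
    """Concatenate split(1) chunks (xaa, xab, ...) in name order into one file."""
    directory = Path(chunk_dir)
    # Collect the chunks before creating the output file, sorted by name.
    chunks = sorted(p for p in directory.iterdir()
                    if p.is_file() and p.name.startswith(prefix))
    if not chunks:
        raise FileNotFoundError(f"No chunks starting with {prefix!r} in {directory}")

    output = directory / output_name
    with output.open("wb") as merged:
        for chunk in chunks:
            with chunk.open("rb") as part:
                shutil.copyfileobj(part, merged)  # streams, so memory use stays low
    return output
```

Calling `join_chunks("yolo9000-weights")` would then write `yolo9000-weights/yolo9000.weights`, the same target as the shell command.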

airfield20 commented 3 years ago

Just tried that, and the model will not initialize using the cv2.dnn_DetectionModel class.

derneuere commented 3 years ago

Hmm, in https://github.com/AlexeyAB/darknet there are the following tips. Seems to be a fork:

186 MB Yolo9000 - image: darknet.exe detector test cfg/combine9k.data cfg/yolo9000.cfg yolo9000.weights

Remember to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app

airfield20 commented 3 years ago

@derneuere I've managed to get yolo-9000 working using https://pypi.org/project/darknetpy/

code snippet:

from darknetpy.detector import Detector

detector = Detector('/home/aaron/Repos/yolo-9000/darknet/cfg/combine9k.data',
                    '/home/aaron/Repos/yolo-9000/darknet/cfg/yolo9000.cfg',
                    '/home/aaron/Repos/yolo-9000/yolo9000-weights/yolo9000.weights')

results = detector.detect('/home/aaron/Repos/librephotos/api/yolo/test_images/pets.png')
print(results)

the cfg file requires the data folder from https://github.com/pjreddie/darknet/tree/61c9d02ec461e30d55762ec7669d6a1d3c356fb2

I think this detector.detect function has the interface that you require.
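To get from the raw detections to the list of labels with confidence values, a small post-processing step might look like the sketch below. It assumes each result is a dict with `class` and `prob` keys, which is how the darknetpy README shows its output. Verify this against the installed version. `summarize_detections` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List, Tuple


def summarize_detections(results: Iterable[Dict[str, Any]],
                         min_confidence: float = 0.0) -> List[Tuple[str, float]]:
    """Collapse raw detections to unique labels with their highest confidence."""
    best: Dict[str, float] = {}
    for item in results:
        label = str(item["class"])   # assumed key name
        prob = float(item["prob"])   # assumed key name
        if prob >= min_confidence and prob > best.get(label, -1.0):
            best[label] = prob
    # Most confident label first.
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)
```

Deduplicating by label suits photo tagging, where it matters that a dog is present but not how many bounding boxes were found.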

airfield20 commented 3 years ago

@derneuere do you want this wrapped in a function in the API folder or is this good enough?

derneuere commented 3 years ago

Yes, that would be great 👍 I also need some install instructions. Do I only have to download the files and add darknetpy to the requirements or do I have to do more?

airfield20 commented 3 years ago

darknetpy also requires clang, which can be installed via apt. I will write more in-depth instructions and post them here after I create the PR.

derneuere commented 3 years ago

I got it to work and opened up a pull request: https://github.com/LibrePhotos/librephotos/pull/142

But I have run into a memory issue, see here: https://github.com/danielgatis/darknetpy/issues/31

derneuere commented 3 years ago

We should use MobileNetV3 to implement object detection: https://pytorch.org/vision/stable/models.html

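import cv2
import torch
from torchvision import models, transforms

# torchvision has no models.mobilenet_v3; use mobilenet_v3_large or mobilenet_v3_small.
# Newer torchvision releases deprecate pretrained=True in favor of the weights= argument.
mobilenet_v3 = models.mobilenet_v3_large(pretrained=True)
mobilenet_v3.eval()  # inference mode: disables dropout, uses batch-norm running statistics
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# The images should be resized to 224x224 because that's the size torchvision likes.
# cv2.imread returns BGR, so convert to RGB to match the normalization constants.
bgr = cv2.imread("individualImage.png")
rgb = cv2.cvtColor(cv2.resize(bgr, (224, 224)), cv2.COLOR_BGR2RGB)
image = torch.tensor(rgb / 255.0).to(torch.float32).permute(2, 0, 1).unsqueeze(0)

with torch.no_grad():
    out = mobilenet_v3(normalize(image))
# Text file with the classes: https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a
# imagenetidx is assumed to be the index -> label dict loaded from that file.
imagenetidx[int(out.argmax())]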
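`out.argmax()` yields only the single best class, while the earlier request in this thread was for labels together with confidence values. One way to get both is to apply a softmax to the classifier's raw scores and keep the top few entries. The dependency-free sketch below illustrates the idea. `top_k_labels` is a hypothetical helper, and with PyTorch available `torch.softmax(out, dim=1).topk(k)` would do the same job.

```python
import math
from typing import List, Sequence, Tuple


def top_k_labels(logits: Sequence[float], labels: Sequence[str],
                 k: int = 5) -> List[Tuple[str, float]]:
    """Return the k most probable labels with their softmax probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    # Indices ordered from most to least probable.
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:k]]
```

With the snippet above this could be called as `top_k_labels(out[0].tolist(), [imagenetidx[i] for i in range(1000)])`, assuming `imagenetidx` maps the integers 0 to 999 to label strings.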