Computer vision tools survey

florian-grond commented 3 years ago

Compile all the computer vision tools that we will use and summarize what data they output. Suggest a data structure that represents "the image" as data containing the output of all the tools used.

florian-grond commented 3 years ago

@rohanakut Rohan, I'm assigning this to you, happy to discuss as we go along

rohanakut commented 3 years ago

Hello Florian, Based on my knowledge I have found two tools that could be helpful to us.

Object Detection Task: ImageAI

Advantages:

Free
Trained to classify more than 1000 different objects
Could be retrained on specific objects
Gives coordinates of the detected objects

Output: an array of dictionaries giving the location of each object, its confidence score and the detected label.
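To make the "array of dictionaries" concrete, here is a rough sketch of how that output could be consumed. The key names (`name`, `percentage_probability`, `box_points`) are what I understand ImageAI's detector to return, but please treat them as an assumption and check against the installed version. The sample values and the `filter_detections` helper are made up for illustration.

```python
# Hypothetical sample shaped like the detector output: one dict per detected object.
sample_detections = [
    {"name": "dog", "percentage_probability": 97.2, "box_points": [34, 50, 210, 300]},
    {"name": "bicycle", "percentage_probability": 41.5, "box_points": [5, 120, 400, 380]},
]


def filter_detections(detections, min_confidence=50.0):
    """Keep detections at or above min_confidence (in percent) and rename the fields."""
    kept = []
    for det in detections:
        if det["percentage_probability"] >= min_confidence:
            x1, y1, x2, y2 = det["box_points"]
            kept.append({
                "label": det["name"],
                # Convert the percentage to a 0-1 score.
                "confidence": det["percentage_probability"] / 100.0,
                "box": {"x1": x1, "y1": y1, "x2": x2, "y2": y2},
            })
    return kept


print(filter_detections(sample_detections))
```

The 50% cut-off is arbitrary; we would probably want to tune it once we see real images.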

Human Sentiment Analysis: DeepFace

Output: JSON containing the following information: age, sex, emotion (7 different emotions) and racial information.

According to my understanding, these are the two primary tasks in image feature extraction. Another important functionality that we had discussed was colour detection for the selected object. I was not able to find any specialised architecture for that but I think I can just write a basic python script that can detect the dominant colour in the given object and that should work out.
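A minimal sketch of what such a basic script could look like, assuming the detected object has already been cropped and its pixels are available as a list of (r, g, b) tuples. It simply buckets the colours and picks the most frequent bucket; the function name, the bucket size and the sample pixels are all hypothetical, and a real version might well use clustering instead.

```python
from collections import Counter


def dominant_colour(pixels, bucket_size=32):
    """Return the centre of the most common colour bucket among (r, g, b) pixels."""
    if not pixels:
        raise ValueError("no pixels given")
    # Quantise each channel so that near-identical shades fall into the same bucket.
    counts = Counter(
        (r // bucket_size, g // bucket_size, b // bucket_size) for r, g, b in pixels
    )
    (rb, gb, bb), _ = counts.most_common(1)[0]
    half = bucket_size // 2
    return (rb * bucket_size + half, gb * bucket_size + half, bb * bucket_size + half)


# Hypothetical crop: mostly red pixels with a few blue ones.
crop = [(250, 10, 12)] * 8 + [(245, 5, 20)] * 4 + [(10, 10, 240)] * 3
print(dominant_colour(crop))
```

The result is an RGB value, so we would still need a mapping from RGB to a colour name before adding it to the output.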

As for the output type, I think we should include all this information in JSON or XML format. I believe the main issue here would be how we design the API and whether it can handle JSON. If it can, then I can definitely combine all the information from the above-mentioned architectures into one JSON document. I don't have any experience with JSON formatting but, at a cursory glance, it looks manageable.
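For discussion, a rough sketch of how the per-tool outputs could be appended into one JSON document using Python's standard `json` module. The field names (`image`, `objects`, `faces`, `attributes`), the `build_image_record` helper and all sample values are hypothetical placeholders, not an agreed format.

```python
import json


def build_image_record(image_name, detections, faces, attributes=None):
    """Collect the output of every tool for one image into a single dict."""
    return {
        "image": image_name,
        "objects": detections,   # e.g. object detection results
        "faces": faces,          # e.g. face analysis results
        # Anything added later (scene, day/night, ...) goes here,
        # so new tools do not clash with the keys above.
        "attributes": dict(attributes or {}),
    }


record = build_image_record(
    "example.jpg",
    detections=[{"label": "dog", "confidence": 0.97, "box": [34, 50, 210, 300]}],
    faces=[{"age": 31, "emotion": "happy"}],
    attributes={"scene": "park"},
)

text = json.dumps(record, indent=2)
print(text)
```

Keeping later additions under a separate key means we can add more tools without changing what existing consumers already read.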

florian-grond commented 3 years ago

Hello Rohan, this is a great start. I have been discussing with Jeff the file format that will hold all the information about the image that we want to render. This will be equally useful for the haptics side of things.

We should have a consolidated data structure / file format that will contain all the AI/ML output. Jeff suggested JSON, it should not be too complicated.
You can start putting this together, and once you have something going let's also discuss it in Slack to get relevant input from everyone.

rohanakut commented 3 years ago

Machine Learning tools for color prediction
https://github.com/walton-wang929/Color_Recognition - this repo is ideal for our use case but it does not work
https://github.com/ahmetozlu/color_recognition - this repo is also good, but we might have to segment the objects and then pass individual objects to it

As we go further, either passing the entire image or segmenting the objects might turn out to be feasible, hence I am considering both repos for now.

florian-grond commented 3 years ago

great Rohan, will be nice to add colour as an additional attribute to recognized objects!

rohanakut commented 3 years ago

Texture Analysis using ML
https://github.com/henzler/neuraltexture/tree/master/datasets - this repo might be useful for texture detection, like grass or different types of wood. We might have to test this repo to check its effectiveness, but it looks promising.

This repo does not work. Do not use it.

florian-grond commented 3 years ago

great!

rohanakut commented 3 years ago

Determining whether it's day or night in an image: This repo does a similar thing. It does not use any ML, just image processing, to determine day or night. I think this approach could give us better results than an ML model. It could possibly also be used to detect a sunrise or sunset, which might give us additional information.
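To illustrate the general idea of a non-ML check (this is not the code from that repo, just a sketch under my own assumptions): average the brightness of the pixels and compare it against a threshold. The luma weights are the standard Rec. 601 ones; the threshold of 90 and the sample pixels are guesses that would need tuning on real photos.

```python
def mean_brightness(pixels):
    """Average brightness (0-255) of (r, g, b) pixels using Rec. 601 luma weights."""
    if not pixels:
        raise ValueError("no pixels given")
    total = sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels)
    return total / len(pixels)


def day_or_night(pixels, threshold=90.0):
    """Label an image "day" if its mean brightness reaches the (assumed) threshold."""
    return "day" if mean_brightness(pixels) >= threshold else "night"


# Hypothetical samples: a bright blue sky and a dark night sky.
bright_sky = [(135, 206, 235)] * 10
dark_sky = [(10, 12, 30)] * 10
print(day_or_night(bright_sky), day_or_night(dark_sky))
```

A plain brightness threshold will misjudge dark indoor scenes or brightly lit night shots, so this is only a starting point.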

florian-grond commented 3 years ago

Yes, great, I'm sure you'll find more such image classifiers. At this stage, it is important to start with just the most relevant ones and to add more features later. What matters is to have a pipeline that returns the .json file; we can add more to it as the project progresses.

rohanakut commented 3 years ago

Celebrity Recognition
https://github.com/Srikeshram/Celebrity-Face-Recognition - working
https://github.com/D2KLab/FaceRec
https://github.com/BenjaminRCho/Celebrity-Face-Recognition
https://github.com/EmnamoR/Face-recognition-Tensorflow-object-detection-api - maybe working
https://github.com/shobhit9618/celeb_recognition - working

The above-mentioned models are not professionally developed by experts. Most of them look like side projects. However, these models do look impressive and could be improved if we gather strong data

rohanakut commented 3 years ago

DeepStack - https://github.com/johnolafenwa/DeepStack
This repo looks like a good solution if we plan to run ML on mobile devices. https://github.com/robmarkcole/HASS-Deepstack-scene is one of the DeepStack sub-repos and could possibly be used for scene detection.

rohanakut commented 3 years ago

Scene Recognition
https://github.com/GKalliatakis/Keras-VGG16-places365 - this repo works
https://github.com/vpulab/Semantic-Aware-Scene-Recognition - this repo has some issues

rohanakut commented 3 years ago

Deep Eye - https://github.com/khaixcore/deep_eye
This repo just detects humans and pets. This might be useful if we are dealing with just these categories of images.

rohanakut commented 3 years ago

Image segmentation:
https://colab.research.google.com/github/usuyama/pytorch-unet/blob/master/pytorch_unet_resnet18_colab.ipynb - does not detect objects but segments them properly
https://colab.research.google.com/github/spmallick/learnopencv/blob/master/PyTorch-Segmentation-torchvision/intro-seg.ipynb#scrollTo=jebN_lm9eQ1W - semantic segmentation using PyTorch
https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5 - detectron2 (uses semantic segmentation and detection)
https://colab.research.google.com/github/tensorflow/models/blob/master/research/deeplab/deeplab_demo.ipynb - DeepLabV3 - gives good background and foreground estimation

jeffbl commented 3 years ago

@gp1702 assigning to you as well, to see if this is still relevant, if there are other resources that should be added, or if this is being tracked elsewhere. (This could also be a wiki entry here in GitHub, since it is more of an ongoing research/resource list than an issue at this point.)

gp1702 commented 2 years ago

I think this should be closed for now.

Shared-Reality-Lab / IMAGE-server

Computer vision tools survey #3