LocalJoost / YoloHolo

HoloLens app using a Yolo model to identify objects in space
MIT License
36 stars, 5 forks

Replace yolo model #9

Open TongyuanLiu opened 5 months ago

TongyuanLiu commented 5 months ago

Hi, I'm thoroughly impressed with the work you've accomplished. It's been incredibly helpful for my school's design project.

We want to use HoloLens to recognize a basketball, so we decided to replace the yolo model. I am wondering whether you converted a .pt file to an .onnx file yourself or downloaded the .onnx file from another GitHub repository. When we tried to replace the yolo model, we got an error that seems to be caused by a mismatch in output size. Your yolo model (256x320) outputs float32[1,5040,85], but ours outputs float32[Concatoutput_dim_0,7].

We took the following route to convert .pt to .onnx:
We started at https://github.com/PINTO0309/PINTO_model_zoo/tree/main/307_YOLOv7
From its demo folder, we found YOLOv7 with ONNX in Python: https://github.com/ibaiGorordo/ONNX-YOLOv7-Object-Detection
From there we found the original YOLOv7 model: https://github.com/WongKinYiu/yolov7
In that repository we found the "Pytorch to ONNX with NMS (and inference)" Colab notebook: https://colab.research.google.com/github/WongKinYiu/yolov7/blob/main/tools/YOLOv7onnx.ipynb
We used this notebook to convert the yolov7 .pt file to .onnx with input size 256x320, but the output differs from yours, as mentioned above.
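The 5040 and 85 in the original model's output shape can be reproduced with a little arithmetic. The sketch below is only an illustration: it assumes the usual YOLO detection strides of 8, 16 and 32, three anchors per grid cell for an anchor-based (YOLOv7-style) head, and the 80 COCO classes. The function name `raw_prediction_count` is made up for this example.

```python
def raw_prediction_count(height, width, strides=(8, 16, 32), anchors_per_cell=3):
    """Number of candidate boxes a raw (pre-NMS) YOLO head emits.

    Assumes one grid per stride and a fixed number of anchors per grid cell.
    """
    cells = sum((height // s) * (width // s) for s in strides)
    return cells * anchors_per_cell


# 256x320 input, anchor-based head: 3 anchors per cell
print(raw_prediction_count(256, 320))  # 5040

# Values per candidate: 4 box coordinates + 1 objectness + 80 class scores
print(4 + 1 + 80)  # 85

# Anchor-free head (YOLOv8 style): 1 prediction per cell
print(raw_prediction_count(256, 320, anchors_per_cell=1))  # 1680
```

An export with NMS built in returns one row per final detection instead of one row per candidate, which would explain the dynamic first dimension. In the YOLOv7 notebook's inference example the 7 values per row are unpacked as batch id, x0, y0, x1, y1, class id and score, so code written for the raw [1,5040,85] layout cannot read that output unchanged.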

Thanks in advance.

LocalJoost commented 5 months ago

Hi, I am glad my little sample worked for you. I did not make the model myself; I downloaded it. If you want to use a different model, I would suggest reading https://localjoost.github.io/HoloLens-AI-training-a-YoloV8-model-locally-on-custom-pictures-to-recognize-objects-in-3D-space/ and using this branch: https://github.com/LocalJoost/YoloHolo/tree/airplanedetection

TongyuanLiu commented 5 months ago

Thank you. That's very helpful.

Sheltim233 commented 5 months ago

I'm also involved in this project. Following your guidance, we've successfully trained and integrated our model into HoloLens 2. However, we now aim to recognize two objects simultaneously, whereas the HoloLens 2 currently supports recognition of only one object at a time. Could you advise on any necessary modifications to the Unity C# files to facilitate this? Your suggestions would be invaluable, and I sincerely appreciate your time.

We changed V8AirplaneTranslator by replacing detectableObjects with two new labels, but it still detects only one of the objects and never the other.

LocalJoost commented 4 months ago

V8AirplaneTranslator is only a very simple class that translates the detected object's class id into a text, as you probably have noticed. I am not quite sure what your problem is. Do you want it to be able to recognize different objects, like I showed in https://localjoost.github.io/HoloLens-AI-using-Yolo-ONNX-models-to-localize-objects-in-3D-space/, or do you want it to recognize different objects at the same time? I think the code should support that.

Sheltim233 commented 4 months ago

Thank you for your response. Our objective is to simultaneously detect two distinct objects. Despite modifying the detectableObjects, we're encountering a problem where only one type of object is recognized. Specifically, we can detect multiple instances of object A, but object B remains undetected on the HoloLens 2. Interestingly, when we evaluate the .pt file independently, both objects A and B are correctly identified. We're keen on understanding your insights or recommendations on addressing this challenge.

LocalJoost commented 4 months ago

@Sheltim233 Sorry for the slow response. I have spent quite some time debugging this and I can confirm that my code does indeed work and recognizes multiple objects in one go. But apparently, that only works when you set the probability filter lower than my default settings. With my default of 0.65, I almost always get only one object. However, if I lower the Minimum Probability to 0.3, I get multiple objects per picture, at the cost of a much higher false positive rate. So that's the thing you need to play with, I guess.
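The effect of the probability filter can be shown with a small sketch. This is not the app's code: the `Detection` class, the `filter_by_probability` function and the scores are invented for illustration, and only the two thresholds (0.65 and 0.3) come from the comment above.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    probability: float


def filter_by_probability(detections, minimum_probability):
    """Keep only detections at or above the minimum probability."""
    return [d for d in detections if d.probability >= minimum_probability]


# Invented scores: one strong hit, one moderate hit, one weak hit
candidates = [Detection("A", 0.81), Detection("B", 0.42), Detection("A", 0.12)]

strict = filter_by_probability(candidates, 0.65)
relaxed = filter_by_probability(candidates, 0.3)

print([d.label for d in strict])   # ['A']
print([d.label for d in relaxed])  # ['A', 'B']
```

Lowering the threshold lets the moderate hit through, but it would let moderate false positives through just as easily, which is the trade-off described above.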

TongyuanLiu commented 4 months ago

Thanks for your reply. May I ask what the difference is between confidence and classProbabilities? I noticed that your airplane yolo model has an output tensor of [1, 5, 1680]: 4 values for the bounding box and 1 for the confidence, so the for loop won't be entered, since i starts from 5 (if the model can only detect one class). This is what confuses me, because it seems the yolov8 model uses the first four values for the bounding box and the rest of the values for the probabilities of each class (https://github.com/ultralytics/ultralytics/issues/751 and https://github.com/ultralytics/ultralytics/issues/8421).

Our trained model can detect 2 classes (ball, hoop). The output tensor is [1, 6, 1680], with 4 values for the bounding box and 2 for the class probabilities. However, when we changed the V8AirplaneTranslator, it could still only detect the ball (I guess because it is index 0). I'm not sure whether this is related to the different format of the yolov8 output tensor or to the overlap threshold.
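To make the [1, 6, 1680] layout concrete, here is a minimal sketch of how one candidate's values could be gathered from such an output. It assumes the tensor arrives as a flat, row-major buffer, so all values of channel 0 come first, then all of channel 1, and so on. The function names and the numbers are hypothetical, and only 3 candidates are used instead of 1680 to keep it readable.

```python
def candidate_values(flat, candidate, channels, candidates):
    """Collect the per-channel values of one candidate from a flat,
    row-major [1, channels, candidates] buffer."""
    return [flat[c * candidates + candidate] for c in range(channels)]


def split_candidate(values):
    """Split one candidate into its 4 box values and its class scores."""
    return values[:4], values[4:]


# Tiny stand-in for a [1, 6, 3] tensor (invented numbers)
channels, candidates = 6, 3
flat = [
    50.0, 120.0, 200.0,  # channel 0: box x
    60.0, 80.0, 90.0,    # channel 1: box y
    20.0, 30.0, 40.0,    # channel 2: box width
    20.0, 30.0, 60.0,    # channel 3: box height
    0.90, 0.05, 0.10,    # channel 4: class 0 (ball) score
    0.02, 0.85, 0.20,    # channel 5: class 1 (hoop) score
]

box, class_scores = split_candidate(candidate_values(flat, 1, channels, candidates))
print(box)           # [120.0, 80.0, 30.0, 30.0]
print(class_scores)  # [0.05, 0.85]
```

Under this reading there is no separate objectness value: index 4 is simply the score of class 0, and every further class adds one more channel.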

LocalJoost commented 4 months ago

@TongyuanLiu:

"May I ask what's the difference between the confidence and classProbabilities?"

There is none. At least not in my code. It's a different word for the same thing.

As to your remark about V8: I am not quite sure what you mean. The difference I found between YoloV7 models and the YoloV8 models created by the Ultralytics tools is described here: https://localjoost.github.io/HoloLens-AI-training-a-YoloV8-model-locally-on-custom-pictures-to-recognize-objects-in-3D-space/#v7-versus-v8-

I'd suggest you lower the minimum probability (which equals minimal confidence) and see what happens.

TongyuanLiu commented 4 months ago

Thank you for the quick reply. I edited my previous comment. What confuses me is this: if the yolo model can only detect one class, the code assigns the probability to the confidence variable and the for loop won't be entered. If the yolo model can detect two classes, the for loop will be entered once, so maxIndex is still 0 because there is only 1 probability in the classProbabilities array. Suppose my classes are defined as 0: ball, 1: hoop, and I want to detect a hoop in an image. Won't classProbabilities.IndexOf(classProbabilities.Max()) give 0, making maxIndex 0, which would make MostLiklyObject the ball?

I also ran into another problem: if I give it a picture that contains only a hoop, the program cannot label it. (It can only detect balls.)

Thank you so much for your help in advance.

TongyuanLiu commented 4 months ago

I think I might know how to fix this issue. The program cannot show class 1 because the confidence is always the probability of class 0. As a result, when I want it to recognize class 1, the ProcessV8Item function never adds it to boxesMeetingConfidenceLevel. I changed YoloV8Item.cs and that fixed the issue.
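The actual change to YoloV8Item.cs was only shown in a screenshot, so the following is not that code. It is a Python sketch of the idea as described in this comment, under the assumption that each candidate holds 4 box values followed by one score per class. The function names and numbers are hypothetical.

```python
def class0_only(values):
    """Behaviour described above: the confidence is always the class 0 score."""
    return 0, values[4]


def best_class(values):
    """Take the confidence from the highest-scoring class instead."""
    class_scores = values[4:]
    best_index = max(range(len(class_scores)), key=class_scores.__getitem__)
    return best_index, class_scores[best_index]


# Invented candidate that is clearly a hoop (class 1): 4 box values, 2 class scores
hoop_candidate = [120.0, 80.0, 30.0, 30.0, 0.05, 0.85]
minimum_probability = 0.65

_, confidence = class0_only(hoop_candidate)
print(confidence >= minimum_probability)  # False: the hoop is filtered out

index, confidence = best_class(hoop_candidate)
print(index, confidence >= minimum_probability)  # 1 True
```

With the first variant a hoop can never pass the confidence filter, because only the ball score is compared against the threshold. With the second, both the class index and the confidence come from the strongest class.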