
posts/unity-barracuda-inference-yolox-walkthrough/ #42

utterances-bot opened 1 year ago

utterances-bot commented 1 year ago

Christian Mills - Code Walkthrough: Unity Barracuda Inference YOLOX Package

Walk through the code for the Unity Barracuda Inference YOLOX package, which extends the functionality of unity-barracuda-inference-base to perform object detection using YOLOX models.

https://christianjmills.com/posts/unity-barracuda-inference-yolox-walkthrough/

aforadil commented 1 year ago

Hi Christian, thanks for your informative blogs and tutorials. I have a query. The demo project seems to work well with a 2D scene using a video or photo, but when I tried using the in-game camera for a 3D scene in Unity, I couldn't figure it out.

In simple words, what I want to do is take input from the in-game camera, run inference, and show the bounding boxes in the same view. I think you had a tutorial like this for Unity using OpenVINO, but I don't know why I can't open it now (https://christianjmills.com/posts/openvino-yolox-unity/in-game-camera/). I guess raycasting can be used to locate bounding boxes in 3D space. It would be great if you could provide any leads on this, or you could make a tutorial for it. Any help is much appreciated. Thanks in advance.
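
For reference, here is roughly the raycasting idea I had in mind (a minimal sketch; it assumes the detected box's center is already available in screen-space pixel coordinates):

```csharp
using UnityEngine;

// Minimal sketch of the raycasting idea: project a 2D bounding box center
// into the 3D scene to locate the detected object. Assumes the box center
// is already in screen-space pixel coordinates.
public static class BoxToWorld
{
    public static bool TryLocate(Camera camera, Vector2 boxCenterScreen, out Vector3 worldPoint)
    {
        // Cast a ray from the camera through the box center into the scene.
        Ray ray = camera.ScreenPointToRay(boxCenterScreen);
        if (Physics.Raycast(ray, out RaycastHit hit))
        {
            worldPoint = hit.point; // surface point of the object behind the box
            return true;
        }
        worldPoint = Vector3.zero;
        return false;
    }
}
```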

cj-mills commented 1 year ago

Hi @aforadil,

Sorry about that. I recently removed some outdated tutorials, and that was one of them. The associated Unity project is still on GitHub:

I can reupload the tutorial, but making a new demo project (and tutorial if needed) would make more sense. I could probably make time for that later in the week.

aforadil commented 1 year ago

Hi, thanks for the reply and for sharing the project. Yes, a new demo with Barracuda inference and a short tutorial would be great. Looking forward to it. Thank you so much.

cj-mills commented 1 year ago

@aforadil To double-check, this is what you are looking for, correct? The sample image is with the built-in render pipeline. URP and HDRP require different approaches.

barracuda-inference-yolox-in-game-camera-1

aforadil commented 1 year ago

Yes, I am looking for a similar kind of thing with URP. Thanks.

cj-mills commented 1 year ago

@aforadil I added links to demo projects for using the in-game camera in BRP, URP, and HDRP projects in the Conclusion section of this post.

aforadil commented 1 year ago

Thank you so much. I will surely check them. Once again, thank you.

aforadil commented 1 year ago

Hi @cj-mills, Thanks for your previous help. The project that you shared works great. I was working on integrating it into an XR app scene.

It seems like the Bounding Box 2D Toolkit package was made to work with a Canvas with Render Mode 'Screen Space - Overlay'. When switched to 'Screen Space - Camera' or 'World Space', the bounding boxes don't seem to be in the right position. The boxes can be scaled down by reducing the 'Plane Distance', but the positioning still isn't right, as the boxes stay at their origin points. I tried to update the script of the Bounding Box 2D package but couldn't because of my limited knowledge of Unity. I know it's a lot to ask, and you already made a whole project for me. Can you please update the URP project with bounding boxes in 'Screen Space - Camera' or 'World Space'? Thanks again for your help.

cj-mills commented 1 year ago

@aforadil The issue with using `Screen Space - Camera` or `World Space` for the bounding box Canvas is that the annotations become visible to the in-game camera. That means the boxes end up in the input images for the object detection model and impact its predictions. The result is a flickering effect, as shown below:

![world-space-canvas-flickering](https://github.com/cj-mills/christianjmills/assets/9126128/0fbc76a7-97e9-431b-90c9-279b0db67c73)

Here, the model first detects the objects as expected, so bounding box annotations get drawn on the Canvas; those annotations then appear in the next input image, so the model stops detecting the objects and the boxes disappear; with the boxes gone, detection succeeds again, and the cycle repeats.

The solution likely involves using a second Camera and custom layers. I did something similar for my old targeted in-game style transfer tutorial:

Another consideration for that approach with URP and HDRP projects is that you would want to ensure the model is not performing inference for each in-game camera.
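
As a starting point, the layer-based approach could look something like the following (a minimal sketch, assuming the project defines a dedicated `Annotations` layer in its Tags and Layers settings):

```csharp
using UnityEngine;

// Minimal sketch: hide bounding box annotations from the camera that feeds
// the object detection model by placing them on a dedicated layer and
// excluding that layer from the detection camera's culling mask.
public class AnnotationLayerSetup : MonoBehaviour
{
    [SerializeField] private Camera detectionCamera; // camera whose output goes to the model
    [SerializeField] private Camera overlayCamera;   // second camera that renders only the annotations

    private void Start()
    {
        // The 'Annotations' layer is an assumption; it must exist in the
        // project's Tags and Layers settings.
        int annotationLayer = LayerMask.NameToLayer("Annotations");

        // The detection camera renders everything except the annotations...
        detectionCamera.cullingMask &= ~(1 << annotationLayer);

        // ...while the overlay camera renders only the annotations.
        overlayCamera.cullingMask = 1 << annotationLayer;
    }
}
```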

aforadil commented 1 year ago

Hi @cj-mills, thanks for your reply and for the lead. I already had this in mind and was thinking of using another camera. Actually, I was having issues with the positioning of the bounding boxes; I have shared a picture. Can you please share the updated project? Your bounding boxes seem to be in the right position in the camera overlay.

https://github.com/cj-mills/christianjmills/assets/137451055/e7ffa544-ef71-48f7-86bc-d4a0966fef27

cj-mills commented 1 year ago

@aforadil You just need to update the Bounding Box 2D Toolkit package through the Package Manager.

djsoapyknuckles commented 1 year ago

Hi @cj-mills, would you be able to do a tutorial on training/fine-tuning a model to work with different images of Unity 3D objects in a scene environment? How would one create a dataset to train on? Would the training setup be similar to the hand gesture model? The intro states the model isn't trained to recognize objects in the Unity environment, so I'm curious to know how we might do that, or if it's even possible.

cj-mills commented 1 year ago

Hi @djsoapyknuckles,

I don't know when I will have time to make a tutorial for creating custom datasets for in-game object detection, but I can add it to the to-do list.

Creating annotated datasets for Unity environments is not only possible but automatable, although setting up the tooling to automate the process might require a little work. Unity has even made a package (Perception) to assist with creating synthetic datasets in the game engine.
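
As a rough sketch, wiring that up in code can look something like this (hedged: the `PerceptionCamera`, `BoundingBox2DLabeler`, and `IdLabelConfig` names come from the Perception package, but the exact API surface varies by package version, so verify against the version you install):

```csharp
using UnityEngine;
using UnityEngine.Perception.GroundTruth;

// Rough sketch of capturing 2D bounding box annotations with the Unity
// Perception package. API names may differ between package versions;
// treat this as an assumption to verify, not a drop-in script.
public class SyntheticDataSetup : MonoBehaviour
{
    // Label config asset (created in the editor) mapping label strings to IDs.
    [SerializeField] private IdLabelConfig labelConfig;

    private void Start()
    {
        // PerceptionCamera captures frames and writes annotations to disk.
        var perceptionCamera = gameObject.AddComponent<PerceptionCamera>();

        // BoundingBox2DLabeler records a 2D box for each labeled object.
        perceptionCamera.AddLabeler(new BoundingBox2DLabeler(labelConfig));
    }
}
```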

Once you have an annotated dataset, you can use my object detection tutorial (probably with the coco-format training notebook) to train a YOLOX object detection model.

djsoapyknuckles commented 1 year ago

@cj-mills, thanks so much for the quick reply! Your tutorials are amazing, and I'm so glad I found your stuff. Perception looks like exactly the tool for the job! I also found a tool to convert Unity's Perception-generated JSON to COCO format, which will be a big help! https://github.com/lessw2020/perception_tools

aforadil commented 9 months ago

Hi @cj-mills, I hope you are doing well. Thanks for your amazing tutorials and help with object detection in Unity. I have a few queries about the project that uses URP and the in-game camera as input. It would be great if you could elaborate on them.

1. I couldn't find the part of the code in the scripts related to taking input from the camera. For example, in the case of multiple Game-type cameras, how do you distinguish between them in the scripts?
2. Can you provide any insights on having multiple cameras and multiple object detectors in a scene? For example, one camera looking at one side and another toward the back. I don't want the bounding boxes, just the information about the detected objects. How can I modify this project to achieve that?

cj-mills commented 9 months ago

Hi @aforadil,

The C# scripts you are looking for are:

The CameraBlitRendererFeature script attaches to the UniversalRenderer.asset as a Renderer Feature, and the CameraBlitPass script passes the current camera texture data to the inference script.

unity-universal-renderer-asset-inspector-tab
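
For reference, a stripped-down skeleton of how a renderer feature like that plugs into URP (illustrative only; the demo's actual CameraBlitRendererFeature does more setup than shown here):

```csharp
using UnityEngine.Rendering;
using UnityEngine.Rendering.Universal;

// Stripped-down skeleton of a URP renderer feature. The real
// CameraBlitRendererFeature/CameraBlitPass pair does more work; this only
// shows how a feature registers its pass so Execute runs for each camera.
public class ExampleBlitRendererFeature : ScriptableRendererFeature
{
    private class ExampleBlitPass : ScriptableRenderPass
    {
        public override void Execute(ScriptableRenderContext context, ref RenderingData renderingData)
        {
            // In the demo, this is where the camera texture gets handed off
            // to the inference script.
        }
    }

    private ExampleBlitPass m_Pass;

    public override void Create()
    {
        m_Pass = new ExampleBlitPass
        {
            // Run after the camera finishes rendering so the frame is complete.
            renderPassEvent = RenderPassEvent.AfterRendering
        };
    }

    public override void AddRenderPasses(ScriptableRenderer renderer, ref RenderingData renderingData)
    {
        renderer.EnqueuePass(m_Pass);
    }
}
```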

The demo only checks if a camera is an in-game type, so the inference code runs for each in-game camera in the scene.

With the approach in the demo, all the cameras would reference the same InferenceManager object, so you would probably want to update the CameraBlitPass.Execute and InferenceController.Inference methods to include some information to specify the current camera. Here is a simple example of how you can get the current camera's name:

```csharp
public override void Execute(ScriptableRenderContext context, ref RenderingData renderingData)
{
    // Skip cameras that are not in-game cameras (e.g., the Scene view camera).
    var cameraData = renderingData.cameraData;
    if (cameraData.camera.cameraType != CameraType.Game)
        return;

    // Log the name of the camera this pass is currently executing for.
    Debug.Log(cameraData.camera.name);

    if (inferenceController)
    {
        // Hand the current camera texture to the inference script.
        inferenceController.Inference(m_CameraColorTarget.rt);
    }
    else
    {
        Debug.Log("inferenceController not assigned");
    }
}
```
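
Building on that, one option is to key the detection results by camera name so each detector's output stays separate. A hypothetical sketch (the Detection struct and RunModel stub stand in for the package's actual output types and inference call):

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical sketch: track the latest detections per camera by keying
// results on the camera name passed in from CameraBlitPass.Execute.
// The Detection struct and RunModel stub are placeholders for the
// package's actual YOLOX output types and inference pipeline.
public class MultiCameraInferenceController : MonoBehaviour
{
    public struct Detection
    {
        public Rect BoundingBox; // box in screen coordinates
        public int ClassIndex;   // index into the model's class list
        public float Confidence; // detection score
    }

    // Latest detections for each in-game camera, keyed by camera name.
    private readonly Dictionary<string, List<Detection>> resultsByCamera =
        new Dictionary<string, List<Detection>>();

    // Overload that accepts the camera name alongside the camera texture.
    public void Inference(RenderTexture cameraTexture, string cameraName)
    {
        List<Detection> detections = RunModel(cameraTexture);
        resultsByCamera[cameraName] = detections;
    }

    // Query the stored results without drawing any bounding boxes.
    public IReadOnlyList<Detection> GetDetections(string cameraName)
    {
        return resultsByCamera.TryGetValue(cameraName, out var detections)
            ? detections
            : new List<Detection>();
    }

    // Placeholder for the package's actual YOLOX inference call.
    private List<Detection> RunModel(RenderTexture texture)
    {
        return new List<Detection>();
    }
}
```

With something like this, each camera's `CameraBlitPass` passes its own name, and other scripts can read the per-camera results directly instead of relying on the drawn annotations.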