utterances-bot commented 3 years ago

Barracuda PoseNet Tutorial Pt. 2 | Christian Mills

This post covers how to implement the preprocessing steps for the PoseNet model.

https://christianjmills.com/Barracuda-PoseNet-Tutorial-2/

Drod917 commented 3 years ago

Thanks for the awesome guide! Is there any way to carry out the preprocessing steps without the compute shader, i.e., to just preprocess on the CPU, albeit slowly? I'm working on an Android build that uses this, but it breaks due to a linking error between Unity's compute shaders and what seems to be the graphics API that Vuforia uses (OpenGL ES 3.1).

cj-mills commented 3 years ago

Thanks, I'm glad you like it! You can definitely perform the preprocessing on the CPU. I don't know if I still have the code for that saved, but I can post an example tomorrow (Tuesday). Also, if you need to avoid compute shaders entirely, you'll need to use the C# Burst worker type in part 3. Check out the comments section in part 3 to see what you need to do to use that.

I'm going to be updating this tutorial series soon. One of the updates will cover how to use the more efficient (but less accurate) MobileNet version of the PoseNet model. As the name suggests, it's more suitable for mobile applications.

cj-mills commented 3 years ago

@Drod917 Here is a basic modification to perform the preprocessing steps on the CPU. Keep in mind that the Barracuda library uses compute shaders to run models on the GPU, so you will need to use the C# Burst worker type as I mentioned above. I'll post another comment tomorrow with the code to use the MobileNet version of the model. You'll get higher frame rates with that one. Also, this is using Barracuda version 1.3.0 since the C# worker type was buggy in 1.0.4. Check out the comments section in part 3 to see what you need to change when using 1.3.0.

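// Required for Parallel.For
using System.Threading.Tasks;

// Create the input Tensor from the processed image
Tensor input = new Tensor(processedImage, channels: 3);
// Download the Tensor values and apply the ResNet preprocessing on the CPU
float[] tensor_array = PreprocessResnet(input.data.Download(input.shape));
// Rebuild the input Tensor from the preprocessed values
input = new Tensor(input.shape.batch, input.shape.height, input.shape.width, input.shape.channels, tensor_array);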

private Texture2D PreprocessImage()
{
    // Create a new Texture2D with the same dimensions as videoTexture
    Texture2D imageTexture = new Texture2D(videoTexture.width, videoTexture.height, TextureFormat.RGBA32, false);

    // Copy the RenderTexture contents to the new Texture2D
    Graphics.CopyTexture(videoTexture, imageTexture);

    // Make a temporary Texture2D to store the resized image
    Texture2D tempTex = Resize(imageTexture, imageHeight, imageWidth);
    // Remove the original imageTexture
    Destroy(imageTexture);

    return tempTex;
}

private float[] PreprocessResnet(float[] tensor)
{
    // Per-channel mean values, negated so they can be added to each channel
    float[] imagenetMean = new float[] { -123.15f, -115.90f, -103.06f };

    // Process each pixel (three consecutive channel values) in parallel
    Parallel.For(0, tensor.Length / 3, (int i) =>
    {
        // Scale each value by 255, then add the negated channel mean
        tensor[i * 3 + 0] = tensor[i * 3 + 0] * 255f + imagenetMean[0];
        tensor[i * 3 + 1] = tensor[i * 3 + 1] * 255f + imagenetMean[1];
        tensor[i * 3 + 2] = tensor[i * 3 + 2] * 255f + imagenetMean[2];
    });

    return tensor;
}
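
As a quick sanity check of the arithmetic in PreprocessResnet, here is a minimal standalone sketch that runs the same per-channel operation on a single pixel outside of Unity. The class name ResnetPreprocessDemo and the sample pixel are made up for illustration, and it uses a plain loop instead of Parallel.For to keep things short.

```csharp
using System;

// Hypothetical standalone demo (no Unity or Barracuda required)
public static class ResnetPreprocessDemo
{
    // Same negated per-channel means as in PreprocessResnet
    static readonly float[] imagenetMean = { -123.15f, -115.90f, -103.06f };

    // Scale each group of three channel values by 255, then add the negated mean
    public static float[] Preprocess(float[] tensor)
    {
        for (int i = 0; i < tensor.Length / 3; i++)
        {
            for (int c = 0; c < 3; c++)
            {
                tensor[i * 3 + c] = tensor[i * 3 + c] * 255f + imagenetMean[c];
            }
        }
        return tensor;
    }

    public static void Main()
    {
        // One sample pixel: full red, half green, no blue
        float[] pixel = Preprocess(new float[] { 1.0f, 0.5f, 0.0f });
        Console.WriteLine(string.Join(", ", pixel));
    }
}
```

For that sample pixel the three values should come out at roughly 131.85, 11.6, and -103.06.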

cj-mills commented 3 years ago

@Drod917 Here is a link to a MobileNet version of the PoseNet model. It's not as accurate as the ResNet version, but it's much more efficient on the CPU.

MobileNet PoseNet Model

And here are the code changes to use it.

private string heatmapLayer = "heatmap_2";
private string offsetsLayer = "offset_2";

float[] tensor_array = PreprocessMobilenet(input.data.Download(input.shape));

private float[] PreprocessMobilenet(float[] tensor)
{
    for (int i = 0; i < tensor.Length; i++)
    {
        // Multiply each value by 2 and subtract 1
        tensor[i] = 2.0f * tensor[i] - 1.0f;
    }

    return tensor;
}
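
To make the effect of that formula concrete, here is a minimal standalone sketch that applies the same operation to a few sample values outside of Unity. The class name MobilenetPreprocessDemo and the sample values are made up for illustration.

```csharp
using System;

// Hypothetical standalone demo (no Unity or Barracuda required)
public static class MobilenetPreprocessDemo
{
    // Same operation as PreprocessMobilenet: multiply by 2, then subtract 1
    public static float[] Preprocess(float[] tensor)
    {
        for (int i = 0; i < tensor.Length; i++)
        {
            tensor[i] = 2.0f * tensor[i] - 1.0f;
        }
        return tensor;
    }

    public static void Main()
    {
        // Sample values at the bottom, middle, and top of the [0, 1] range
        float[] values = Preprocess(new float[] { 0.0f, 0.5f, 1.0f });
        Console.WriteLine(string.Join(", ", values));
    }
}
```

Values of 0, 0.5, and 1 map to -1, 0, and 1, so input in the [0, 1] range ends up in [-1, 1].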

Drod917 commented 3 years ago

@cj-mills Thank you tons! It's working much better now. I'm getting ~10x performance on the mobile device with these changes + MobileNet over the CSharp worker type.

cj-mills commented 3 years ago

@Drod917 No problem! It will be interesting to see what performance gains there will be for mobile devices once Unity releases their NPU backend for Barracuda. That will allow the models to be run using the dedicated machine learning hardware that's in newer devices.

Kanyade commented 3 years ago

@Drod917 Hi! Could you share how you solved the C# worker type problem? I am quite new to this technology.

cj-mills commented 3 years ago

Hi @Kanyade, there were some bugs in the version of Barracuda that was out when this tutorial was written. You can update to version 1.3.0 or later to use the C# worker type. However, the way you access Tensor shapes changed in that version.

Instead of accessing the height and width with heatmaps.shape[1] and heatmaps.shape[2] respectively, it's now heatmaps.shape.height and heatmaps.shape.width.

I am currently in the process of making an updated version of this tutorial and am using Barracuda version 2.1.0. The C# Burst worker type is working without issue with this version.

Kanyade commented 3 years ago

@cj-mills I will look into these, thank you! Any idea when it will be complete? Not to rush things, I would just love to read it and try it out with up-to-date packages.

cj-mills commented 3 years ago

@Kanyade I won't be able to work on it over this weekend, so hopefully by the end of next week. I might get delayed if anything comes up related to my most recent project.

At the moment, I am integrating the post-processing steps for multipose estimation. There are a lot more steps than for single pose estimation, so it will take a fair bit more writing to walk through them. All the code is working right now, so at least there shouldn't be too much debugging to perform.

You can use Barracuda version 2.1.0 with the current project as long as you update the code to use heatmaps.shape.height and heatmaps.shape.width.

Barracuda PoseNet Tutorial Pt. 2 | Christian Mills #8