ajaichemmanam / simple_bodypix_python

A simple and minimal bodypix inference in python

Much worse results than js version #9

Closed benschlueter closed 3 years ago

benschlueter commented 3 years ago

Hello,

the results I am getting with the Python code are much worse than the ones from JavaScript in the browser. I've already searched the JS source to find the reason for that.

Detail: I want to blur the background of an image while the person in the foreground stays unaffected, so I need the segmentation of the person. The JS function (https://github.com/tensorflow/tfjs-models/blob/896650dff8a3075dd696b8c7cde96acfdf87802c/body-pix/src/body_pix_model.ts#L527) is the one that gets called for that purpose. From there, these functions are called:

-> segmentPersonActivation
    -> predictForPersonSegmentation
         -> this.baseModel.predict

which segment the image and return the "segmented" person. As far as I can see, only padAndResize in segmentPersonActivation seems to differ from the Python code.
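For reference, here is a rough Python sketch of what that pad-and-resize step does, assuming the tfjs convention that a valid input dimension has the form n * stride + 1 (the helper names below are illustrative, not part of this repo):

```python
# Illustrative sketch of the resize step that tfjs' padAndResizeTo performs,
# assuming the BodyPix constraint that a valid input dimension is n * stride + 1.
import cv2
import numpy as np


def to_valid_input_resolution(resolution: int, output_stride: int) -> int:
    # Mirrors tfjs' toValidInputResolution: floor(d / stride) * stride + 1
    if (resolution - 1) % output_stride == 0:
        return resolution
    return (resolution // output_stride) * output_stride + 1


def resize_to_valid(image: np.ndarray, output_stride: int = 16) -> np.ndarray:
    h, w = image.shape[:2]
    target_h = to_valid_input_resolution(h, output_stride)
    target_w = to_valid_input_resolution(w, output_stride)
    # tfjs pads and resizes; a plain resize is the simplest stand-in, as long
    # as the network sees an (n * stride + 1) sized input.
    return cv2.resize(image, (target_w, target_h), interpolation=cv2.INTER_LINEAR)
```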

So my questions are: Has anyone encountered the same issue and knows a solution? Is the network structure (and the weights) transferred correctly? (I think so, but just in case.) Where could I take a closer look (in the JS code), or what should I implement to increase "performance"?

Thanks

ajaichemmanam commented 3 years ago

Please mention the model, precision, and stride that you are using. There are multiple models you can choose from for speed vs. accuracy tradeoffs.

And for the requirement of blurring the background behind a person, I have seen multiple implementations based on this repo.
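For anyone with the same requirement, a minimal background-blur sketch, assuming you already have a binary person mask at the frame's resolution (the function name, kernel size, and mask convention are placeholders):

```python
# Minimal background-blur sketch; `person_mask` is assumed to be a uint8
# array at frame resolution, 1 where the person is and 0 for background.
import cv2
import numpy as np


def blur_background(frame: np.ndarray, person_mask: np.ndarray,
                    ksize: int = 31) -> np.ndarray:
    # Blur the whole frame, then composite the sharp person back on top.
    blurred = cv2.GaussianBlur(frame, (ksize, ksize), 0)
    mask3 = np.repeat(person_mask[:, :, None].astype(bool), 3, axis=2)
    return np.where(mask3, frame, blurred)
```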

benschlueter commented 3 years ago

I used the same model in the browser and locally.

mobilenet/075/stride16

I observed that with my test image the network only outputs segments of size (95, 74, 1), while in the browser it is (1158, 901, 1). I did not investigate deeply, but is this tunable? I think so, and I am currently searching for where... but if you know it, you can save me some time :smile:

Yep, blurring shouldn't be a problem.
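In case it helps, a hedged sketch of how the coarse output could be brought back to frame resolution before thresholding; `raw_segments` stands for the (95, 74, 1) tensor mentioned above, and whether a sigmoid is still needed, as well as the 0.7 threshold, are assumptions that depend on the actual model output:

```python
# Hedged sketch: upsample the coarse segmentation output back to the frame
# size, then threshold it into a binary person mask.
import cv2
import numpy as np


def to_fullres_mask(raw_segments: np.ndarray, frame_h: int, frame_w: int,
                    threshold: float = 0.7) -> np.ndarray:
    scores = 1.0 / (1.0 + np.exp(-raw_segments[:, :, 0]))   # sigmoid over logits
    scores = cv2.resize(scores.astype(np.float32), (frame_w, frame_h),
                        interpolation=cv2.INTER_LINEAR)
    return (scores > threshold).astype(np.uint8)
```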

ajaichemmanam commented 3 years ago

If you are looking for accuracy, the ResNet model is better, and you should also use the full-precision model rather than the 0.75 one.

The network is input-size independent. It divides the image by the stride you choose. The output of the network then gets multiplied by the stride, and the offsets are added, to scale it back to the original dimensions.
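As a worked example of that relationship (assuming the tfjs BodyPix convention that a valid input has size n * stride + 1; the concrete numbers are only examples):

```python
# Illustrative size arithmetic for stride-based downsampling and upscaling.
output_stride = 16
input_h, input_w = 1153, 929                      # 72*16 + 1, 58*16 + 1
heatmap_h = (input_h - 1) // output_stride + 1    # -> 73
heatmap_w = (input_w - 1) // output_stride + 1    # -> 59
# Mapping a heatmap coordinate back to image space multiplies by the stride
# (keypoint decoding additionally adds the predicted offsets):
image_y = 10 * output_stride                      # heatmap row 10 -> image row 160
print(heatmap_h, heatmap_w, image_y)
```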

benschlueter commented 3 years ago

Well, of course the ResNet model is better, but that does not explain the difference between the Python and JS implementations.

In the browser, MobileNet is delivering fantastic results, but in Python it is not. I am just confused why these two implementations are so divergent.

But I'll experiment a bit for now. If I'm still stuck I'll post some examples, and if I find a solution I'll share it :smiley:

benschlueter commented 3 years ago

Ahh I think I got it,

I need a scaling factor like in JS, with "low", "medium", "high" and "full". In JS I used low (0.25) and it works well; if I use "full" the results are similar. That was the mistake.
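For reference, a minimal sketch of what that internal-resolution scaling could look like on the Python side, assuming the same factors the tfjs BodyPix code uses (low = 0.25, medium = 0.5, high = 0.75, full = 1.0); the function name and stride default are illustrative, and the resulting mask would still have to be resized back to the original frame size after inference:

```python
# Minimal sketch of the JS `internalResolution` behaviour on the Python side.
import cv2
import numpy as np

INTERNAL_RESOLUTION = {"low": 0.25, "medium": 0.5, "high": 0.75, "full": 1.0}


def to_internal_resolution(frame: np.ndarray, setting: str = "low",
                           output_stride: int = 16) -> np.ndarray:
    factor = INTERNAL_RESOLUTION[setting]
    h, w = frame.shape[:2]
    # Scale by the factor, then snap to a valid (n * stride + 1) input size.
    target_h = int(h * factor) // output_stride * output_stride + 1
    target_w = int(w * factor) // output_stride * output_stride + 1
    return cv2.resize(frame, (target_w, target_h), interpolation=cv2.INTER_LINEAR)
```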