OniroAI / Semantic-segmentation-with-MobileNetV3

TensorFlow (Keras) implementation of MobileNetV3 and its segmentation head
GNU General Public License v3.0

Pre-trained Model & Segmentation Questions #3

Closed InternetMaster1 closed 4 years ago

InternetMaster1 commented 4 years ago

Would it be possible to provide a pre-trained model for quick evaluation?

Many thanks in advance!

voeykovroman commented 4 years ago

I've updated the README; you can find links to the pre-trained models there. Please bear in mind, though, that these models were trained only for human segmentation, using the Pixart and Supervisely Person datasets.

InternetMaster1 commented 4 years ago

Thank you very much for the super-quick answer and for providing the pre-trained models.

High-accuracy human segmentation is exactly what I am looking for!

A couple more questions:

1) What is the license of this library? Can it be used for commercial purposes?

2) I have the Supervisely dataset, but I am not familiar with the Pixart dataset. Could you provide a link to it?

3) In the final output mask, how can I also include the objects that a person is holding, say a cup, a purse, a tennis racquet, a toy, or a magazine? It could be just about anything.

I am very much perplexed with this problem.

If I am not mistaken, the Supervisely dataset doesn't contain masks for objects that the person might be holding. Does that make a dataset like Supervisely unfit for the job? Or do we need to train on a dataset with more labels than just "person"?

Ideally, if an object is merely lying nearby, it is fine for it to be left out of the mask. But if the person is holding the object, it should definitely be included in the final mask.

How can this be achieved?

InternetMaster1 commented 4 years ago

@voeykovroman

I tried running the pre-trained tflite file on https://github.com/tensorflow/examples/tree/master/lite/examples/image_segmentation

It's giving the following error:

Something went wrong: Cannot convert between a TensorFlowLite buffer with 602112 bytes and a Java Buffer with 3000000 bytes.

I tried the solution mentioned in this issue but to no avail.

Is the formula correct?

ByteBuffer.allocateDirect(1 * imageSize * imageSize * NUM_CLASSES * 4)
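
As a sanity check on the two byte counts in the error message, here is a minimal sketch of the same size formula in Python. It assumes 4-byte float32 elements and a batch size of 1; the shapes below are guesses that happen to reproduce the numbers, not values read from the model. The thread does not say which tensor (input or output) triggers the error.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: the tensor holds float32 values


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the buffer size in bytes for a dense tensor of the given shape."""
    return prod(shape) * bytes_per_element


# Guessed shapes (batch, height, width, channels) that match the error message.
tflite_side = tensor_bytes((1, 224, 224, 3))  # size reported for the TFLite buffer
java_side = tensor_bytes((1, 500, 500, 3))    # size reported for the Java buffer

print(tflite_side)
print(java_side)
```

If these guesses are right, the Java side is allocating for a 500 x 500 tensor while the model works with 224 x 224, so the value used for imageSize would be the first thing to compare against the model's actual tensor shape.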

voeykovroman commented 4 years ago

  1. The license is GNU GPLv3.
  2. My bad, it's actually called PicsArt AI Hackathon. Unfortunately, it's not available anymore as far as I can see, so it may be replaced with any available person dataset.
  3. Actually, it's quite a complicated task. The first approach I would suggest is to run an object detection network (trained on a dataset with many classes, of course) in parallel with person segmentation. Another approach is to use instance segmentation to find the masks of every object in the image (you'll also need specific datasets and a network architecture for this task). Or, if you want the final person mask to include an object that is being held by a person, you'll need to modify the masks manually before training (public datasets usually don't include such objects in the person annotation).

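The mask modification in point 3 could be prototyped along these lines. This is only a sketch under stated assumptions: it presumes you already have a boolean person mask plus separate boolean masks for candidate objects (for example from a multi-class dataset), and it treats an object as "held" when it touches the person mask within a small pixel radius. That touching rule and the function names `dilate` and `merge_held_objects` are illustrative choices, not something taken from this repository.

```python
import numpy as np


def dilate(mask, radius=1):
    """Grow a boolean mask by `radius` pixels using a square neighbourhood."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    # OR together every shifted copy of the mask within the neighbourhood.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def merge_held_objects(person_mask, object_masks, radius=1):
    """Add to the person mask every object mask that touches the person.

    Assumption: "held" is approximated by overlap with the person mask
    after growing it by `radius` pixels.
    """
    person = person_mask.astype(bool)
    near_person = dilate(person, radius)
    merged = person.copy()
    for obj in object_masks:
        obj = obj.astype(bool)
        if (obj & near_person).any():
            merged |= obj
    return merged
```

An object lying far from the person never overlaps the grown person mask, so it stays out of the training target, while an object in contact with the person is merged in. A purely geometric rule like this will also pick up objects that merely touch the person in the image, so the results would still need a manual review.
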
InternetMaster1 commented 4 years ago

Thank you Roman for the detailed answer.

  1. Thanks.
  2. OK, no worries.
  3. The first approach does sound complicated. Your second approach sounds very interesting - basically, a custom training dataset that also contains objects as part of the person's mask, correct? Great, I will try that out.

InternetMaster1 commented 4 years ago

@voeykovroman

Just two more questions. Thank you for your patience :)

  1. Buffer Issue

Something went wrong: Cannot convert between a TensorFlowLite buffer with 602112 bytes and a Java Buffer with 3000000 bytes.

  2. Architecture/Backbone

I am lost in the sea of so many libraries for semantic segmentation. For mobile usage, but for highest accuracy & mask quality (rather than fastest), what would be a good option?

MobileNetV2, MobileNetV3, BiSeNet, or something else? I am even encountering libraries such as PortraitNet, SINet/ExtremeC3Net, etc. I am very confused...

Could you please point me in the right direction?

voeykovroman commented 4 years ago

Sorry for such a late response; I have only now had time to return to the repo.

  1. I didn't encounter this problem when I used my TFLite model, but I would again recommend checking the buffer size formulas and perhaps re-converting the model from TF to TFLite with the current version of TensorFlow.
  2. Well, it's up to you to decide, but if inference time is not an issue in your case, you can simply choose the best architecture and convert it to a mobile version for subsequent use. To decide which one to use, you may look at https://paperswithcode.com/task/semantic-segmentation
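
For the buffer size check in point 1, one option is to read the tensor shapes straight from the converted model and compute the expected byte counts from them. The sketch below assumes TensorFlow's Python `tf.lite.Interpreter` is available; the file name `model.tflite` and the helper names `tensor_report` and `inspect_model` are placeholders for illustration.

```python
import numpy as np


def tensor_report(details):
    """Summarise TFLite tensor details as (name, shape, size in bytes) tuples."""
    report = []
    for d in details:
        shape = tuple(int(n) for n in d["shape"])
        itemsize = np.dtype(d["dtype"]).itemsize
        report.append((d["name"], shape, int(np.prod(shape)) * itemsize))
    return report


def inspect_model(model_path):
    """Load a .tflite file and report its input and output tensors."""
    import tensorflow as tf  # imported lazily so tensor_report works without it

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return (tensor_report(interpreter.get_input_details()),
            tensor_report(interpreter.get_output_details()))


# Hypothetical usage:
# inputs, outputs = inspect_model("model.tflite")
# print(inputs, outputs)
```

The byte counts reported here are what the buffers allocated on the Java side would need to match.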