How hard would it be to port the multi-person functions?

de-code / python-tf-bodypix

A Python implementation of the bodypix model.

MIT License


How hard would it be to port the multi-person functions? #70

Open Shamino0 opened 3 years ago

Shamino0 commented 3 years ago

The original TensorFlowJS code includes net.segmentMultiPerson and net.segmentMultiPersonParts, which allow code to more easily distinguish people when they appear in an image together.

I started looking at the code to see if this is something I could port, but I just don't understand TensorFlow or the existing implementations well enough to figure out how.

Please consider this for a future enhancement.

Thanks much.

de-code commented 3 years ago

Hi @Shamino0, I think this is a duplicate of #50 (perhaps that is not clear from the issue title).

I don't think it is very hard. It should just be a matter of working with the segmentation outputs of the model.

What would you use it for?

Shamino0 commented 3 years ago

I don't think it's the same as #50 - that one seems to be asking about pose data (keypoints). I don't have a need for that information right now (although it may be helpful in the future).

Without getting into too much detail, the goal here is to identify heads (front, back, side, etc.) from live video. I've got it working right now by calling get_part_mask with the left_face and right_face parts. I then use OpenCV's cv2.findContours function to identify all of the segmented regions and generate bounding boxes from them.

The problem is that when two heads are close to each other in the image, OpenCV can't distinguish between them because they form a single connected region, so I get one bounding box around them both.
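The merging problem can be reproduced on a toy mask. The sketch below is a pure-Python stand-in for the cv2.findContours plus cv2.boundingRect step, not the actual pipeline: the function name bounding_boxes, the use of 4-connectivity, and the tiny list-of-lists masks are all assumptions made for illustration.

```python
from collections import deque


def bounding_boxes(mask):
    """Return one (x, y, w, h) box per 4-connected region of truthy cells."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill one region and track its extent.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


# Two toy "face" regions separated by a gap: two boxes.
apart = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
]
# The same two regions touching: a single merged box.
touching = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
]

print(bounding_boxes(apart))     # [(0, 0, 2, 2), (4, 0, 2, 2)]
print(bounding_boxes(touching))  # [(0, 0, 4, 2)]
```

Once the two regions share an edge, nothing in the combined mask says where one person ends and the next begins, so any region-based approach returns a single box.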

I'm thinking that segmentMultiPersonParts will fix this, by generating an array of masks (or other related structures) from which I can compute one bounding box each.
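With one mask per person, the bounding-box step no longer depends on connected regions. The following is a minimal sketch assuming a hypothetical output shape (a list of per-person binary masks as nested lists); python-tf-bodypix does not currently provide this, and the helper name mask_bounding_box is made up for the example.

```python
def mask_bounding_box(mask):
    """Return (x, y, w, h) around all truthy cells of mask, or None if empty."""
    points = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value
    ]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# Hypothetical per-person face masks for two heads that touch in the image.
person_masks = [
    [[1, 1, 0, 0],
     [1, 1, 0, 0]],
    [[0, 0, 1, 1],
     [0, 0, 1, 1]],
]

boxes = [mask_bounding_box(mask) for mask in person_masks]
print(boxes)  # [(0, 0, 2, 2), (2, 0, 2, 2)]
```

Because each mask belongs to exactly one person, the two adjacent heads produce two boxes even though their pixels touch.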

de-code commented 3 years ago

Okay, fair point.

I believe the corresponding JavaScript code is in body-pix/src/multi_person/decode_multiple_masks_cpu.ts (in particular decodePersonInstancePartMasks)

de-code commented 3 years ago

On further inspection, it seems that doing this the same way as the upstream JS version would require the multi-person pose detection output as an input (#50).
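To illustrate why per-person poses are needed as an input, here is a deliberately simplified sketch that splits a combined mask by assigning each foreground pixel to the pose with the nearest keypoint. This is not the upstream algorithm, which also relies on additional model outputs; the function names, the keypoint format (lists of (x, y) tuples), and the nearest-keypoint rule are assumptions made for illustration.

```python
def squared_distance_to_pose(x, y, pose):
    """Squared distance from pixel (x, y) to the closest keypoint of pose."""
    return min((x - kx) ** 2 + (y - ky) ** 2 for kx, ky in pose)


def split_mask_by_pose(mask, poses):
    """Split a combined binary mask into one binary mask per pose.

    Each foreground pixel is given to the pose with the nearest keypoint.
    Ties go to the pose that appears first in the list.
    """
    if not poses:
        return []
    height = len(mask)
    width = len(mask[0]) if height else 0
    person_masks = [[[0] * width for _ in range(height)] for _ in poses]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            nearest = min(
                range(len(poses)),
                key=lambda i: squared_distance_to_pose(x, y, poses[i]),
            )
            person_masks[nearest][y][x] = 1
    return person_masks


# One merged face region and two hypothetical poses with one keypoint each.
combined = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
poses = [[(0, 0)], [(3, 0)]]

for person_mask in split_mask_by_pose(combined, poses):
    print(person_mask)
# [[1, 1, 0, 0], [1, 1, 0, 0]]
# [[0, 0, 1, 1], [0, 0, 1, 1]]
```

Without the poses there is nothing to anchor the split to, which is why the segmentation port depends on the pose detection work.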

gmontamat commented 3 years ago

I started porting the person segmentation functions based on the pose-detection branch. Here's the diff so far:

https://github.com/gmontamat/python-tf-bodypix/compare/pose-detection...gmontamat:multiperson-segmentation

Feel free to take a look and contribute.