Open Shamino0 opened 3 years ago
Hi @Shamino0, I think this is a duplicate of #50 (perhaps that isn't clear from the issue title).
I don't think it is very hard. It should just be working with segmentation outputs of the model.
What would you use it for?
I don't think it's the same as #50 - that one seems to be asking about pose data (keypoints). I don't have a need for that information right now (although it may be helpful in the future).
Without getting into too much detail, the goal here is to identify heads (front, back, side, etc.) from live video. I've got it working right now by calling get_part_mask with the left_face and right_face parts. I then use OpenCV's findContours function to identify all of the segmented regions and generate bounding boxes from them.
The problem is that when two heads are close to each other in the image, OpenCV can't distinguish between them because they form a single connected region, so I get one bounding box around them both.
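To make the merging problem concrete, here is a minimal sketch. It does not call OpenCV or the library: a plain-Python flood fill stands in for findContours, and the tiny hand-written masks stand in for a real face-part mask. The function name bounding_boxes and both masks are made up for illustration.

```python
from collections import deque


def bounding_boxes(mask):
    """Return one (x, y, w, h) box per 8-connected foreground region.

    Plain-Python stand-in for a contour/connected-component step;
    `mask` is a list of rows holding 0 (background) or 1 (foreground).
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill this region while tracking its extent.
            queue = deque([(x, y)])
            seen[y][x] = True
            min_x = max_x = x
            min_y = max_y = y
            while queue:
                cx, cy = queue.popleft()
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


# Hypothetical masks: two "heads" with a background column between them...
separate = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
# ...and the same two heads touching, so they form one region.
touching = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]

print(bounding_boxes(separate))  # two boxes
print(bounding_boxes(touching))  # a single merged box
```

Any region-based step applied to a combined mask behaves this way, which is why a per-person output is needed to separate the heads.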
I'm thinking that segmentMultiPersonParts would fix this by generating an array of masks (or other related structures), one per person, from which I can compute one bounding box each.
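As a rough sketch of that idea, assuming the multi-person output can be reduced to one binary face mask per person (the real return structure may differ, and nothing here is the library's API), each mask yields its own box even when the heads touch. mask_bounding_box and person_masks are hypothetical names.

```python
def mask_bounding_box(mask):
    """Return (x, y, w, h) enclosing all foreground pixels, or None if empty."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# Assumed format: one mask per detected person. The two heads are
# adjacent (columns 0-2 and 3-4), which would merge in a combined mask.
person_masks = [
    [[1, 1, 1, 0, 0],
     [1, 1, 1, 0, 0]],
    [[0, 0, 0, 1, 1],
     [0, 0, 0, 1, 1]],
]

boxes = [mask_bounding_box(mask) for mask in person_masks]
print(boxes)
```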
Okay, fair point.
I believe the corresponding JavaScript code is in body-pix/src/multi_person/decode_multiple_masks_cpu.ts (in particular decodePersonInstancePartMasks).
On further inspection, it seems that doing it the same way as the upstream JS version would require the multi-person pose detection output as an input (#50).
I started porting the person segmentation functions based on the pose-detection branch. Here's the diff so far: https://github.com/gmontamat/python-tf-bodypix/compare/pose-detection...gmontamat:multiperson-segmentation Feel free to take a look and contribute.
The original TensorFlowJS code includes net.segmentMultiPerson and net.segmentMultiPersonParts, which allow code to more easily distinguish people when they appear in an image together.
I started looking at the code to see if this is something I could port, but I just don't understand TensorFlow or the existing implementations well enough to figure out how.
Please consider this for a future enhancement.
Thanks much.