facebookresearch / sapiens

High-resolution models for human tasks.
https://about.meta.com/realitylabs/codecavatars/sapiens/

seg lite error #41

Closed swzparker7 closed 1 month ago

swzparker7 commented 1 month ago

When I run the seg command "python vis_seg.py ./sapiens_lite_host/torchscript/seg/checkpoints/sapiens_0.6b/sapiens_0.6b_goliath_best_goliath_mIoU_7777_epoch_178_torchscript.pt2 --input ../data/frames/XDmfBTgIRUg_9 --batch-size=8 --output-root=../data/seg/XDmfBTgIRUg_9", the result is as follows: the image is not semantically segmented (screenshots attached).

And if I then run the depth command, it reports the following error:

[ERROR/ForkPoolWorker-3] Traceback (most recent call last):
  File "/mnt/local6T/wangzheng/sapiens/lite/demo/worker_pool.py", line 20, in __call__
    result = self.__callable(*args, **kwargs)
  File "/mnt/local6T/wangzheng/sapiens/lite/demo/vis_depth.py", line 131, in img_save_and_viz
    (depth_foreground - min_val) / (max_val - min_val)
UnboundLocalError: local variable 'min_val' referenced before assignment

I further modified my seg command by adding "--shape 512", but it reported the error "RuntimeError: The size of tensor a (1024) must match the size of tensor b (3072) at non-singleton dimension 1".
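For context, a back-of-the-envelope reading of that RuntimeError (my own interpretation; the ViT patch size of 16, the default 1024×768 input, and 512 meaning 512×512 are assumptions, not confirmed in this thread) is that the number of patch tokens produced by the smaller input no longer matches the positional embedding baked into the exported torchscript model:

```python
# Rough sanity check of the 1024-vs-3072 mismatch.
# Assumptions (not confirmed here): ViT-style patch size of 16 and a
# torchscript model exported for a fixed 1024x768 input resolution.
patch = 16

tokens_default = (1024 // patch) * (768 // patch)  # 64 * 48 = 3072 tokens
tokens_small = (512 // patch) * (512 // patch)     # 32 * 32 = 1024 tokens

print(tokens_default, tokens_small)  # 3072 1024 -- the two sizes in the error
```

If that reading is right, the exported .pt2 expects the shape it was exported with and cannot simply be rerun with --shape 512.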

I would like to ask whether it is only possible to detect a whole body including the head, and not a head on its own. Looking forward to your reply; it would be very helpful to me. Thank you for your time.

swzparker7 commented 1 month ago

PS: Running the above command on the example data pose/data/reel1 segments and estimates properly, so I wonder whether there are limitations on what can be detected.

rawalkhirodkar commented 1 month ago

Hello, in these two images, the segmentation prediction does not contain any foreground class.

We use this foreground mask to normalize the depth prediction to [0, 1] for visualization.

This is the reason for the crash on your examples.

Why does the segmentation not work on these images? My guess is that the original resolution of these two images is much smaller than 1024×768; this results in blurry upsampling, and the network thinks the person is part of the background.
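To make that failure mode concrete, here is a minimal sketch of the kind of masked normalization described above, with a guard for the empty-foreground case (the function and variable names are illustrative and not the actual vis_depth.py code):

```python
import numpy as np

def normalize_depth_for_viz(depth_map: np.ndarray, seg_mask: np.ndarray):
    """Normalize depth to [0, 1] over the segmented foreground only.

    Illustrative sketch: `depth_map` and `seg_mask` are assumed to be
    arrays of the same spatial size, with `seg_mask > 0` marking foreground.
    """
    foreground = seg_mask > 0
    if not foreground.any():
        # No person found by the segmentation model: skip normalization
        # instead of using min_val/max_val that were never assigned.
        return None

    depth_foreground = depth_map[foreground]
    min_val, max_val = depth_foreground.min(), depth_foreground.max()
    if max_val == min_val:
        return np.zeros_like(depth_map, dtype=np.float32)

    normalized = np.zeros_like(depth_map, dtype=np.float32)
    normalized[foreground] = (depth_map[foreground] - min_val) / (max_val - min_val)
    return normalized
```

With a guard like this, frames where no person is segmented would be skipped (or flagged) instead of crashing with the UnboundLocalError above.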

GloryyrolG commented 2 weeks ago

Hi @rawalkhirodkar ,

may I ask you to clarify whether this means segmentation is more important than depth estimation, etc.? According to the demo, once the person is not segmented, the depth estimation is meaningless. Thanks and best,