facebookresearch / DensePose

A real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body
http://densepose.org

PR: 60x Speedup on IUV2FBC Function. Greatly speeds up converting a 2D person image to a 3D model. #98

Open davidleejy opened 6 years ago

davidleejy commented 6 years ago

Hello, I am writing to ask whether a PR that gives a 60x speedup to the IUV2FBC function (in DensePose/detectron/utils/densepose_methods.py) could be merged.

The IUV2FBC function converts an IUV point to an XYZ point on a SMPL model. This speedup will help users who want to build a 3D person from a 2D person image (e.g. in virtual reality).

Timings

Run IUV2FBC on 125 IUV points in 'DensePoseData/demo_data/demo_dp_single_ann.pkl' ...

Original:

8629403 function calls in 5.227 seconds
Ordered by: cumulative time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    5.227    5.227 <string>:1(<module>)
        1    0.001    0.001    5.227    5.227 <ipython-input-40-02014517a6cb>:32(f)
      125    0.088    0.001    5.219    0.042 densepose_methods.py:270(IUV2FBC)
    35578    0.430    0.000    5.064    0.000 densepose_methods.py:62(barycentric_coordinates_exists)
   108588    1.486    0.000    4.536    0.000 numeric.py:1591(cross)

Sped up:

39129 function calls in 0.081 seconds
Ordered by: cumulative time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.081    0.081 <string>:1(<module>)
        1    0.001    0.001    0.081    0.081 <ipython-input-39-6b6ed951a68c>:32(f)
      125    0.014    0.000    0.079    0.001 densepose_methods.py:174(IUV2FBC_fast)
      125    0.035    0.000    0.059    0.000 densepose_methods.py:135(barycentric_coordinates_fast)
      375    0.008    0.000    0.020    0.000 numeric.py:1591(cross)

Run IUV2FBC on 80000 IUV points ...

summary

Original: Estimated ~53 minutes (80000 points / 125 points * 5 secs ~= 53 mins)

Sped up: 1 minute (timing obtained from a profiled run)
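The estimate above can be checked with a few lines of arithmetic. This sketch uses only the timings reported in the two profiles and the same linear-scaling assumption as the estimate; the variable names are illustrative.

```python
# Timings taken from the two profile tables (125 IUV points each).
N_PROFILED = 125
T_ORIGINAL_S = 5.227
T_FAST_S = 0.081
N_TARGET = 80000

# Ratio of the two profiled runs.
speedup = T_ORIGINAL_S / T_FAST_S

# Linear extrapolation to 80000 points (assumes cost scales with point count).
scale = N_TARGET / N_PROFILED
est_original_min = scale * T_ORIGINAL_S / 60.0
est_fast_s = scale * T_FAST_S

print(f"speedup: {speedup:.1f}x")
print(f"original, extrapolated: {est_original_min:.1f} min")
print(f"sped up, extrapolated: {est_fast_s:.1f} s")
```

With the unrounded 5.227 s the extrapolation comes out slightly above the ~53 minutes quoted (which rounds to 5 s), and the extrapolated sped-up time of roughly 52 seconds is consistent with the measured 1 minute.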

PR Contribution

notebooks/DensePose-Fast-IUV-2-XYZ.ipynb : A demo notebook. Also addresses issue #87.

detectron/utils/densepose_methods.py : Contains the sped-up code. Backward-compatible, since the old functions are not modified. New functions: IUV2FBC_fast and barycentric_coordinates_fast.

DensePoseData/demo_data_2/* : Data provided for convenience when running DensePose-Fast-IUV-2-XYZ.ipynb.

jaggernaut007 commented 6 years ago

Has this been pushed to the repo? Does it have any bugs?

RSKothari commented 3 years ago

Hi all, unfortunately this is still pretty slow for my application. It takes more than 20 seconds on a single image (I have a lot of images)! Is there a way to remove looping from the IUV2FBC function?

vkhalidov commented 3 years ago

@RSKothari have you tried using PyTorch3D to visualize textures on a 3D mesh? https://github.com/facebookresearch/pytorch3d/blob/master/docs/tutorials/render_densepose.ipynb

RSKothari commented 3 years ago

Hi again @vkhalidov , I'm trying to use PnP with the detected pixel points and the corresponding 3D meshpoints specifically for each body part. I've sub-sampled the number of pixel points to 1/9th the original number but still cannot get my function to scale for a large number of images (around 130K images).

RSKothari commented 3 years ago

@vkhalidov Any hints? I'm essentially trying to find an efficient version of the IUV2FBC function that doesn't require loops. Any pointers on how to proceed?

vkhalidov commented 3 years ago

@RSKothari To find the face (Q1, Q2, Q3) that contains point P, you don't need loops. The condition is that P lies on the same side with respect to the edges Q2-Q1, Q3-Q2 and Q1-Q3. So if you have N points {P_n = (x_n, y_n) | n = 1 ... N} and M faces {(Q1_m, Q2_m, Q3_m) | m = 1 ... M}, with Qk_m = (xk_m, yk_m), you can compute the sign of the following determinants:

(x2_m - x1_m) * (y_n - y1_m) - (y2_m - y1_m) * (x_n - x1_m)
(x3_m - x2_m) * (y_n - y2_m) - (y3_m - y2_m) * (x_n - x2_m)
(x1_m - x3_m) * (y_n - y3_m) - (y1_m - y3_m) * (x_n - x3_m)

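and choose the first face for which all those values are non-negative (if any). This assumes the face vertices are ordered so that the face has positive signed area (counter-clockwise in standard axes); for a face with the opposite orientation, all three values would be non-positive instead. Computing barycentric coordinates from there is trivial.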

This approach does not have any loops. You can implement it and see what the gains are. Many values can be precomputed, basically everything related to the face points. The time complexity is O(MN). This approach is easy to implement, but it performs lots of redundant computations (e.g. most edges are processed twice, once for each of the 2 faces that share the edge).