TencentARC / InstantMesh

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
Apache License 2.0
2.63k stars, 250 forks

Please optimize texture saving so that save_obj_with_mtl is faster than generating the video #19

Open yosun opened 2 months ago

yosun commented 2 months ago

Surprised to find that generating the obj + texture (https://replicate.com/p/r366c4nc49rga0cexcr8x5ys6r @ 35s) takes longer than obj + texture + video (https://replicate.com/p/z3s1hyrmtxrga0cexcr8nw89d8 @ 25s)

Please optimize texture saving so that save_obj_with_mtl is faster than generating the video

https://github.com/TencentARC/InstantMesh/blob/59e28cac0b317c75dd054f503ee746c7fffb259c/src/utils/mesh_util.py#L27

--

Benchmarks:

obj only @ 12.4s https://replicate.com/p/2ev8akz2s5rga0cexkmvzxg064
obj + video @ 13.1s https://replicate.com/p/djxg387xmhrga0cexkmtbxj89c
obj + texture @ 34.3s https://replicate.com/p/r366c4nc49rga0cexcr8x5ys6r
obj + texture + video @ 25.8s https://replicate.com/p/z3s1hyrmtxrga0cexcr8nw89d8 (why does this one take less time, when more assets are generated?)

Here's the complete benchmark table based on parameters.

The input image is 2048x2048:

rm_bg  video  texmap  time (s)  link
0      0      0       12.4      https://replicate.com/p/2ev8akz2s5rga0cexkmvzxg064
0      0      1       34.3      https://replicate.com/p/r366c4nc49rga0cexcr8x5ys6r
0      1      0       13.1      https://replicate.com/p/djxg387xmhrga0cexkmtbxj89c
1      0      0       13.5      https://replicate.com/p/z3s1hyrmtxrga0cexcr8nw89d8
1      0      1       29        https://replicate.com/p/q542nw0d0nrg80cezhbr3nmfm8
0      1      1       34.6      https://replicate.com/p/ftwrdbjwksrge0cezhbtj92ntc
1      1      0       14.1      https://replicate.com/p/abdsp8gn4nrge0cezhgtq7eyk8
1      1      1       25.8      https://replicate.com/p/z3s1hyrmtxrga0cexcr8nw89d8

The input image is 512x512:

rm_bg  video  texmap  time (s)  link
0      0      1       32.1      https://replicate.com/p/x8sp3pse05rga0cezhhte0w3fc
1      0      1       24.5      https://replicate.com/p/wjv6fg2gj5rge0cezhj98nfyg4
1      1      1       26        https://replicate.com/p/a7fyqda369rgc0cezhjv3mgyr8
0      1      1       33        https://replicate.com/p/tsswg1qycdrge0cf15787ryaf4

JustinPack commented 2 months ago

I just did some testing on this and it seems to be related to the remove background option. With the video example you shared, I ran the same settings you had and confirmed that I got a similar result of 25s.

Next, I ran the exact same job but with remove background unchecked and got a 35s result consistent with the obj+texture only run.

I'm going to keep looking into this.

JustinPack commented 2 months ago

@yosun I looked into this further, and the cause of the speedup is the image size reduction that comes with removing the background. When the remove background step completes, the image is scaled to 0.85 of its original resolution, which directly reduces the model processing time.
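To put a number on that reduction, here is a small arithmetic sketch. It assumes the 0.85 factor applies uniformly to width and height, as described above; the actual preprocessing code is not shown in this thread, and `scaled_size` and `pixel_fraction` are hypothetical helper names.

```python
def scaled_size(width, height, factor=0.85):
    """Return (width, height) after uniform scaling, rounded down."""
    return int(width * factor), int(height * factor)


def pixel_fraction(factor=0.85):
    """Fraction of the original pixel count left after uniform scaling."""
    return factor * factor


# The two input sizes used in the benchmarks in this issue.
for side in (2048, 512):
    w, h = scaled_size(side, side)
    print(f"{side}x{side} -> {w}x{h}, {pixel_fraction():.0%} of the pixels")
```

Under that assumption, a 0.85 linear scale leaves about 72% of the pixels, so any stage whose cost grows with pixel count would have roughly 28% less work.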

yosun commented 2 months ago

I don’t feel that’s the full extent. I uploaded background-removed images for all the examples.

It is definitely in the texture baking step
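One way to check where the time goes is to time each stage separately instead of comparing total run times. The sketch below is a generic stage timer; the stage names are hypothetical and the `time.sleep` calls are placeholders for the real reconstruction, texture export, and video calls, which are not shown in this thread.

```python
import time
from contextlib import contextmanager


def make_timer():
    """Return (timed, timings): a context manager and the dict it fills."""
    timings = {}

    @contextmanager
    def timed(stage):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate, so a stage entered several times is summed.
            elapsed = time.perf_counter() - start
            timings[stage] = timings.get(stage, 0.0) + elapsed

    return timed, timings


timed, timings = make_timer()

# Hypothetical stage names; replace each sleep with the real call.
with timed("reconstruction"):
    time.sleep(0.01)
with timed("texture_export"):
    time.sleep(0.02)
with timed("video"):
    time.sleep(0.01)

# Print the slowest stage first.
for stage, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{stage:15s} {seconds:.3f}s")
```

If the work runs on a GPU, the device would also need to be synchronized before each timestamp, otherwise asynchronous kernels can be attributed to the wrong stage.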

yosun commented 2 months ago

@JustinPack I have updated the original post with benchmark tables for different parameters. It seems that enabling background removal speeds things up in each case, for both the 2048x2048 and the 512x512 transparent-background inputs?

Does this mean that background removal should always be used for optimal times, even if the image is already pre-processed for background removal?

JustinPack commented 2 months ago

@yosun The remove background step reduces the size of the image before it gets processed by the multi-view generation step. Smaller image == faster processing speed. I've tested this across multiple images and generation setups (background removal on/off, video on/off). This means that the presence of a background prior to the removal step is not a factor, because the step still runs if checked and returns the smaller image regardless. For confirmation, you can scale an image to half size yourself, run the exact same setup on both, and you will notice the speed difference between the two.
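A minimal sketch of that comparison, assuming the pipeline can be called as a plain function. `fake_pipeline` is a hypothetical stand-in whose work grows with pixel count; it is not InstantMesh code and only exists so the harness runs on its own.

```python
import statistics
import time


def median_runtime(fn, *args, repeats=3):
    """Call fn(*args) `repeats` times and return the median wall time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def fake_pipeline(side):
    # Hypothetical stand-in: loops once per pixel of a side x side image.
    total = 0
    for i in range(side * side):
        total += i
    return total


full = median_runtime(fake_pipeline, 256)
half = median_runtime(fake_pipeline, 128)
print(f"full size: {full:.4f}s  half size: {half:.4f}s")
```

Taking the median of several runs, rather than a single run per setup, makes it less likely that one slow or cold start decides the comparison.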

yosun commented 2 months ago

> @yosun The remove background step reduces the size of the image before it gets processed by the multi-view generation step. Smaller image == faster processing speed. I've tested this across multiple images and generation setups (background removal on/off, video on/off). This means that the presence of a background prior to the removal step is not a factor, because the step still runs if checked and returns the smaller image regardless. For confirmation, you can scale an image to half size yourself, run the exact same setup on both, and you will notice the speed difference between the two.

I feel that we are missing something here.

1) I am still trying to understand the image-to-3D step and the subsequent mesh export: generating the texture takes 10s longer without video generation, and generating the extra video asset saves between 6s and 10s of total generation time. (512x512 and 2048x2048 test inputs below)

2) I'm copying and pasting the benchmarks below. It seems that the input file resolution does not matter? 512x512 and 2048x2048 are comparable?

Here's the complete benchmark table based on parameters.

The input image is 2048x2048:

rm_bg  video  texmap  time (s)  link
0      0      0       12.4      https://replicate.com/p/2ev8akz2s5rga0cexkmvzxg064
0      0      1       34.3      https://replicate.com/p/r366c4nc49rga0cexcr8x5ys6r
0      1      0       13.1      https://replicate.com/p/djxg387xmhrga0cexkmtbxj89c
1      0      0       13.5      https://replicate.com/p/z3s1hyrmtxrga0cexcr8nw89d8
1      0      1       29        https://replicate.com/p/q542nw0d0nrg80cezhbr3nmfm8
0      1      1       34.6      https://replicate.com/p/ftwrdbjwksrge0cezhbtj92ntc
1      1      0       14.1      https://replicate.com/p/abdsp8gn4nrge0cezhgtq7eyk8
1      1      1       25.8      https://replicate.com/p/z3s1hyrmtxrga0cexcr8nw89d8

The input image is 512x512:

rm_bg  video  texmap  time (s)  link
0      0      1       32.1      https://replicate.com/p/x8sp3pse05rga0cezhhte0w3fc
1      0      1       24.5      https://replicate.com/p/wjv6fg2gj5rge0cezhj98nfyg4
1      1      1       26        https://replicate.com/p/a7fyqda369rgc0cezhjv3mgyr8
0      1      1       33        https://replicate.com/p/tsswg1qycdrge0cf15787ryaf4

yosun commented 2 months ago

So in general, requesting that both video and texmap be generated averages around 25s for both 2048x2048 and 512x512 input. So for speed, should we always generate a video even if we just want a textured mesh?
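One way to separate the three options is to compare only runs that differ in a single flag. The sketch below does that for the 2048x2048 timings copied from the benchmark table in this issue. `paired_effects` is a hypothetical helper name, and since each configuration was run only once, differences of a second or so may simply be noise.

```python
from statistics import mean

# (rm_bg, video, texmap) -> seconds, copied from the 2048x2048 benchmark table.
runs = {
    (0, 0, 0): 12.4,
    (0, 0, 1): 34.3,
    (0, 1, 0): 13.1,
    (1, 0, 0): 13.5,
    (1, 0, 1): 29.0,
    (0, 1, 1): 34.6,
    (1, 1, 0): 14.1,
    (1, 1, 1): 25.8,
}
FLAGS = ("rm_bg", "video", "texmap")


def paired_effects(runs, flag_index):
    """Time change (flag on minus flag off) with the other two flags held fixed."""
    diffs = []
    for key, t_off in runs.items():
        if key[flag_index] == 0:
            on_key = key[:flag_index] + (1,) + key[flag_index + 1:]
            diffs.append(runs[on_key] - t_off)
    return diffs


for i, name in enumerate(FLAGS):
    diffs = paired_effects(runs, i)
    per_pair = [round(d, 1) for d in diffs]
    print(f"{name:7s} mean change {mean(diffs):+.2f}s  per pair {per_pair}")
```

On these numbers, texmap adds between 11.7s and 21.9s in every pair. Video changes the total by under 1s in three of the four pairs. rm_bg saves 5.3s and 8.8s in the two pairs where texmap is on. The 34.3s and 25.8s runs compared at the top of this issue differ in rm_bg as well as video, so that gap cannot be attributed to video alone.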

throb081 commented 1 month ago

Hello, it seems like you have done a lot of testing. Did you run train.py? I don't know how to build filtered_obj_name.json. I wonder if you could give me some guidance. Thank you.