LIU-Yuxin / SyncMVD

Official PyTorch & Diffusers implementation of "Text-Guided Texturing by Synchronized Multi-View Diffusion"
MIT License

add viewpoints #8

Open YuzhiChen001 opened 6 months ago

YuzhiChen001 commented 6 months ago

Hello, nice work, and I am following it. In my experiments, I found that GPU memory usage increases with the number of viewpoints, and with too many viewpoints the results from some perspectives look strange. I guess this is because the SD base model is trained on 256×256 images, while the SyncMVD pipeline has a concat operation. For example, 8 viewpoints lead to generating 8×256×256 pixels in one time step. So when the number of viewpoints grows too large, the noise prediction may fail. I don't know if my understanding is correct; any help would be appreciated. Thanks!

LIU-Yuxin commented 6 months ago

Processing all views in one batch will run into the memory issue you describe. It should be possible to group the views into smaller batches, since this method does not use fully connected pairwise attention across views.
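
A minimal sketch of the batching idea, not code from the SyncMVD repository: the helper `predict_in_chunks`, its `chunk_size` parameter, and the stand-in `fake_predict` are all hypothetical names used for illustration. It assumes the per-view prediction can be run on any subset of views independently, and it uses plain Python lists in place of latent tensors so that it runs without a GPU.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def predict_in_chunks(
    views: Sequence[T],
    predict_fn: Callable[[Sequence[T]], List[R]],
    chunk_size: int,
) -> List[R]:
    """Call predict_fn on groups of at most chunk_size views, preserving order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    outputs: List[R] = []
    for start in range(0, len(views), chunk_size):
        # Only this slice is passed to the model at once, so peak memory
        # scales with chunk_size rather than with the total view count.
        chunk = views[start:start + chunk_size]
        outputs.extend(predict_fn(chunk))
    return outputs


# Hypothetical stand-in for the noise predictor: records each batch size
# and returns one output per input view.
batch_sizes: List[int] = []


def fake_predict(chunk: Sequence[int]) -> List[int]:
    batch_sizes.append(len(chunk))
    return [v * 2 for v in chunk]


# 8 views processed in groups of at most 3.
result = predict_in_chunks(list(range(8)), fake_predict, chunk_size=3)
```

With real tensors, the same loop could slice the batch dimension and concatenate the per-chunk outputs; a smaller `chunk_size` trades speed for lower peak memory.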