AiuniAI / Unique3D

[NeurIPS 2024] Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image
https://wukailu.github.io/Unique3D/
MIT License

Inference always crashes outright at stage 5, please help #79

Open hpx502766238 opened 4 months ago

hpx502766238 commented 4 months ago

The console log is as follows:

```
(venv) PS E:\WSL\2M\Unique3D> python app/gradio_local.py --port 7860
Warning! extra parameter in cli is not verified, may cause erros.
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:00<00:00, 29.76it/s]
You have disabled the safety checker for <class 'custum_3d_diffusion.custum_pipeline.unifield_pipeline_img2mvimg.StableDiffusionImage2MVCustomPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Warning! extra parameter in cli is not verified, may cause erros.
E:\WSL\2M\Unique3D\venv\lib\site-packages\huggingface_hub\file_download.py:1150: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
  warnings.warn(
Loading pipeline components...: 100%|██████████████████████████████████████████████████| 5/5 [00:00<00:00, 2175.92it/s]
You have disabled the safety checker for <class 'custum_3d_diffusion.custum_pipeline.unifield_pipeline_img2img.StableDiffusionImageCustomPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
E:\WSL\2M\Unique3D\venv\lib\site-packages\torch\utils\cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 6/6 [00:04<00:00, 1.33it/s]
Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support forfloat16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference.
Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support forfloat16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference.
Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support forfloat16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference.
Loading pipeline components...: 100%|██████████████████████████████████████████████████| 6/6 [00:00<00:00, 5932.54it/s]
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
E:\WSL\2M\Unique3D\venv\lib\site-packages\transformers\models\clip\modeling_clip.py:480: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
  0%|          | 0/30 [00:00<?, ?it/s]Warning! condition_latents is not None, but self_attn_ref is not enabled! This warning will only be raised once.
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:04<00:00, 6.82it/s]
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:10<00:00, 1.03s/it]
100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:30<00:00, 1.02s/it]
  0%|          | 0/200 [00:00<?, ?it/s]E:\WSL\2M\Unique3D\venv\lib\site-packages\torch\utils\cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
E:\WSL\2M\Unique3D.\mesh_reconstruction\remesh.py:354: UserWarning: Using torch.cross without specifying the dim arg is deprecated. Please either pass the dim explicitly or simply use torch.linalg.cross. The default value of dim will change to agree with that of linalg.cross in a future release. (Triggered internally at ..\aten\src\ATen\native\Cross.cpp:66.)
  n = torch.cross(e1,cl) + torch.cross(cr,e1) #sum of old normal vectors
100%|████████████████████████████████████████████████████████████████████████████████| 200/200 [00:07<00:00, 25.91it/s]
  0%|          | 0/100 [00:00<?, ?it/s]
(venv) PS E:\WSL\2M\Unique3D>
```

hpx502766238 commented 4 months ago

Every time it reaches stage 5, after roughly a few dozen seconds the memory usage grows by about 2 GB (with more than 6 GB still free), and then the program simply crashes and exits without any error message.

1. I have reinstalled the whole environment many times, including system components such as CUDA and cuDNN; the Python virtual environment was also rebuilt, and the TensorRT code was commented out.
2. I tried disabling super-resolution: the first few stages then run extremely fast, but it still crashes at stage 5. With super-resolution enabled it also gets stuck at stage 5, just more slowly.

Environment details:

Hardware:
GPU: NVIDIA RTX 3060 12 GB
RAM: 24 GB
CPU: AMD R7 5800H

Software:
Windows 11 Home 23H2
Display driver: Studio Driver 560.70
CUDA Toolkit 12.4
cuDNN 8.9.7
Visual Studio Community 2022 + MSVC 14.40.33807
pip virtual environment, torch==2.3.0+cu121
pip-env.txt
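Since the process exits without any Python traceback, one generic way to get at least a partial stack on a native crash is CPython's built-in faulthandler. This is a general Python facility, not something Unique3D itself provides; a minimal sketch would be to enable it as early as possible in the entry script:

```python
# Hypothetical debugging aid (not part of Unique3D): enable faulthandler near
# the top of app/gradio_local.py so that a fatal native error, e.g. an access
# violation inside a CUDA extension, dumps the Python stack to stderr instead
# of the process dying silently.
import faulthandler
faulthandler.enable()
```

The same effect can be had without editing any file by launching with `python -X faulthandler app/gradio_local.py --port 7860`.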

wukailu commented 4 months ago

I haven't run into this problem myself, but compared with stage 4, stage 5 additionally computes the explicit target, which uses nvdiffrast again. You could check whether nvdiffrast is the culprit (for example, whether the EGL backend is supported; if not, you need to switch to the CUDA backend or the slower pytorch3d implementation).
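A minimal way to exercise nvdiffrast in isolation, under the same terminal and environment, might look like the sketch below. It is not from the repo and does not touch Unique3D's own rendering code; it only uses nvdiffrast's standard RasterizeGLContext / RasterizeCudaContext and rasterize call to see whether either backend works at all on this machine:

```python
# Standalone nvdiffrast sanity check (a sketch, assuming nvdiffrast is installed).
import torch
import nvdiffrast.torch as dr

def check_nvdiffrast(device="cuda"):
    # One clip-space triangle and a tiny render target.
    pos = torch.tensor([[[-0.5, -0.5, 0.0, 1.0],
                         [ 0.5, -0.5, 0.0, 1.0],
                         [ 0.0,  0.5, 0.0, 1.0]]], device=device)
    tri = torch.tensor([[0, 1, 2]], dtype=torch.int32, device=device)
    for name, ctx_cls in [("OpenGL/EGL", dr.RasterizeGLContext),
                          ("CUDA", dr.RasterizeCudaContext)]:
        try:
            ctx = ctx_cls()
            rast, _ = dr.rasterize(ctx, pos, tri, resolution=[64, 64])
            # Channel 3 of the rasterizer output is the triangle id (0 = empty),
            # so counting nonzero entries tells us the triangle was drawn.
            covered = int((rast[..., 3] > 0).sum())
            print(f"{name} backend OK, covered pixels: {covered}")
        except Exception as e:
            print(f"{name} backend failed: {e}")

if __name__ == "__main__":
    check_nvdiffrast()
```

If the GL context fails but the CUDA context succeeds (or vice versa), that would point to the backend rather than Unique3D itself.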

hpx502766238 commented 4 months ago

> nvdiffrast

How do I check nvdiffrast? What exactly should I do?

hpx502766238 commented 4 months ago

After another night of fiddling I tried again with PyCharm, and it finally runs all the way through. I still can't figure out why, though:

1. Running with the PyCharm debugger: all 5 stages complete, no errors.
2. Running in PyCharm's local terminal (which uses PowerShell): all 5 stages complete, no errors.
3. Opening a PowerShell terminal in the project directory, then .\venv\scripts\Activate.ps1 and python app/gradio_local.py --port 7860: fails, crashes at stage 5.
4. Opening a PowerShell terminal (as administrator) in the project directory, then .\venv\scripts\Activate.ps1 (or activate, same result) and python app/gradio_local.py --port 7860: fails, crashes at stage 5.
5. Opening a cmd terminal in the project directory, then .\venv\scripts\activate.bat and python app/gradio_local.py --port 7860: fails, crashes at stage 5.
6. In PyCharm's local terminal, first deactivate, then .\venv\scripts\activate: all 5 stages complete, no errors.

So for now the problem is solved: just run it from PyCharm. It is still strange, though: whenever I launch a terminal myself the program crashes, while under PyCharm it never does. PyCharm should also just be invoking the local terminal, so why the difference?
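Since the only variable here is how the terminal was launched, one generic way to narrow it down (a sketch, not from the thread) is to dump the environment that Python actually sees in each shell and diff the results:

```python
# Hypothetical helper, e.g. save as dump_env.py: write the current process
# environment to a JSON file so two shells can be compared with a diff tool.
import json
import os
import sys

out_file = sys.argv[1] if len(sys.argv) > 1 else "env_dump.json"
with open(out_file, "w", encoding="utf-8") as f:
    json.dump(dict(os.environ), f, indent=2, sort_keys=True)
print(f"Wrote {len(os.environ)} environment variables to {out_file}")
```

Running it once from the PyCharm terminal (`python dump_env.py pycharm_env.json`) and once from a standalone PowerShell/cmd session (`python dump_env.py shell_env.json`), then diffing the two files, would show whether something like PATH, CUDA_PATH, or TORCH_CUDA_ARCH_LIST differs between the two environments.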