AiuniAI / Unique3D

Official implementation of Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image
https://wukailu.github.io/Unique3D/
MIT License

This program is a huge leap over previous image-to-3D tools! Thumbs up! Measured on an RTX 3060: 15 minutes to generate one 3D model #53

Open tomyu168 opened 2 months ago

tomyu168 commented 2 months ago

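Compared with earlier image-to-3D tools, this one is a real leap forward. Big thumbs up! The only catch is that setting up the environment took me an entire day.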

It would be great to have a list of generation times on different GPUs as a reference. Also, does the operating system make a difference?

DamienCz commented 2 months ago

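Could you share your environment setup? I've been trying to install it for several days and still haven't gotten it working...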

wukailu commented 2 months ago

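In theory, a 3060 should take less than 2 minutes. Judging from the output, ONNX is not running on the GPU but on the CPU. All super-resolution steps run through ONNX, which takes an enormous amount of time on the CPU but not on the GPU. Please check whether onnxruntime-gpu is installed correctly.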
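A minimal sketch of that check, assuming a pip-based install. The CPU package onnxruntime and the GPU package onnxruntime-gpu both install the same `onnxruntime` module, so having both installed at once is a common way to end up on the CPU. The helper names are hypothetical; only `importlib.metadata` and the real `onnxruntime.get_device()` and `onnxruntime.get_available_providers()` calls are used. Note that those two calls only describe the installed build: as the thread goes on to show, they can report GPU and list CUDAExecutionProvider even when the CUDA DLLs then fail to load.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def diagnose_onnxruntime(versions):
    """Turn {'onnxruntime': ver_or_None, 'onnxruntime-gpu': ver_or_None} into advice strings.

    Hypothetical helper: it only inspects package metadata and never loads a DLL.
    """
    cpu, gpu = versions.get("onnxruntime"), versions.get("onnxruntime-gpu")
    problems = []
    if gpu is None:
        problems.append("onnxruntime-gpu is not installed; ONNX models will run on the CPU")
    if cpu is not None and gpu is not None:
        problems.append(
            f"both onnxruntime {cpu} and onnxruntime-gpu {gpu} are installed; "
            "uninstall both, then reinstall only onnxruntime-gpu"
        )
    return problems


if __name__ == "__main__":
    found = {name: installed_version(name) for name in ("onnxruntime", "onnxruntime-gpu")}
    for line in diagnose_onnxruntime(found) or ["package metadata looks OK"]:
        print(line)
    try:
        import onnxruntime as ort
        # "GPU" here only means a GPU build was imported, not that the CUDA DLLs will load.
        print(ort.get_device(), ort.get_available_providers())
    except ImportError:
        print("onnxruntime cannot be imported")
```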

tomyu168 commented 2 months ago

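Which ones? I'm not really sure myself; I probably got past a lot of pitfalls without quite knowing how. Here's my post on some of the problems I ran into with Unique3D, have a look.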

DamienCz commented 2 months ago

Thanks, man, much appreciated!

tomyu168 commented 2 months ago

I saw the message yesterday and spent a whole day on it, but it still doesn't work. After installing the onnxruntime-gpu package, import onnxruntime followed by onnxruntime.get_available_providers() shows ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], and onnxruntime.get_device() shows GPU. But as soon as I click generate 3D, it reports: onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "D:\Anaconda3\envs\unique3d-py311\Lib\site-packages\onnxruntime\capi\onnxruntime_providers_tensorrt.dll"

EP Error D:\a_work\1\s\onnxruntime\python\onnxruntime_pybind_state.cc:456 onnxruntime::python::RegisterTensorRTPluginsAsCustomOps Please install TensorRT libraries as mentioned in the GPU requirements page, make sure they're in the PATH or LD_LIBRARY_PATH, and that your GPU is supported. when using [('TensorrtExecutionProvider', {'device_id': 0, 'trt_max_workspace_size': 8589934592, 'trt_fp16_enable': True, 'trt_engine_cache_enable': True}), ('CUDAExecutionProvider', {'device_id': 0, 'arena_extend_strategy': 'kSameAsRequested', 'gpu_mem_limit': 8589934592, 'cudnn_conv_algo_search': 'HEURISTIC'})] Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
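Windows error 126 (ERROR_MOD_NOT_FOUND) from LoadLibrary usually means the provider DLL itself was found but one of the libraries it depends on was not, which fits the hint to put the TensorRT libraries on PATH. A rough sketch for checking which runtime DLLs are reachable through PATH; the DLL file names below are assumptions for CUDA 11.x, cuDNN 8 and TensorRT 8.x and have to be adjusted to the versions actually installed.

```python
import os


def find_on_path(dll_names, path_value=None, is_file=os.path.isfile):
    """Map each DLL name to the first PATH directory containing it, or None.

    path_value defaults to the current PATH; is_file can be swapped out for testing.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    dirs = [d for d in path_value.split(os.pathsep) if d]
    result = {}
    for name in dll_names:
        result[name] = next((d for d in dirs if is_file(os.path.join(d, name))), None)
    return result


# Assumed example names (CUDA 11.x runtime and cuBLAS, cuDNN 8, TensorRT 8.x); they differ by version.
CANDIDATE_DLLS = ["cudart64_110.dll", "cublas64_11.dll", "cudnn64_8.dll", "nvinfer.dll"]

if __name__ == "__main__":
    for dll, where in find_on_path(CANDIDATE_DLLS).items():
        print(f"{dll:20s} {where or 'NOT FOUND on PATH'}")
```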

After uninstalling onnxruntime-gpu and reinstalling onnxruntime, it ran, but testing again today produced a whole new pile of errors: RuntimeError: Not compiled with GPU support. Exception raised from FaceAreasNormalsForward at C:\Users\A\AppData\Local\Temp\pip-req-build-6s7k55h9\pytorch3d\csrc\face_areas_normals/face_areas_normals.h:60 (most recent call first): c10.dll!c10::Error::Error [ @ ] c10.dll!c10::detail::torchCheckFail [ @ ] _C.cp311-win_amd64.pyd! [ @ ] ... torch_python.dll!THPPointer::THPPointer [ @ ] ... python311.dll!PyEval_EvalFrameDefault [ @ ]

wukailu commented 2 months ago

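This looks like a problem with installing TensorRT on Windows; I haven't tested a TensorRT install on Windows myself. I suggest commenting out the TensorrtExecutionProvider entry in https://github.com/AiuniAI/Unique3D/blob/4263b8c836950babedd9c8b6769aa7c41afa9dbc/scripts/load_onnx.py#L5 and keeping only CUDAExecutionProvider, like this: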

providers = [
    ('CUDAExecutionProvider', {
        'device_id': 0,
        'arena_extend_strategy': 'kSameAsRequested',
        'gpu_mem_limit': 8 * 1024 * 1024 * 1024,
        'cudnn_conv_algo_search': 'HEURISTIC',
    })
]

The speed won't change much (the TensorRT backend may be less than 10 seconds faster, but it really is hard to install).

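That's because once TensorrtExecutionProvider fails to load, onnxruntime abandons the whole provider list and falls back to the default ['CUDAExecutionProvider', 'CPUExecutionProvider'], and the default CUDAExecutionProvider is extremely slow (slower than the CPU). Without TensorRT, the CUDAExecutionProvider configuration in the code is used as intended, and the speed won't change noticeably.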
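The suggestion above can be sketched as a small helper, assuming the option values seen in the logs (8589934592 bytes, i.e. 8 GiB, for both gpu_mem_limit and trt_max_workspace_size; this sketch reuses one size for both, as the original config did). `build_providers`, `provider_name` and `fell_back` are hypothetical names. To confirm which providers a session really uses, compare the requested list with `session.get_providers()` on the `onnxruntime.InferenceSession` you created: if the first entry differs, the silent fallback described above has happened.

```python
GIB = 1024 ** 3


def build_providers(backend="cuda", gpu_mem_gb=8, device_id=0):
    """Return an onnxruntime providers list for 'tensorrt', 'cuda' or 'cpu'.

    Hypothetical helper mirroring the provider options quoted in this thread.
    """
    cuda = ("CUDAExecutionProvider", {
        "device_id": device_id,
        "arena_extend_strategy": "kSameAsRequested",
        "gpu_mem_limit": int(gpu_mem_gb * GIB),
        "cudnn_conv_algo_search": "HEURISTIC",
    })
    if backend == "cpu":
        return ["CPUExecutionProvider"]
    if backend == "cuda":
        return [cuda]
    if backend == "tensorrt":
        trt = ("TensorrtExecutionProvider", {
            "device_id": device_id,
            "trt_max_workspace_size": int(gpu_mem_gb * GIB),
            "trt_fp16_enable": True,
            "trt_engine_cache_enable": True,
        })
        return [trt, cuda]
    raise ValueError(f"unknown backend: {backend!r}")


def provider_name(entry):
    """Providers may be given as plain names or as (name, options) tuples."""
    return entry if isinstance(entry, str) else entry[0]


def fell_back(requested, actual):
    """True if the session is not using the first provider that was requested.

    `actual` is the list returned by session.get_providers() after session creation.
    """
    return not actual or provider_name(requested[0]) != actual[0]
```

For example, create the session with `onnxruntime.InferenceSession(model_path, providers=build_providers("cuda"))`, where model_path is whatever file load_onnx.py loads, then check `fell_back(build_providers("cuda"), session.get_providers())`. Passing backend="cpu" corresponds to running the super-resolution entirely on the CPU, at the cost of much longer run times.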

tomyu168 commented 2 months ago

Thanks for the reply. I tried it, but it didn't work; it throws these errors:

2024-07-05 08:41:42.0879975 [E:onnxruntime:Default, provider_bridge_ort.cc:1745 onnxruntime::TryGetProviderInfo_CUDA] D:\a_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1426 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "D:\Anaconda3\envs\unique3d-python310\lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll"

EP Error EP Error D:\a_work\1\s\onnxruntime\python\onnxruntime_pybind_state.cc:891 onnxruntime::python::CreateExecutionProviderInstance CUDA_PATH is set but CUDA wasnt able to be loaded. Please install the correct version of CUDA andcuDNN as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported. when using [('CUDAExecutionProvider', {'device_id': 0, 'arena_extend_strategy': 'kSameAsRequested', 'gpu_mem_limit': 8589934592, 'cudnn_conv_algo_search': 'HEURISTIC'})] Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.


2024-07-05 08:41:42.1887710 [E:onnxruntime:Default, provider_bridge_ort.cc:1745 onnxruntime::TryGetProviderInfo_CUDA] D:\a_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1426 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "D:\Anaconda3\envs\unique3d-python310\lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll"

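Then I tried uninstalling onnxruntime-gpu 1.18.1 and installing 1.17.1, and the error went away. Just as I was excitedly thinking it had worked, it got stuck when the second progress bar started, at n = torch.cross(e1,cl) + torch.cross(cr,e1) #sum of old normal vectors: the whole screen froze and stopped responding, and after 10 seconds the connection dropped. Failed.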

inference. Loading pipeline components...: 100%|██████████████████████████████████████████████████| 6/6 [00:00<00:00, 3008.11it/s] Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch(). D:\Anaconda3\envs\unique3d-python310\lib\site-packages\diffusers\models\attention_processor.py:1279: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.) hidden_states = F.scaled_dot_product_attention( 0%| | 0/30 [00:00<?, ?it/s]Warning! condition_latents is not None, but self_attn_ref is not enabled! This warning will only be raised once. 100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:04<00:00, 7.39it/s] 100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:21<00:00, 2.18s/it] 100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [01:10<00:00, 2.34s/it] 0%| | 0/200 [00:00<?, ?it/s]D:\Anaconda3\envs\unique3d-python310\lib\site-packages\torch\utils\cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST']. warnings.warn( F:\Unique3D.\mesh_reconstruction\remesh.py:354: UserWarning: Using torch.cross without specifying the dim arg is deprecated. Please either pass the dim explicitly or simply use torch.linalg.cross. The default value of dim will change to agree with that of linalg.cross in a future release. (Triggered internally at ..\aten\src\ATen\native\Cross.cpp:66.) n = torch.cross(e1,cl) + torch.cross(cr,e1) #sum of old normal vectors 100%|████████████████████████████████████████████████████████████████████████████████| 200/200 [00:36<00:00, 5.43it/s] 0%| | 0/100 [00:00<?, ?it/s] (unique3d-python310) F:\Unique3D>python app/gradio_local.py --port 7860 Warning! extra parameter in cli is not verified, may cause erros. Loading pipeline components...: 100

I found that with onnxruntime-gpu 1.18.1, even after deleting the TensorrtExecutionProvider entry ('TensorrtExecutionProvider', { 'device_id': 0, 'trt_max_workspace_size': 8 * 1024 * 1024 * 1024, 'trt_fp16_enable': True, 'trt_engine_cache_enable': True, }), it still fails with [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "D:\anaconda3\Lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll", i.e. it cannot load CUDA. onnxruntime-gpu 1.17.1 likewise can't load TensorRT, but it can load CUDA. Strange.

My current setup: RTX 3060 with 12 GB VRAM, onnxruntime-gpu 1.17.1, TensorRT-10.0.1.6, cuDNN v8.9.2.26, Python 3.10.14. xformers environment info:
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
sp24._cslt_sparse_mm@0.0.0: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: True
pytorch.version: 2.3.0+cu121
pytorch.cuda: available
gpu.compute_capability: 8.6
gpu.name: NVIDIA GeForce RTX 3060
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1201
build.hip_version: None
build.python_version: 3.10.11
build.torch_version: 2.3.0+cu121
build.env.TORCH_CUDA_ARCH_LIST: 5.0+PTX 6.0 6.1 7.0 7.5 8.0+PTX 9.0
build.env.PYTORCH_ROCM_ARCH: None
build.env.XFORMERS_BUILD_TYPE: Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: wheel-v0.0.26.post1
build.nvcc_version: 12.1.66
source.privacy: open source

Tested again:

Looking more closely, there was a 10-minute wait during the third progress bar. The first two bars finished within about 1 minute of clicking generate mesh, and the third bar itself also finished quickly, so what was the 10-minute wait? From the end of the third bar until the following output appeared took about 5 minutes:
D:\Anaconda3\envs\unique3d-python310\lib\site-packages\torch\utils\cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST']. warnings.warn( F:\Unique3D.\mesh_reconstruction\remesh.py:354: UserWarning: Using torch.cross without specifying the dim arg is deprecated. Please either pass the dim explicitly or simply use torch.linalg.cross. The default value of dim will change to agree with that of linalg.cross in a future release. (Triggered internally at ..\aten\src\ATen\native\Cross.cpp:66.) n = torch.cross(e1,cl) + torch.cross(cr,e1) #sum of old normal vectors

The fourth bar was quick; the fifth was slower, taking about 3-5 minutes. In total it took 24 minutes.

Was the GPU also not used for mesh generation this time? Last time, onnxruntime.get_device() really did return CPU; this time it returns GPU.

Also, every launch of python app/gradio_local.py --port 7860 takes about 7 minutes to finish loading:
Warning! extra parameter in cli is not verified, may cause erros. Loading pipeline components...: 100%|██████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 5.26it/s] You have disabled the safety checker for <class 'custum_3d_diffusion.custum_pipeline.unifield_pipeline_img2mvimg.StableDiffusionImage2MVCustomPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 . Warning! extra parameter in cli is not verified, may cause erros. D:\Anaconda3\envs\unique3d-python310\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True. warnings.warn( Loading pipeline components...: 100%|█████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 557.04it/s] You have disabled the safety checker for <class 'custum_3d_diffusion.custum_pipeline.unifield_pipeline_img2img.StableDiffusionImageCustomPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. 
For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 . D:\Anaconda3\envs\unique3d-python310\lib\site-packages\torch\utils\cpp_extension.py:1967: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST']. warnings.warn( Loading pipeline components...: 100%|██████████████████████████████████████████████████████████| 6/6 [00:44<00:00, 7.43s/it] Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support forfloat16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference. Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support forfloat16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference. Pipelines loaded with dtype=torch.float16 cannot run with cpu device. It is not recommended to move them to cpu as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support forfloat16 operations on this device in PyTorch. Please, remove the torch_dtype=torch.float16 argument, or use another device for inference. Loading pipeline components...: 100%|████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 6019.09it/s] Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().

Is this loading time normal?
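The startup log includes the warning that `TORCH_CUDA_ARCH_LIST` is not set, meaning a C++/CUDA extension is being compiled for all architectures of the visible cards. If that JIT build accounts for part of the slow start (this thread does not establish that), one option is to pin the list to the card's own compute capability, 8.6 for the RTX 3060 according to the environment dump above, before any extension is built. A sketch, assuming it is placed near the top of the launch script (for example app/gradio_local.py) before the pipelines load; the helper names are hypothetical. Setting the variable in the shell before launching has the same effect.

```python
import os


def arch_list_for(capability):
    """Format a (major, minor) compute capability as a TORCH_CUDA_ARCH_LIST entry, e.g. '8.6'."""
    major, minor = capability
    return f"{major}.{minor}"


def pin_cuda_arch(capability, environ=os.environ):
    """Set TORCH_CUDA_ARCH_LIST unless it is already set; return the value in effect."""
    return environ.setdefault("TORCH_CUDA_ARCH_LIST", arch_list_for(capability))


if __name__ == "__main__":
    try:
        import torch
        if torch.cuda.is_available():
            # torch.cuda.get_device_capability() returns a tuple such as (8, 6) on an RTX 3060.
            print("TORCH_CUDA_ARCH_LIST =", pin_cuda_arch(torch.cuda.get_device_capability()))
    except ImportError:
        print("PyTorch is not installed; nothing to pin")
```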

In another test I watched Windows Task Manager: after the second progress bar, the GPU went to 100% load. During the third bar, the GPU showed 0% load and RAM usage was about 10 GB.

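A browser game I was playing while the bars ran lagged terribly. After the third bar it went into a long wait, and the GPU went back to 100% load.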

During the fifth bar the GPU was at 100%, and even browsing the web got a bit laggy.

Total time: 34 minutes.

tomyu168 commented 2 months ago

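Bro, after switching onnxruntime-gpu to 1.17.1 and TensorRT to TensorRT-8.6.1.6, the loading failure went away and the first and second progress bars flew by. But then the program froze and exited. Is it running out of VRAM? What is the minimum VRAM this program needs? I have 12 GB.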

I tried changing trt_max_workspace_size and gpu_mem_limit to 10, 4, 2 and 0.5; none of them helped.

wukailu commented 2 months ago

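The ONNX part performs x4 super-resolution (from 512 to 2048), which takes a lot of VRAM. In our own tests it needs at least 8 GB of VRAM (6 GB is not enough), and together with everything else the total is around 18 GB. To fit into 12 GB, either run this part entirely on the CPU (about fifteen minutes per run, extremely slow), or drop the x4 super-resolution and replace it with an ordinary resize (in theory the result shouldn't be much worse).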

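At https://github.com/AiuniAI/Unique3D/blob/4263b8c836950babedd9c8b6769aa7c41afa9dbc/scripts/refine_lr_to_sr.py#L53, simply convert img to a PIL Image, call the PIL Image resize function to enlarge it to four times the original resolution (4x in both width and height), and convert the result back to an np.ndarray as the output.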
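A minimal sketch of that replacement, assuming the function receives an HxWxC image with 3 or 4 channels as an np.ndarray, either uint8 or float in [0, 1]. The function name is hypothetical and the exact input format in refine_lr_to_sr.py may differ; bicubic resampling is this sketch's choice, since the reply only says "resize".

```python
import numpy as np
from PIL import Image

# Pillow >= 9.1 exposes Image.Resampling; older versions keep the constants on Image.
_RESAMPLING = getattr(Image, "Resampling", Image)


def upscale_by_resize(img, scale=4):
    """Plain-resize stand-in for the x4 ONNX super-resolution (hypothetical helper).

    Accepts an HxWxC array (C = 3 or 4), uint8 or float in [0, 1]; the output keeps that convention.
    """
    is_float = np.issubdtype(img.dtype, np.floating)
    if is_float:
        arr = (np.clip(img, 0.0, 1.0) * 255.0).round().astype(np.uint8)
    else:
        arr = img.astype(np.uint8)
    pil = Image.fromarray(arr)
    width, height = pil.size
    # Scale width and height by `scale` each, e.g. 512x512 -> 2048x2048 for scale=4.
    out = np.asarray(pil.resize((width * scale, height * scale), resample=_RESAMPLING.BICUBIC))
    return out.astype(np.float32) / 255.0 if is_float else out
```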

As for the lag while it runs, that depends on whether your machine has enough RAM; you need roughly 30 GB (mainly because there are so many models).

tomyu168 commented 2 months ago

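Thanks so much, bro! Today I dug into the code a bit and commented out or modified every x4 super-resolution step. In my tests the results were actually better than with super-resolution, and the speed took off: under 2 minutes!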

tomyu168 commented 2 months ago

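I ran several more tests today. It usually takes around 7 minutes, occasionally 2 minutes, and sometimes 10. Could it be related to RAM and VRAM not being released? I noticed that closing and restarting the terminal makes the same image run somewhat faster, while consecutive runs tend to get slower and slower. The time goes into refine_rgbs and normal prediction; normal prediction in particular takes 1 minute when it's fast and possibly 8-10 minutes when it's slow. I also noticed that Task Manager shows very high disk usage while Unique3D is running. Does that mean disk read speed matters? Would an SSD be faster?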
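To pin down whether refine_rgbs or normal prediction is the stage that slows down over consecutive runs, per-stage timing is more reliable than watching progress bars. A standard-library sketch; `StageTimer` and the stage labels are hypothetical, and the commented `gc.collect()` / `torch.cuda.empty_cache()` lines are an experiment for the memory-release hypothesis, not a confirmed fix.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock durations per named stage across runs."""

    def __init__(self):
        self.totals = defaultdict(list)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the stage raises, so slow failures still show up.
            self.totals[name].append(time.perf_counter() - start)

    def report(self):
        """One line per stage: number of runs, last and mean duration in seconds."""
        lines = []
        for name, runs in self.totals.items():
            mean = sum(runs) / len(runs)
            lines.append(f"{name}: n={len(runs)} last={runs[-1]:.1f}s mean={mean:.1f}s")
        return lines


# Hypothetical usage inside the generation code:
# timer = StageTimer()
# with timer.stage("refine_rgbs"):
#     ...
# with timer.stage("normal_prediction"):
#     ...
# print("\n".join(timer.report()))
# Optional experiment between runs (requires PyTorch):
# import gc, torch; gc.collect(); torch.cuda.empty_cache()
```

If the mean for one stage keeps growing while the others stay flat, that stage is the one to look at first.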

hpx502766238 commented 1 month ago

Bro, are you still using TensorRT, or the CUDA execution provider? I also have a 3060 12 GB, and under TensorRT 10.0 I could never get the TensorRT EP to load; all kinds of errors.

tomyu168 commented 1 month ago

I switched to TensorRT-8.6.1.6 with onnxruntime-gpu 1.17.1. With TensorRT 10 and onnxruntime-gpu 1.18, it freezes and crashes as soon as I run it; I don't know why.
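Because the fix in this thread came down to specific versions, it can help to print what is actually installed and flag differences from the combination reported to work here on an RTX 3060 12 GB. A sketch; the pin below is only what worked in this thread, not an official compatibility matrix, and the helper names are hypothetical. TensorRT is left out of the comparison because TensorRT-8.6.1.6 as named here appears to be the zip distribution, which pip metadata does not see.

```python
from importlib import metadata

# Version reported working in this thread; not an official compatibility matrix.
REPORTED_WORKING = {"onnxruntime-gpu": "1.17.1"}


def installed(dist):
    """Installed version of a pip distribution, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(expected, found):
    """Return {dist: (found_version, expected_version)} for every dist that differs."""
    return {d: (found.get(d), v) for d, v in expected.items() if found.get(d) != v}


if __name__ == "__main__":
    found = {d: installed(d) for d in ["onnxruntime-gpu", "onnxruntime", "torch"]}
    print(found)
    for dist, (have, want) in compare_versions(REPORTED_WORKING, found).items():
        print(f"{dist}: installed {have}, reported working {want}")
```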