Closed blackmrb closed 5 months ago
Hi! Unfortunately I have not seen this error. Did you try the suggested fix at the bottom of the error message? Also, is it perhaps related to https://github.com/nerfstudio-project/nerfstudio/issues/2615 ?
Thanks for the reply!
I followed https://github.com/nerfstudio-project/nerfstudio/issues/2615 and it now runs successfully, but the default command python nerfstudio/scripts/train.py neurad pandaset-data
is very slow. I think this may have something to do with disabling PyTorch's acceleration.
Now I am trying the Dockerfile.
Thanks for the great work!
My CUDA version is 11.4, my torch version is below.
pytorch-msssim 1.0.0
torch 2.1.2+cu118
torch-fidelity 0.3.0
torchmetrics 1.3.2
torchvision 0.16.2+cu118
I first ran the stock nerfstudio package successfully in this environment. I then uninstalled it and pip-installed neurad-studio, after which I hit the error below. Could you please give some advice?

PYDEV DEBUGGER WARNING:
sys.settrace() should not be used when the debugger is being used.
This may cause the debugger to stop working correctly.
If this is needed, please check:
http://pydev.blogspot.com/2007/06/why-cant-pydev-debugger-work-with.html
to see how to restore the debug tracing back correctly.
Call Location:
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/viewer_legacy/server/viewer_utils.py", line 63, in __enter__
    sys.settrace(self.func)

Traceback (most recent call last):
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/rongbo.ma/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/rongbo.ma/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/rongbo.ma/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/rongbo.ma/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/rongbo.ma/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/rongbo.ma/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "nerfstudio/scripts/train.py", line 278, in <module>
    entrypoint()
  File "nerfstudio/scripts/train.py", line 269, in entrypoint
    main(
  File "nerfstudio/scripts/train.py", line 254, in main
    launch(
  File "nerfstudio/scripts/train.py", line 196, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "nerfstudio/scripts/train.py", line 107, in train_loop
    trainer.train()
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/engine/trainer.py", line 315, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/utils/profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/engine/trainer.py", line 554, in train_iteration
    _, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/utils/profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/pipelines/ad_pipeline.py", line 85, in get_train_loss_dict
    ray_bundle, batch = self.datamanager.next_train(step)
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/data/datamanagers/image_lidar_datamanager.py", line 294, in next_train
    ray_bundle, batch = self.next_batch if self.next_batch is not None else self._get_from_queue()
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/data/datamanagers/image_lidar_datamanager.py", line 304, in _get_from_queue
    ray_bundle, batch = self.data_procs[0].get_batch_and_ray_bundle()
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/data/datamanagers/image_lidar_datamanager.py", line 152, in get_batch_and_ray_bundle
    lidar_batch, lidar_ray_bundle = self.get_lidar_batch_and_ray_bundle()
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/data/datamanagers/image_lidar_datamanager.py", line 168, in get_lidar_batch_and_ray_bundle
    ray_bundle: RayBundle = self.lidar_ray_generator(ray_indices, points=batch["lidar"])
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/model_components/ray_generators.py", line 83, in forward
    ray_bundle = self.lidars.generate_rays(
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/cameras/lidars.py", line 315, in generate_rays
    raybundle = lidars._generate_rays_from_points(lidar_indices, points, lidar_opt_to_lidar)
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/cameras/lidars.py", line 372, in _generate_rays_from_points
    points_world = transform_points_pairwise(points[..., :3], l2w)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 490, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 641, in _convert_frame
    result = inner_convert(frame, cache_size, hooks, frame_state)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 133, in _fn
    return fn(*args, **kwargs)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 389, in _convert_frame_assert
    return _compile(
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 569, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 491, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/bytecode_transformation.py", line 1028, in transform_code_object
    transformations(instructions, code_options)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 458, in transform
    tracer.run()
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 2069, in run
    super().run()
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 719, in run
    and self.step()
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 683, in step
    getattr(self, inst.opname)(inst)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 2157, in RETURN_VALUE
    self.output.compile_subgraph(
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 833, in compile_subgraph
    self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 957, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 1024, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 1009, in call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
    compiled_gm = compiler_fn(gm, example_inputs)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/__init__.py", line 1607, in __call__
    return self.compiler_fn(model_, inputs_, **self.kwargs)
torch._dynamo.exc.BackendCompilerFailed: backend='eager' raised:
TypeError: eager() got an unexpected keyword argument 'mode'

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
Append these 2 lines to the train.py file, after all the imports:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
It worked, but training is very slow; please see my previous comment. Do you see the same problem, @amoghskanda?
I have an RTX 3090, and neurad takes about 1.5 hours to train. I also used neuradest, which takes ~13 hours.
We basically copied the installation instructions from the latest nerfstudio version, but I now see that the torch versions differ between the Dockerfile (2.0.1) and the README instructions for a conda env (2.1.2). I think torch dropped some flags for the compile command between 2.0 and 2.1. Changing
@torch_compile(dynamic=True, mode="reduce-overhead", backend="eager")
to
@torch_compile(mode="reduce-overhead")
should work, as should switching to torch 2.0.1 (see the Dockerfile for specifics). I'll check which option gives the best performance on my machine and update the code accordingly.
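For anyone who wants to keep a single decorator call that works across backends, here is a minimal sketch of the idea. It assumes, as the TypeError in the traceback suggests, that the mode keyword is only understood by the default "inductor" backend and gets forwarded verbatim to other backends such as "eager". The helper name compile_kwargs is hypothetical and is not part of neurad-studio or PyTorch.

```python
from typing import Any, Dict, Optional


def compile_kwargs(
    backend: str = "inductor",
    mode: Optional[str] = None,
    dynamic: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for torch.compile.

    Hypothetical helper: it drops `mode` whenever the backend is not
    "inductor", on the assumption that other backends reject it.
    """
    kwargs: Dict[str, Any] = {"backend": backend}
    if dynamic is not None:
        kwargs["dynamic"] = dynamic
    # Only forward `mode` to the backend assumed to understand it.
    if mode is not None and backend == "inductor":
        kwargs["mode"] = mode
    return kwargs


# Intended use (requires torch, so it is left as a comment here):
#   import torch
#   fn = torch.compile(fn, **compile_kwargs("eager", "reduce-overhead", True))
```

With backend="eager" the resulting dictionary contains no mode entry, so the failing keyword never reaches the eager backend. Whether this or the plain @torch_compile(mode="reduce-overhead") variant trains faster is something to measure rather than assume.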
I found the reason for the very slow training.
Without disabling multithreading, GPU utilization is always 0 and CPU usage is at 1000% on an A100 with CUDA 11.4, which makes training extremely slow.
After adding pipeline.datamanager.num_processes=0 to the training command, GPU utilization is no longer 0 and training is faster, but it is still slower than normal: training on a correctly configured 3090 takes one and a half hours, while training on the A100 takes 3 hours.
I also tried changing @torch_compile(dynamic=True, mode="reduce-overhead", backend="eager") to @torch_compile(mode="reduce-overhead"); it still needs pipeline.datamanager.num_processes=0, otherwise training is extremely slow.
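To check for the symptom described above (GPU utilization stuck at 0 while the CPU is saturated), a small sketch like the following can be run alongside training. It assumes nvidia-smi is on the PATH and reports plain integer percentages; the function names are hypothetical.

```python
import subprocess
from typing import List


def parse_gpu_utilization(csv_text: str) -> List[int]:
    """Parse one integer utilization percentage per non-empty line."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_gpu_utilization() -> List[int]:
    """Ask nvidia-smi for the current utilization of every visible GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=utilization.gpu",
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_gpu_utilization(result.stdout)


# Intended use on a machine with an NVIDIA driver installed:
#   print(query_gpu_utilization())
```

Sampling this a few times with and without the num_processes=0 setting gives a quick way to compare the two configurations before committing to a full training run.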
FYI @georghess
Sorry for the slow reply.
The torch_compile issue should have been solved by 92ee3bb6ebbac1728eb5975566163ac30b0d6af0.
As for the slow training speed on A100s, we haven't been able to reproduce it. We are looking into the multi-process data loading. For clarity, though, I'll close this issue for now and ask you to open a separate issue for anything not related to the torch_compile error.