georghess / neurad-studio

[CVPR2024] NeuRAD: Neural Rendering for Autonomous Driving
https://research.zenseact.com/publications/neurad/
Apache License 2.0

TypeError: eager() got an unexpected keyword argument 'mode' #18

Closed blackmrb closed 3 months ago

blackmrb commented 4 months ago

Thanks for the great work!

My CUDA version is 11.4, and my torch-related package versions are listed below.

pytorch-msssim            1.0.0
torch                     2.1.2+cu118
torch-fidelity            0.3.0
torchmetrics              1.3.2
torchvision               0.16.2+cu118

I first ran the stock nerfstudio package successfully in this environment. I then uninstalled it and pip-installed neurad-studio, after which I hit the error below. Could you please give me some advice?

PYDEV DEBUGGER WARNING:
sys.settrace() should not be used when the debugger is being used.
This may cause the debugger to stop working correctly.
If this is needed, please check: 
http://pydev.blogspot.com/2007/06/why-cant-pydev-debugger-work-with.html
to see how to restore the debug tracing back correctly.
Call Location:
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/viewer_legacy/server/viewer_utils.py", line 63, in __enter__
    sys.settrace(self.func)

Traceback (most recent call last):
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/rongbo.ma/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/rongbo.ma/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/rongbo.ma/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/rongbo.ma/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/rongbo.ma/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/rongbo.ma/.vscode-server/extensions/ms-python.debugpy-2024.6.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "nerfstudio/scripts/train.py", line 278, in <module>
    entrypoint()
  File "nerfstudio/scripts/train.py", line 269, in entrypoint
    main(
  File "nerfstudio/scripts/train.py", line 254, in main
    launch(
  File "nerfstudio/scripts/train.py", line 196, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "nerfstudio/scripts/train.py", line 107, in train_loop
    trainer.train()
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/engine/trainer.py", line 315, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/utils/profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/engine/trainer.py", line 554, in train_iteration
    _, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/utils/profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/pipelines/ad_pipeline.py", line 85, in get_train_loss_dict
    ray_bundle, batch = self.datamanager.next_train(step)
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/data/datamanagers/image_lidar_datamanager.py", line 294, in next_train
    ray_bundle, batch = self.next_batch if self.next_batch is not None else self._get_from_queue()
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/data/datamanagers/image_lidar_datamanager.py", line 304, in _get_from_queue
    ray_bundle, batch = self.data_procs[0].get_batch_and_ray_bundle()
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/data/datamanagers/image_lidar_datamanager.py", line 152, in get_batch_and_ray_bundle
    lidar_batch, lidar_ray_bundle = self.get_lidar_batch_and_ray_bundle()
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/data/datamanagers/image_lidar_datamanager.py", line 168, in get_lidar_batch_and_ray_bundle
    ray_bundle: RayBundle = self.lidar_ray_generator(ray_indices, points=batch["lidar"])
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/model_components/ray_generators.py", line 83, in forward
    ray_bundle = self.lidars.generate_rays(
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/cameras/lidars.py", line 315, in generate_rays
    raybundle = lidars._generate_rays_from_points(lidar_indices, points, lidar_opt_to_lidar)
  File "/home/rongbo.ma/code/neurad-studio/nerfstudio/cameras/lidars.py", line 372, in _generate_rays_from_points
    points_world = transform_points_pairwise(points[..., :3], l2w)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 490, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 641, in _convert_frame
    result = inner_convert(frame, cache_size, hooks, frame_state)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 133, in _fn
    return fn(*args, **kwargs)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 389, in _convert_frame_assert
    return _compile(
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 569, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 491, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/bytecode_transformation.py", line 1028, in transform_code_object
    transformations(instructions, code_options)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 458, in transform
    tracer.run()
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 2069, in run
    super().run()
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 719, in run
    and self.step()
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 683, in step
    getattr(self, inst.opname)(inst)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 2157, in RETURN_VALUE
    self.output.compile_subgraph(
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 833, in compile_subgraph
    self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 957, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 1024, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 1009, in call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
    compiled_gm = compiler_fn(gm, example_inputs)
  File "/home/rongbo.ma/anaconda3/envs/nerfstudio/lib/python3.8/site-packages/torch/__init__.py", line 1607, in __call__
    return self.compiler_fn(model_, inputs_, **self.kwargs)
torch._dynamo.exc.BackendCompilerFailed: backend='eager' raised:
TypeError: eager() got an unexpected keyword argument 'mode'

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

atonderski commented 4 months ago

Hi! Unfortunately I have not seen this error. Did you try the suggested fix at the bottom of the error message? Also, is it perhaps related to https://github.com/nerfstudio-project/nerfstudio/issues/2615 ?
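
For readers following along: the last frame of the traceback (`return self.compiler_fn(model_, inputs_, **self.kwargs)`) suggests that extra keyword arguments are forwarded to the backend function. Below is a minimal pure-Python sketch of that failure pattern. It is not PyTorch's code; `eager_backend`, `CompilerWrapper`, and `try_compile` are hypothetical stand-ins used only to illustrate how a forwarded `mode` keyword can produce this kind of `TypeError`.

```python
# Hypothetical stand-ins: this mimics the call pattern seen in the traceback.
# It is NOT PyTorch's implementation.


def eager_backend(graph_module, example_inputs):
    """A backend that accepts no extra options, like 'eager' in the error."""
    return graph_module


class CompilerWrapper:
    """Stores a backend plus extra kwargs and forwards them on every call."""

    def __init__(self, compiler_fn, **kwargs):
        self.compiler_fn = compiler_fn
        self.kwargs = kwargs

    def __call__(self, graph_module, example_inputs):
        # Same shape as the last traceback frame: kwargs are passed through.
        return self.compiler_fn(graph_module, example_inputs, **self.kwargs)


def try_compile(**kwargs):
    """Return None on success, or the TypeError message on failure."""
    wrapper = CompilerWrapper(eager_backend, **kwargs)
    try:
        wrapper("graph", [])
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_compile())                        # no extra kwargs: succeeds
    print(try_compile(mode="reduce-overhead"))  # forwarded 'mode' is rejected
```

If this reading is right, the error comes from combining `mode=...` with a backend that does not take it, rather than from the CUDA setup.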

blackmrb commented 4 months ago

Thanks for the reply! I followed https://github.com/nerfstudio-project/nerfstudio/issues/2615 and training now runs successfully, but the default command python nerfstudio/scripts/train.py neurad pandaset-data is very slow. I think this may have something to do with disabling PyTorch's acceleration.

Now I am trying the Dockerfile.

amoghskanda commented 4 months ago

Add these two lines to the train.py file, after all the imports:

    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

blackmrb commented 4 months ago

It worked, but training is very slow; please see my previous comment. Do you have the same problem? @amoghskanda

amoghskanda commented 4 months ago

I have an RTX 3090, and neurad takes about 1.5 hours to train. I also used neuradest, which takes ~13 hours.

georghess commented 4 months ago

We basically copied the installation instructions from the latest nerfstudio version, but I see now that the torch versions differ between the Dockerfile (2.0.1) and the README instructions for a conda env (2.1.2). I think torch dropped some flags for the compile command between 2.0 and 2.1. Changing @torch_compile(dynamic=True, mode="reduce-overhead", backend="eager") to @torch_compile(mode="reduce-overhead") should work, as should switching to torch 2.0.1 (see the Dockerfile for specifics). I'll test which option gives the best performance on my machine and update the code accordingly.
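
As a rough illustration of the decorator change suggested above, here is a small sketch of a helper that builds the keyword arguments for a compile call and drops `mode` whenever a backend other than the default is requested. The helper name `compile_kwargs` is hypothetical, and the rule it encodes (that `mode` is only meaningful for the default "inductor" backend) is an assumption based on the traceback in this thread, not something confirmed by the maintainers.

```python
def compile_kwargs(backend="inductor", mode=None, dynamic=None):
    """Build keyword arguments for a torch.compile-style call.

    Assumption: `mode` is only passed along for the default "inductor"
    backend; for any other backend (e.g. "eager") it is dropped so that
    it cannot be forwarded to a backend that does not accept it.
    """
    kwargs = {"backend": backend}
    if dynamic is not None:
        kwargs["dynamic"] = dynamic
    if mode is not None and backend == "inductor":
        kwargs["mode"] = mode
    return kwargs


if __name__ == "__main__":
    # The combination from the failing decorator: `mode` is filtered out.
    print(compile_kwargs(backend="eager", mode="reduce-overhead", dynamic=True))
    # The suggested replacement: default backend, `mode` is kept.
    print(compile_kwargs(mode="reduce-overhead"))
    # With PyTorch installed, the result could be used like this (not run here):
    #   import torch
    #   fast_fn = torch.compile(fn, **compile_kwargs(mode="reduce-overhead"))
```

Keeping the filtering in one place would let the same code run on both torch versions mentioned in this thread without editing each decorated function, though whether that is preferable to pinning torch 2.0.1 is a judgment call.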

blackmrb commented 4 months ago

I found the reason for the very slow training. Unless multithreading is disabled, GPU utilization stays at 0 and CPU usage sits at 1000% on an A100 with CUDA 11.4, which makes training extremely slow. After adding pipeline.datamanager.num_processes=0 to the training command, GPU utilization is no longer 0 and training is faster, but it is still slower than normal (training on a 3090 with the correct environment configuration takes an hour and a half, while training on the A100 takes 3 hours).

I also tried changing @torch_compile(dynamic=True, mode="reduce-overhead", backend="eager") to @torch_compile(mode="reduce-overhead"). It still needs pipeline.datamanager.num_processes=0; otherwise training is extremely slow.

FYI @georghess
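A GPU sitting at 0% while the CPU is saturated often suggests that each step is waiting on data rather than on compute. One way to check this is to time the two phases separately. The sketch below is a generic illustration: profile_steps, fetch_batch, and train_step are hypothetical names, not functions from neurad-studio, and the dummy callables only stand in for a real data loader and training step.

```python
import time
from typing import Callable, Dict


def profile_steps(
    fetch_batch: Callable[[], object],
    train_step: Callable[[object], None],
    num_steps: int = 20,
) -> Dict[str, float]:
    """Accumulate wall-clock time spent fetching data vs. running the step."""
    data_s = 0.0
    compute_s = 0.0
    for _ in range(num_steps):
        t0 = time.perf_counter()
        batch = fetch_batch()   # data-loading phase
        t1 = time.perf_counter()
        train_step(batch)       # compute phase
        t2 = time.perf_counter()
        data_s += t1 - t0
        compute_s += t2 - t1
    total = data_s + compute_s
    fraction = data_s / total if total > 0 else 0.0
    return {"data_s": data_s, "compute_s": compute_s, "data_fraction": fraction}


# Dummy stand-ins: a slow loader and a fast step.
stats = profile_steps(lambda: time.sleep(0.01), lambda batch: None, num_steps=5)
print(f"share of time spent waiting on data: {stats['data_fraction']:.0%}")
```

When timing real GPU work, the step would need to synchronize the device (for example with torch.cuda.synchronize()) before the clock is read, since CUDA kernels launch asynchronously. Comparing the data fraction with and without pipeline.datamanager.num_processes=0 would show whether the data path is the bottleneck.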

georghess commented 3 months ago

Sorry for the slow reply.

The torch_compile issue should have been solved by 92ee3bb6ebbac1728eb5975566163ac30b0d6af0.

As for the slow training speed on A100s, we haven't been able to reproduce it. We are looking into the multiprocessing. For clarity, I'll close this issue for now and ask you to open a separate issue for anything not related to the torch_compile error.