cvlab-epfl / gecco

Code release for GECCO: Geometrically-Conditioned Point Diffusion Models
Apache License 2.0
11 stars, 1 fork

Gecco-Torch fails on "Sanity Checking Dataloader" #1

Open grgkopanas opened 1 year ago

grgkopanas commented 1 year ago

Hi,

When trying to run the torch version, I get the following error. Have you seen it before? It doesn't seem to come from the codebase at all, but rather from some PyTorch Lightning internals.

I installed it by creating a conda environment with Python >=3.10 and then running pip install -e ./

I have changed nothing in the code.

$ python shapenet_airplane_unconditional.py

Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name        | Type            | Params
------------------------------------------------
0 | backbone    | EDMPrecond      | 13.5 M
1 | conditioner | IdleConditioner | 0
2 | loss        | EDMLoss         | 0
3 | reparam     | GaussianReparam | 0
------------------------------------------------
13.5 M    Trainable params
0         Non-trainable params
13.5 M    Total params
53.924    Total estimated model params size (MB)
Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data/graphdeco/user/gkopanas/point_diffusion/gecco/gecco-torch/example_configs/shapenet_airplane_unconditional.py", line 82, in <module>
    trainer().fit(model, data)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 545, in fit
    call._call_and_handle_interrupt(
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 581, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 990, in _run
    results = self._run_stage()
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1034, in _run_stage
    self._run_sanity_check()
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py", line 1063, in _run_sanity_check
    val_loop.run()
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/loops/utilities.py", line 181, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 134, in run
    self._evaluation_step(batch, batch_idx, dataloader_idx, dataloader_iter)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/loops/evaluation_loop.py", line 391, in _evaluation_step
    output = call._call_strategy_hook(trainer, hook_name, *step_args)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 403, in validation_step
    return self.lightning_module.validation_step(*args, **kwargs)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 490, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 641, in _convert_frame
    result = inner_convert(frame, cache_size, hooks, frame_state)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 133, in _fn
    return fn(*args, **kwargs)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 389, in _convert_frame_assert
    return _compile(
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 586, in _compile
    raise InternalTorchDynamoError(str(e)).with_traceback(
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 569, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 189, in time_wrapper
    r = func(*args, **kwargs)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 549, in compile_inner
    check_fn = CheckFunctionManager(
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/guards.py", line 929, in __init__
    guard.create(local_builder, global_builder)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_guards.py", line 243, in create
    return self.create_fn(self.source.select(local_builder, global_builder), self)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/guards.py", line 404, in CONSTANT_MATCH
    val = self.get(guard.name)
  File "/home/gkopanas/.conda/envs/pvd_gecco/lib/python3.10/site-packages/torch/_dynamo/guards.py", line 234, in get
    return eval(name, self.scope, CLOSURE_VARS)
  File "<string>", line 1, in <module>
torch._dynamo.exc.InternalTorchDynamoError: 'NoneType' object is not subscriptable

You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True
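
The last frames show the crash happening while TorchDynamo is compiling validation_step, and the message offers a fallback to eager mode. That flag hides the compilation error rather than fixing it, but it can help confirm that compilation is the culprit. A minimal sketch of setting it defensively, assuming it is called before trainer().fit(model, data); the helper name enable_eager_fallback is hypothetical and not part of gecco-torch:

```python
import importlib.util


def enable_eager_fallback() -> bool:
    """Make TorchDynamo fall back to eager execution instead of raising.

    Returns True if the flag was set, False if torch (or torch._dynamo,
    which only exists in PyTorch 2.0 and later) is not available.
    """
    if importlib.util.find_spec("torch") is None:
        return False  # torch is not installed in this environment
    try:
        import torch._dynamo
    except ImportError:
        return False  # older torch without TorchDynamo
    # Same setting the error message suggests: on a compilation error,
    # run the function eagerly instead of propagating the exception.
    torch._dynamo.config.suppress_errors = True
    return True


if __name__ == "__main__":
    print("eager fallback enabled:", enable_eager_fallback())
```

Because the error is only suppressed, training then runs uncompiled; the thread below ends up fixing the root cause by changing the torch version instead.
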
grgkopanas commented 1 year ago

I tried it in a completely different setup, different cluster/hardware and got the same error.

jatentaki commented 1 year ago

I see. Can you provide the output of pip freeze (with this env activated)? It could be that you're on a different version of PyTorch and it doesn't work for some reason. I did my development on 2.0.1. I'll dump my pip freeze on Monday so we can look for differences.
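Looking for differences between two environments boils down to diffing their pip freeze outputs. A minimal sketch, assuming the usual name==version lines; editable installs (-e ...), name @ url lines and comments are simply skipped, and the function names are hypothetical:

```python
def parse_freeze(text: str) -> dict[str, str]:
    """Map normalized package names to versions from `pip freeze` output."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, editable installs and URL requirements.
        if not line or line.startswith(("#", "-e")) or "==" not in line:
            continue
        name, ver = line.split("==", 1)
        # pip treats "Foo_Bar" and "foo-bar" as the same project.
        packages[name.strip().lower().replace("_", "-")] = ver.strip()
    return packages


def diff_freeze(mine: str, theirs: str) -> dict[str, tuple]:
    """Return {package: (my_version, their_version)} for every mismatch.

    A version of None means the package is missing on that side.
    """
    a, b = parse_freeze(mine), parse_freeze(theirs)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__":
    # Made-up freeze excerpts for illustration, not taken from this thread.
    broken = "lightning==2.1.0\ntorch==2.1.0\n"
    working = "lightning==2.1.0\ntorch==2.0.1\ntensorboard==2.14.0\n"
    print(diff_freeze(broken, working))
    # {'tensorboard': (None, '2.14.0'), 'torch': ('2.1.0', '2.0.1')}
```

In practice each side would be saved with pip freeze > file.txt in the respective environment and the file contents passed in.
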

grgkopanas commented 1 year ago

Thanks for the suggestion: torch 2.0.1 helped, but I also had to pip install tensorboard, which should be added to the environment setup.
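The combination reported to work here (torch 2.0.1 plus an installed tensorboard) can be checked before launching a run. A minimal sketch using only the standard library; installed_versions and check_env are hypothetical helpers, not part of gecco-torch, and 2.0.1 is only the version reported to work in this thread:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names: list[str]) -> dict[str, str]:
    """Look up installed distribution versions; missing ones are left out."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass  # not installed in this environment
    return found


def check_env(installed: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the env matches."""
    problems = []
    torch_version = installed.get("torch")
    if torch_version is None:
        problems.append("torch is not installed")
    elif torch_version.split("+")[0] != "2.0.1":
        # Wheels may carry a local suffix such as "2.0.1+cu118"; ignore it.
        problems.append(
            f"torch {torch_version} found; 2.0.1 is the version reported to work"
        )
    if "tensorboard" not in installed:
        problems.append("tensorboard is not installed (pip install tensorboard)")
    return problems


if __name__ == "__main__":
    for problem in check_env(installed_versions(["torch", "tensorboard"])):
        print("warning:", problem)
```

Keeping the check separate from the lookup makes it easy to run against a dictionary built from someone else's pip freeze as well.
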

jatentaki commented 1 year ago

Does this make the code work overall? I'll try to fix this in an update rather than pinning old dependencies.

grgkopanas commented 1 year ago

As far as I can tell, the code works fine with 2.0.1.