QData / spacetimeformer

Multivariate Time Series Forecasting with efficient Transformers. Code for the paper "Long-Range Transformers for Dynamic Spatiotemporal Forecasting."
https://arxiv.org/abs/2109.12218
MIT License

ValueError: SyncBatchNorm layers only work with GPU modules #90

Open · LundinMachine opened this issue 6 months ago

LundinMachine commented 6 months ago

It looks like the GPU in Colab is not being engaged. I tried the A100, V100, T4 GPU, and TPU hardware settings in Colab.

Command:
python train.py spacetimeformer mnist --embed_method spatio-temporal --local_self_attn full --local_cross_attn full --global_self_attn full --global_cross_attn full --run_name mnist_spatiotemporal --context_points 10
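A quick way to confirm whether the runtime actually exposes a CUDA GPU to PyTorch, before launching train.py, is a check along these lines (a minimal sketch run in a separate cell; nothing here is specific to spacetimeformer):

```python
# Minimal sanity check: does PyTorch in this Colab runtime see a CUDA GPU?
# If this prints False, PyTorch Lightning will fall back to CPU no matter
# which training flags are passed.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device count:   ", torch.cuda.device_count())
    print("Device name:    ", torch.cuda.get_device_name(0))
```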

Error trace:

2023-12-30 20:47:30.093968: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-30 20:47:30.094027: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-30 20:47:30.095405: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-30 20:47:31.265649: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Using default wandb log dir path of ./data/STF_LOG_DIR. This can be adjusted with the environment variable STF_LOG_DIR
Forecaster
  L2: 1e-06
  Linear Window: 0
  Linear Shared Weights: False
  RevIN: False
  Decomposition: False
GlobalSelfAttn: AttentionLayer(
  (inner_attention): FullAttention(
    (dropout): Dropout(p=0.0, inplace=False)
  )
  (query_projection): Linear(in_features=200, out_features=800, bias=True)
  (key_projection): Linear(in_features=200, out_features=800, bias=True)
  (value_projection): Linear(in_features=200, out_features=800, bias=True)
  (out_projection): Linear(in_features=800, out_features=200, bias=True)
  (dropout_qkv): Dropout(p=0.0, inplace=False)
)
GlobalCrossAttn: AttentionLayer(
  (inner_attention): FullAttention(
    (dropout): Dropout(p=0.0, inplace=False)
  )
  (query_projection): Linear(in_features=200, out_features=800, bias=True)
  (key_projection): Linear(in_features=200, out_features=800, bias=True)
  (value_projection): Linear(in_features=200, out_features=800, bias=True)
  (out_projection): Linear(in_features=800, out_features=200, bias=True)
  (dropout_qkv): Dropout(p=0.0, inplace=False)
)
LocalSelfAttn: AttentionLayer(
  (inner_attention): FullAttention(
    (dropout): Dropout(p=0.0, inplace=False)
  )
  (query_projection): Linear(in_features=200, out_features=800, bias=True)
  (key_projection): Linear(in_features=200, out_features=800, bias=True)
  (value_projection): Linear(in_features=200, out_features=800, bias=True)
  (out_projection): Linear(in_features=800, out_features=200, bias=True)
  (dropout_qkv): Dropout(p=0.0, inplace=False)
)
LocalCrossAttn: AttentionLayer(
  (inner_attention): FullAttention(
    (dropout): Dropout(p=0.0, inplace=False)
  )
  (query_projection): Linear(in_features=200, out_features=800, bias=True)
  (key_projection): Linear(in_features=200, out_features=800, bias=True)
  (value_projection): Linear(in_features=200, out_features=800, bias=True)
  (out_projection): Linear(in_features=800, out_features=200, bias=True)
  (dropout_qkv): Dropout(p=0.0, inplace=False)
)
Using Embedding: spatio-temporal
  Time Emb Dim: 6
  Space Embedding: True
  Time Embedding: True
  Val Embedding: True
  Given Embedding: True
  Null Value: None
  Pad Value: None
  Reconstruction Dropout: Timesteps 0.05, Standard 0.1, Seq (max len = 5) 0.2, Skip All Drop 1.0
Spacetimeformer (v1.5) Summary:
  Model Dim: 200
  FF Dim: 800
  Enc Layers: 3
  Dec Layers: 3
  Embed Dropout: 0.2
  FF Dropout: 0.3
  Attn Out Dropout: 0.0
  Attn Matrix Dropout: 0.0
  QKV Dropout: 0.0
  L2 Coeff: 1e-06
  Warmup Steps: 0
  Normalization Scheme: batch
  Attention Time Windows: 1
  Shifted Time Windows: False
  Position Emb Type: abs
  Recon Loss Imp: 0.0


Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./images/MNIST/raw/train-images-idx3-ubyte.gz
100% 9912422/9912422 [00:00<00:00, 199942825.48it/s]
Extracting ./images/MNIST/raw/train-images-idx3-ubyte.gz to ./images/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./images/MNIST/raw/train-labels-idx1-ubyte.gz
100% 28881/28881 [00:00<00:00, 149735097.43it/s]
Extracting ./images/MNIST/raw/train-labels-idx1-ubyte.gz to ./images/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./images/MNIST/raw/t10k-images-idx3-ubyte.gz
100% 1648877/1648877 [00:00<00:00, 43603948.10it/s]
Extracting ./images/MNIST/raw/t10k-images-idx3-ubyte.gz to ./images/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./images/MNIST/raw/t10k-labels-idx1-ubyte.gz
100% 4542/4542 [00:00<00:00, 32234397.24it/s]
Extracting ./images/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./images/MNIST/raw

/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:287: LightningDeprecationWarning: Passing Trainer(accelerator='dp') has been deprecated in v1.5 and will be removed in v1.7. Use Trainer(strategy='dp') instead.
  rank_zero_deprecation(
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/accelerator_connector.py:597: UserWarning: 'dp' is not supported on CPUs, hence setting strategy='ddp'.
  rank_zero_warn(f"{strategy_flag!r} is not supported on CPUs, hence setting strategy='ddp'.")
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/utilities.py:91: PossibleUserWarning: max_epochs was not set. Setting it to 1000 epochs. To train without an epoch limit, set max_epochs=-1.
  rank_zero_warn(
GPU available: True, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py:1823: PossibleUserWarning: GPU available but not used. Set accelerator and devices using Trainer(accelerator='gpu', devices=1).
  rank_zero_warn(
Trainer(limit_val_batches=1.0) was configured so 100% of the batches will be used.
Trainer(val_check_interval=1.0) was configured so validation will run at the end of the training epoch.
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1

distributed_backend=gloo
All distributed processes registered. Starting with 1 processes

Traceback (most recent call last):
  File "/content/spacetimeformer/spacetimeformer/train.py", line 869, in <module>
    main(args)
  File "/content/spacetimeformer/spacetimeformer/train.py", line 849, in main
    trainer.fit(forecaster, datamodule=data_module)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 771, in fit
    self._call_and_handle_interrupt(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 722, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 812, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1218, in _run
    self.strategy.setup(self)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/ddp.py", line 172, in setup
    self.configure_ddp()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/ddp.py", line 294, in configure_ddp
    self.model = self._setup_model(LightningDistributedModule(self.model))
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/ddp.py", line 178, in _setup_model
    return DistributedDataParallel(module=model, device_ids=device_ids, **self._ddp_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 809, in __init__
    self._ddp_init_helper(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1140, in _ddp_init_helper
    self._passing_sync_batchnorm_handle(self.module)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 2072, in _passing_sync_batchnorm_handle
    self._log_and_throw(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/parallel/distributed.py", line 1037, in _log_and_throw
    raise err_type(err_msg)
ValueError: SyncBatchNorm layers only work with GPU modules
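The trace points at the root cause rather than a missing GPU: Lightning warns that the GPU is available but unused, falls back to a CPU 'ddp' strategy on the gloo backend, and DistributedDataParallel then refuses to wrap a module containing SyncBatchNorm layers on CPU. Below is a minimal sketch of the Trainer configuration the warning itself recommends; the exact arguments are illustrative assumptions, not the repository's actual train.py settings.

```python
# Hypothetical sketch (not the repo's train.py): a PyTorch Lightning 1.x
# Trainer that actually uses the single Colab GPU, so the CPU-DDP +
# SyncBatchNorm combination from the traceback never arises.
import pytorch_lightning as pl

trainer = pl.Trainer(
    accelerator="gpu",   # what the "GPU available but not used" warning suggests
    devices=1,           # one Colab GPU
    max_epochs=100,      # placeholder limit; pick whatever the experiment needs
)

# trainer.fit(forecaster, datamodule=data_module) would then train on the GPU;
# `forecaster` and `data_module` stand in for the objects train.py constructs.
# With a single device, no cross-process SyncBatchNorm conversion is needed.
```

If the installed version of train.py exposes the `--gpus` flag used in the repository's README example commands, passing something like `--gpus 0` on the command line should achieve the same result without editing the script.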

pdy265 commented 2 months ago

Have you solved this problem?