amazon-science / earth-forecasting-transformer

Official implementation of Earthformer
Apache License 2.0
359 stars 61 forks

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead. #59

Open Happiyin opened 11 months ago

Happiyin commented 11 months ago

Traceback (most recent call last):
  File "/root/autodl-tmp/earth-forecasting-transformer/scripts/cuboid_transformer/sevir/train_cuboid_sevir.py", line 792, in <module>
    main()
  File "/root/autodl-tmp/earth-forecasting-transformer/scripts/cuboid_transformer/sevir/train_cuboid_sevir.py", line 781, in main
    trainer.fit(model=pl_module,
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
    return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
    results = self._run_stage()
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
    return self._run_train()
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_train
    self._run_sanity_check()
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1413, in _run_sanity_check
    val_loop.run()
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 154, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 128, in advance
    output = self._evaluation_step(**kwargs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 226, in _evaluation_step
    output = self.trainer._call_strategy_hook("validation_step", *kwargs.values())
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1765, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/strategies/ddp.py", line 355, in validation_step
    return self.model(*args, **kwargs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/apex/parallel/distributed.py", line 564, in forward
    result = self.module(*inputs, **kwargs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/pytorch_lightning/overrides/base.py", line 93, in forward
    return self.module.validation_step(*inputs, **kwargs)
  File "/root/autodl-tmp/earth-forecasting-transformer/scripts/cuboid_transformer/sevir/train_cuboid_sevir.py", line 568, in validation_step
    step_mse = self.valid_mse(y_hat, y)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/torchmetrics/metric.py", line 298, in forward
    self._forward_cache = self._forward_reduce_state_update(*args, **kwargs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/torchmetrics/metric.py", line 367, in _forward_reduce_state_update
    self.update(*args, **kwargs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/torchmetrics/metric.py", line 467, in wrapped_func
    raise err
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/torchmetrics/metric.py", line 457, in wrapped_func
    update(*args, **kwargs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/torchmetrics/regression/mse.py", line 101, in update
    sum_squared_error, num_obs = _mean_squared_error_update(preds, target, num_outputs=self.num_outputs)
  File "/root/miniconda3/envs/earthformer/lib/python3.9/site-packages/torchmetrics/functional/regression/mse.py", line 36, in _mean_squared_error_update
    target = target.view(-1)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

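Editorial note on the traceback above: the failing call is target.view(-1) inside torchmetrics. In PyTorch, view never copies data, so flattening only succeeds when the tensor's strides allow adjacent dimensions to be merged; a tensor produced by a transpose or permute usually does not satisfy this, while .reshape(-1) or .contiguous().view(-1) fall back to a copy. The pure-Python sketch below models that stride condition without importing torch. The helper names row_major_strides and can_flatten_without_copy are hypothetical, and whether y in validation_step is actually non-contiguous because of a permute is an assumption, not something the thread confirms.

```python
def row_major_strides(shape):
    """Strides (in elements) of a freshly allocated, contiguous tensor."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def can_flatten_without_copy(shape, strides):
    """Model of the condition for flattening all dims to 1-D without a copy.

    Every pair of neighbouring dimensions (ignoring size-1 dims) must be
    mergeable: outer_stride == inner_size * inner_stride.
    """
    if any(size == 0 for size in shape):
        return True  # empty tensors have no elements to lay out
    dims = [(size, stride) for size, stride in zip(shape, strides) if size != 1]
    for (_, outer_stride), (inner_size, inner_stride) in zip(dims, dims[1:]):
        if outer_stride != inner_size * inner_stride:
            return False
    return True


# A contiguous 2x3 tensor can be flattened in place.
contiguous = ((2, 3), row_major_strides((2, 3)))

# Transposing swaps sizes and strides without moving data: shape (3, 2),
# strides (1, 3). This layout cannot be flattened without a copy.
transposed = ((3, 2), (1, 3))

print(can_flatten_without_copy(*contiguous))
print(can_flatten_without_copy(*transposed))
```

If the target tensor does turn out to be non-contiguous, calling .contiguous() on it before passing it to the metric is one possible workaround, independent of which torchmetrics version is installed.
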
gaozhihan commented 11 months ago

Thank you for reporting this error. It appears to be related to how the training script train_cuboid_sevir.py is used, but I was unable to reproduce it on my end. A likely cause is a mismatch in package versions. Could you please share the command that led to the error, along with the versions of the packages you have installed, including torch, pytorch_lightning, and torchmetrics?
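
Editorial note: one way to collect the requested version information as text is a short script using the standard-library importlib.metadata module. This is a minimal sketch; the function name installed_versions is hypothetical, and the distribution names listed (in particular "pytorch-lightning") are assumed to match how the packages were installed in the environment.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names as assumed to be published on PyPI.
report = installed_versions(["torch", "pytorch-lightning", "torchmetrics"])
for name, version in report.items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Pasting this output as text makes the versions easier to compare than a screenshot.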

fizzking commented 10 months ago

I also encountered this problem. Here are my version numbers: [two screenshots attached: 1dd978402634d2f9a87e52c42c2d9a0, 5508c062d75a3ca7d5028fbbbb08ec0]

gaozhihan commented 9 months ago

Could you please try torchmetrics==0.11.1? It works correctly with torch==1.12.1+cu116 and pytorch_lightning==1.6.4 on my end.

fizzking commented 9 months ago

I switched to the versions you suggested: torchmetrics==0.11.1, torch==1.12.1+cu116, and pytorch_lightning==1.6.4. The code then ran to 4% before stopping with an error: [two screenshots attached: bbb8f0976b4cee69d340a9d804de714, 5460812f2d71ecc3aa1c120b8650e0a]

gaozhihan commented 9 months ago

This seems to be related to matplotlib.