amazon-science / earth-forecasting-transformer

Official implementation of Earthformer
Apache License 2.0

Experimental settings for MovingMNIST #52

Open water-wbq opened 1 year ago

water-wbq commented 1 year ago

Hello, and thank you very much for your excellent work! I currently have two questions:

Question 1: I noticed that the MovingMNIST experimental setup in the paper differs somewhat from the setup used in the video-prediction literature. In the paper, the total is 10000 sequences, with 8100 for train, 900 for val, and 1000 for test, all drawn from mnist_test_seq.npy. In many other papers (e.g., the baselines PredRNN and PhyDNet), there are 10000 training sequences and 10000 test sequences, with the training data generated from train-images-idx3-ubyte.gz and the test data taken from mnist_test_seq.npy. Is there a particular reason for the different setup in the paper?

Question 2: I only have one GPU and would like to rerun Earthformer. How should I set it up? With the original code as-is, I run into an ApexDDPStrategy error.

Thanks!

gaozhihan commented 1 year ago

Question 1

We found that the experimental settings of the MovingMNIST benchmark are not standardized: many methods train on an infinite training set generated on the fly. To establish a unified evaluation standard, we chose the most widely used and accepted publicly available data: mnist_test_seq.npy.
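For reference, a minimal sketch (not our actual data pipeline) of the fixed split described above, assuming mnist_test_seq.npy has the conventional shape (20, 10000, 64, 64):

import numpy as np

# mnist_test_seq.npy conventionally stores 10000 sequences of 20 frames
# of 64x64 pixels, laid out as (seq_len, num_seq, H, W).
data = np.load("mnist_test_seq.npy").transpose(1, 0, 2, 3)  # -> (10000, 20, 64, 64)

# Fixed 8100 / 900 / 1000 train/val/test split, following the numbers quoted above.
train, val, test = data[:8100], data[8100:9000], data[9000:]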

Question 2

Taking MovingMNIST as an example, please remove the settings related to multi-GPU communication from the training command:

python train_cuboid_mnist.py --cfg cfg.yaml --ckpt_name last.ckpt --save tmp_mnist
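With those flags removed, training falls back to a single device. A minimal sketch (assumed, not the repo's exact code) of the equivalent single-GPU Trainer setup in the pytorch_lightning 1.6 API:

import pytorch_lightning as pl

# No `strategy` argument (e.g. a DDP/Apex strategy) is passed, so no
# multi-GPU communication backend is initialized.
trainer = pl.Trainer(accelerator="gpu", devices=1)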

If there are still issues, they could be caused by a pytorch_lightning version mismatch. Please make sure that pytorch_lightning is at version 1.6.4, as specified in the README:

pip uninstall pytorch_lightning
pip install pytorch_lightning==1.6.4
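You can then verify the installed version with:

python -c "import pytorch_lightning; print(pytorch_lightning.__version__)"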
