HFAiLab / OpenCastKit

The open-source solutions of FourCastNet and GraphCast
MIT License

Problems running GraphCast #18

Open sunhuihang opened 1 year ago

sunhuihang commented 1 year ago

First of all, thanks to the authors for sharing the model. While learning to run it I hit a problem I couldn't solve, and I'm hoping for some help. I can already run FourCastNet inference successfully, but GraphCast gives me trouble: FourCastNet takes 20 input channels, while GraphCast expects the last dimension to be 49. As far as I can tell, x0 and x1 are each 22 (20 variables plus 2 time features) and y[:, :, -2:] contributes 2, which adds up to 46. What am I missing? Any pointers would be much appreciated!

VachelHU commented 1 year ago

There is also the spherical position information; see line 45 of /data_factory/datasets.py.
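For reference, a common way to encode a grid node's position on the sphere is a three-component unit vector derived from latitude/longitude. This is only a sketch of what such a feature might look like, not necessarily what line 45 of datasets.py actually computes:

```python
import math

def sphere_position(lat_deg, lon_deg):
    """Unit-sphere (x, y, z) coordinates for a lat/lon grid point."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

x, y, z = sphere_position(45.0, 90.0)
print(round(x * x + y * y + z * z, 6))  # 1.0 — the point lies on the unit sphere
```

Encoding position this way (rather than raw lat/lon) avoids the discontinuity at the date line and treats the poles smoothly, which is why it adds three channels rather than two.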

sunhuihang commented 1 year ago

Thank you very much for the reply. With the spherical position information added it does indeed come to 49. One small follow-up: does the spherical position go at the end (see the blue box in the screenshot)? Thanks again!
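The channel arithmetic above can be written out as a quick sanity check. The 3-channel spherical position count is inferred from 49 − 46; the constant names below are illustrative, not taken from the repository:

```python
# Sanity check of the GraphCast input channel count discussed above.
N_VARS = 20         # atmospheric variables, as in FourCastNet
N_TIME_FEATS = 2    # time features appended to each input step
N_TARGET_TIME = 2   # the two time channels taken from y[:, :, -2:]
N_SPHERE_POS = 3    # unit-sphere (x, y, z) position of each grid node

per_step = N_VARS + N_TIME_FEATS          # x0 and x1 are each 22 channels
total = 2 * per_step + N_TARGET_TIME + N_SPHERE_POS
print(total)  # 49, matching the last dimension GraphCast expects
```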

VachelHU commented 1 year ago

Yes.

sunhuihang commented 1 year ago

Hello, I recently found a fairly serious problem: the pretrained model graphcast.pt you provide does not match the model architecture defined in graphcast_sequential.py, so the parameters cannot be loaded. As the screenshot shows, the parameter counts differ: graphcast.pt contains 58 network parameters, while graphcast_sequential.py defines 70.

sunhuihang commented 1 year ago

In other words, the graphcast.pt model released in the repository was not trained with the provided train_graphcast.py, is that right?

Aquila96 commented 1 year ago

The weights and the code certainly do not match. You can remove some of the linear layers and change some of the dims from 512 to 256 to get it to load, but the results are pretty much noise after the first step. I'm out of ideas; it could be that this version of the model only employed one layer instead of the prescribed 16-layer GNN blocks.

oubahe commented 1 year ago

Hi, I ran into this problem too. Do you have any solution? I'd appreciate any pointers you can share.

Aquila96 commented 1 year ago

Hi,

Considering the myriad of discrepancies (constant_feature, model depth, absence of autoregressive training, #23, etc.), I think it's safe to assume that this version is not the one presented in the blog post.

If you are looking for a detailed implementation, I'd recommend NVIDIA Modulus's version.

oubahe commented 1 year ago

Yep, I have the same suspicion. Let me check out NVIDIA's version. Thanks very much for the answer.