PaddlePaddle / PaddleScience

PaddleScience is an SDK and library for developing AI-driven scientific computing applications based on PaddlePaddle.
http://paddlescience-docs.rtfd.io/
Apache License 2.0

add Extformer-MoE example by HKUST(GZ) #933

Open KennyNH opened 1 week ago

KennyNH commented 1 week ago

We add a new example named Extformer-MoE. All related code is committed to the "dev_model" branch. Attached is a Markdown document with detailed descriptions: 开发文档.md (development documentation).

paddle-bot[bot] commented 1 week ago

Thanks for your contribution!

CLAassistant commented 1 week ago

CLA assistant check
All committers have signed the CLA.

HydrogenSulfate commented 5 days ago

@KennyNH Please run pre-commit run --all-files in the PaddleScience directory; otherwise code-style-check will fail.

KennyNH commented 4 days ago

Hi, I've re-requested a review.

HydrogenSulfate commented 4 days ago

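code-style-check is still failing. Could you confirm that pre-commit is actually enabled when you commit? (screenshot)

Also, you can click Details on the code-style-check job to see exactly which files failed the pre-commit checks.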
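If the hook is not installed, git commit runs no checks at all, so style problems only surface later in CI. pre-commit install writes a hook script to .git/hooks/pre-commit, and Git does not run hook files that are missing or not executable. A minimal sketch for checking this locally, assuming the default hooks directory (a custom core.hooksPath setting is not handled); precommit_hook_installed is a hypothetical helper, not part of PaddleScience or pre-commit:

```python
import os
from pathlib import Path


def precommit_hook_installed(repo_root: str) -> bool:
    """Hypothetical helper: report whether a pre-commit hook appears installed.

    Assumes hooks live in <repo_root>/.git/hooks; a custom core.hooksPath
    configuration is not taken into account.
    """
    hook = Path(repo_root) / ".git" / "hooks" / "pre-commit"
    if not hook.is_file():
        return False
    # Git does not run hook files that are not executable.
    if not os.access(hook, os.X_OK):
        return False
    # Scripts written by `pre-commit install` invoke the pre-commit tool.
    return "pre-commit" in hook.read_text(errors="replace")


if __name__ == "__main__":
    # Run from the root of the PaddleScience checkout.
    if precommit_hook_installed("."):
        print("pre-commit hook found; checks run on every `git commit`.")
    else:
        print("No pre-commit hook found; run `pre-commit install` first.")
```

If the second message is printed, running pre-commit install once and then pre-commit run --all-files is the usual way to reproduce the style check locally before pushing.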

KennyNH commented 2 days ago

Resubmitted.

HydrogenSulfate commented 2 days ago

It still seems to be failing. Please confirm that the pre-commit hook is correctly enabled when you run git commit, and that none of the checks fail. (screenshot)

KennyNH commented 1 day ago

It seems to work now.

KennyNH commented 1 day ago

What is the problem with PaddleScience-Linux-CI?

HydrogenSulfate commented 7 hours ago

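The CI issue has been resolved, but when I tested locally the run fails with the error below. Does it require a specific xarray backend to be installed? I have set FILE_PATH as described in the documentation, and the .nc file being opened does exist.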

Error executing job with overrides: []
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/xarray/backends/file_manager.py", line 211, in _acquire_with_cache_info
    file = self._cache[self._key]
  File "/usr/local/lib/python3.10/dist-packages/xarray/backends/lru_cache.py", line 56, in __getitem__
    value = self._cache[key]
KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('/ssd2/sjx/sjx_cuda11.8_py310/PaddleScience_test/PaddleScience/examples/extformer_moe/enso_round1_train_20210201/CMIP_train.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False)), 'fca25123-e359-4418-a25c-5a9da11389f8']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/ssd2/sjx/sjx_cuda11.8_py310/PaddleScience_test/PaddleScience/examples/extformer_moe/extformer_moe_enso_train.py", line 195, in main
    train(cfg)
  File "/ssd2/sjx/sjx_cuda11.8_py310/PaddleScience_test/PaddleScience/examples/extformer_moe/extformer_moe_enso_train.py", line 50, in train
    sup_constraint = ppsci.constraint.SupervisedConstraint(
  File "/ssd2/sjx/sjx_cuda11.8_py310/PaddleScience_test/PaddleScience/ppsci/constraint/supervised_constraint.py", line 64, in __init__
    _dataset = dataset.build_dataset(dataloader_cfg["dataset"])
  File "/ssd2/sjx/sjx_cuda11.8_py310/PaddleScience_test/PaddleScience/ppsci/data/dataset/__init__.py", line 100, in build_dataset
    dataset = eval(dataset_cls)(**cfg)
  File "/ssd2/sjx/sjx_cuda11.8_py310/PaddleScience_test/PaddleScience/ppsci/data/dataset/ext_moe_enso_dataset.py", line 292, in __init__
    cmip6sst, cmip5sst, cmip6nino, cmip5nino = read_raw_data(self.data_dir)
  File "/ssd2/sjx/sjx_cuda11.8_py310/PaddleScience_test/PaddleScience/ppsci/data/dataset/ext_moe_enso_dataset.py", line 142, in read_raw_data
    train_cmip = xr.open_dataset(Path(ds_dir) / "CMIP_train.nc").transpose(
  File "/usr/local/lib/python3.10/dist-packages/xarray/backends/api.py", line 573, in open_dataset
    backend_ds = backend.open_dataset(
  File "/usr/local/lib/python3.10/dist-packages/xarray/backends/netCDF4_.py", line 646, in open_dataset
    store = NetCDF4DataStore.open(
  File "/usr/local/lib/python3.10/dist-packages/xarray/backends/netCDF4_.py", line 409, in open
    return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
  File "/usr/local/lib/python3.10/dist-packages/xarray/backends/netCDF4_.py", line 356, in __init__
    self.format = self.ds.data_model
  File "/usr/local/lib/python3.10/dist-packages/xarray/backends/netCDF4_.py", line 418, in ds
    return self._acquire()
  File "/usr/local/lib/python3.10/dist-packages/xarray/backends/netCDF4_.py", line 412, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File "/usr/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.10/dist-packages/xarray/backends/file_manager.py", line 199, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File "/usr/local/lib/python3.10/dist-packages/xarray/backends/file_manager.py", line 217, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "src/netCDF4/_netCDF4.pyx", line 2470, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line 2107, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: '/ssd2/sjx/sjx_cuda11.8_py310/PaddleScience_test/PaddleScience/examples/extformer_moe/enso_round1_train_20210201/CMIP_train.nc'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
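The traceback shows xarray already using its netCDF4 engine (xarray/backends/netCDF4_.py), and Errno -101 ("NetCDF: HDF error") is raised by the HDF5 layer underneath netCDF4-python when it opens the file. Commonly reported causes include a corrupted or incompletely downloaded file, HDF5 file locking on some shared filesystems (often worked around by setting HDF5_USE_FILE_LOCKING=FALSE), and conflicting HDF5 builds loaded by netCDF4 and h5py; this thread does not establish which applies. As a first check, a minimal sketch can read the file's leading bytes and report which format signature, if any, it carries. sniff_netcdf_format is a hypothetical helper; a valid signature does not rule out a file truncated later in its body, so comparing the file size against the original download is still worthwhile.

```python
from pathlib import Path

# Hypothetical helper; the signatures below come from the HDF5 and
# netCDF classic file format specifications.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # netCDF-4 files are HDF5 files
NETCDF3_SIGNATURES = {
    b"CDF\x01": "netCDF-3 classic",
    b"CDF\x02": "netCDF-3 64-bit offset",
    b"CDF\x05": "netCDF-3 64-bit data (CDF-5)",
}


def sniff_netcdf_format(path) -> str:
    """Guess a file's format from its leading bytes (does not validate it)."""
    path = Path(path)
    size = path.stat().st_size
    if size == 0:
        return "empty file"
    with path.open("rb") as f:
        head = f.read(4)
        if head in NETCDF3_SIGNATURES:
            return NETCDF3_SIGNATURES[head]
        # The HDF5 superblock may start at byte 0, 512, 1024, 2048, ...
        offset = 0
        while offset < size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                if offset == 0:
                    return "HDF5 / netCDF-4"
                return f"HDF5 / netCDF-4 (offset {offset})"
            offset = 512 if offset == 0 else offset * 2
    return "unknown: not netCDF or HDF5 (e.g. an HTML error page or archive)"


if __name__ == "__main__":
    import sys

    # Usage: python sniff.py path/to/CMIP_train.nc [more files...]
    for p in sys.argv[1:]:
        print(p, "->", sniff_netcdf_format(p), f"({Path(p).stat().st_size} bytes)")
```

An "unknown" or "empty file" result points at the data file itself (for example a failed download) rather than at the choice of xarray backend.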
KennyNH commented 7 hours ago

Does the ENSO dataset path match the path in the cfg file? If that's not the problem, I'll look into it later.
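The path question can be checked mechanically: the traceback shows read_raw_data opening Path(ds_dir) / "CMIP_train.nc", so the directory configured as FILE_PATH must contain that file. A minimal sketch, where check_enso_data_dir is a hypothetical helper and only CMIP_train.nc is taken from the traceback (any other files the loader reads would need to be added to required):

```python
from pathlib import Path


def check_enso_data_dir(file_path, required=("CMIP_train.nc",)):
    """Hypothetical helper: list problems with the configured data directory.

    Returns an empty list when the directory exists and every required
    file is present and non-empty.
    """
    root = Path(file_path).expanduser().resolve()
    if not root.is_dir():
        return [f"not a directory: {root}"]
    problems = []
    for name in required:
        target = root / name
        if not target.is_file():
            problems.append(f"missing: {target}")
        elif target.stat().st_size == 0:
            problems.append(f"empty: {target}")
    return problems
```

Calling it with the FILE_PATH value from the config and getting an empty list means the path itself is fine, so attention can shift to the contents of the file.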

From: HydrogenSulfate @.> Sent: Wednesday, July 3, 2024 11:50:44 AM To: PaddlePaddle/PaddleScience @.> Cc: Hang NI @.>; Mention @.> Subject: Re: [PaddlePaddle/PaddleScience] add Extformer-MoE example by HKUST(GZ) (PR #933)

HydrogenSulfate commented 6 hours ago

Yes, the path is: FILE_PATH: /ssd2/sjx/sjx_cuda11.8_py310/PaddleScience_test/PaddleScience/examples/extformer_moe/enso_round1_train_20210201