NVlabs / FourCastNet

Initial public release of code, data, and model weights for FourCastNet

Pre-processing parallel_copy.py #8

Open almaghrabima opened 1 year ago

almaghrabima commented 1 year ago

Thank you for your great code; this is a SOTA model. I had an issue running the pre-processing script parallel_copy.py (and MPI.py, which is similar to parallel_copy.py but covers a different set of years) on the exact datasets for the full years 2016-2021, and I keep getting this error: ValueError: h5py was built without MPI support, can't use mpio driver

I installed OpenMPI and mpi4py, and the mpi4py hello-world test runs fine:

(cast) mg@amru-System-Product-Name:~$ mpiexec -n 5 python -m mpi4py.bench helloworld
Hello, World! I am process 0 of 5 on amru-System-Product-Name.
Hello, World! I am process 1 of 5 on amru-System-Product-Name.
Hello, World! I am process 2 of 5 on amru-System-Product-Name.
Hello, World! I am process 3 of 5 on amru-System-Product-Name.
Hello, World! I am process 4 of 5 on amru-System-Product-Name.

I don't know what causes this problem; as far as I can tell, the code and datasets should be fine.
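For reference, the mpio driver requires h5py itself to be compiled against a parallel build of HDF5; a working OpenMPI/mpi4py installation alone is not enough. h5py exposes this at runtime through its standard config API, so a quick check (this is generic h5py usage, not anything specific to FourCastNet) would be:

```python
import h5py

# h5py records at build time whether it was linked against parallel HDF5.
# If this prints False, h5py.File(..., driver='mpio', ...) will raise the
# "h5py was built without MPI support" ValueError seen in the logs below.
print(h5py.get_config().mpi)
```

The prebuilt h5py wheels on PyPI are serial-only, so this typically prints False after a plain `pip install h5py`.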

(cast) mg@amru-System-Product-Name:~/Documents/Data$ mpirun -n 4 python MPI.py 
{2016: 'j', 2017: 'j', 2018: 'k', 2019: 'k', 2020: 'a', 2021: 'a'}
2016
{2016: 'j', 2017: 'j', 2018: 'k', 2019: 'k', 2020: 'a', 2021: 'a'}
2016
==============================
rank 1
Nproc 4
==============================
Nimgtot 1460
Nproc 4
Nimg 365
Traceback (most recent call last):
  File "MPI.py", line 130, in <module>
    with h5py.File(f'{str(year)}.h5', 'w') as f:
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 442, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 201, in make_fid
    fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 116, in h5py.h5f.create
BlockingIOError: [Errno 11] Unable to create file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')
{2016: 'j', 2017: 'j', 2018: 'k', 2019: 'k', 2020: 'a', 2021: 'a'}
2016
{2016: 'j', 2017: 'j', 2018: 'k', 2019: 'k', 2020: 'a', 2021: 'a'}
2016
Traceback (most recent call last):
  File "MPI.py", line 133, in <module>
    writetofile(src, dest, 0, ['u10'])
  File "MPI.py", line 75, in writetofile
    fdest = h5py.File(dest, 'a', driver='mpio', comm=MPI.COMM_WORLD)
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 441, in __init__
    fapl = make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0, **kwds)
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 144, in make_fapl
    set_fapl(plist, **kwds)
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 48, in _set_fapl_mpio
    raise ValueError("h5py was built without MPI support, can't use mpio driver")
ValueError: h5py was built without MPI support, can't use mpio driver
Traceback (most recent call last):
  File "MPI.py", line 130, in <module>
    with h5py.File(f'{str(year)}.h5', 'w') as f:
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 442, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 201, in make_fid
    fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 116, in h5py.h5f.create
BlockingIOError: [Errno 11] Unable to create file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')
==============================
rank 2
Nproc 4
==============================
Nimgtot 1460
Nproc 4
Nimg 365
Traceback (most recent call last):
  File "MPI.py", line 133, in <module>
    writetofile(src, dest, 0, ['u10'])
  File "MPI.py", line 75, in writetofile
    fdest = h5py.File(dest, 'a', driver='mpio', comm=MPI.COMM_WORLD)
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 441, in __init__
    fapl = make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0, **kwds)
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 144, in make_fapl
    set_fapl(plist, **kwds)
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 48, in _set_fapl_mpio
    raise ValueError("h5py was built without MPI support, can't use mpio driver")
ValueError: h5py was built without MPI support, can't use mpio driver
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[3210,1],0]
  Exit code:    1

The datasets were downloaded from cds.climate.copernicus.eu for each year, with 20 parameters.

I have just started using mpi4py and h5py. Could you please help me get the pre-processing parallel_copy.py running?
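For anyone hitting the same error: a sketch of rebuilding h5py against a parallel HDF5, following the parallel-build procedure from the h5py documentation. The package name and library path are assumptions for a typical Ubuntu + OpenMPI setup; adjust for your distribution.

```shell
# Remove the serial wheel first
pip uninstall -y h5py

# Install an MPI-enabled HDF5 (Ubuntu package name assumed)
sudo apt-get install libhdf5-openmpi-dev

# Rebuild h5py from source, linked against parallel HDF5
export CC=mpicc
export HDF5_MPI="ON"
# On Ubuntu you may also need to point at the OpenMPI HDF5 install, e.g.:
# export HDF5_DIR=/usr/lib/x86_64-linux-gnu/hdf5/openmpi
pip install --no-binary=h5py h5py
```

After this, `python -c "import h5py; print(h5py.get_config().mpi)"` should print True, and the mpio driver in parallel_copy.py should open files without the ValueError.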