eguil / Density_bining

Density bining code

Parallelisation with MPI #65

Open · lebasn opened this issue 6 years ago

lebasn commented 6 years ago

I'm trying to use MPI to parallelise the density binning code. I first tried threading on the binning loop with 4 and 8 threads, without any performance increase (presumably because the loop is CPU-bound and Python's GIL keeps the threads from running concurrently).
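For reference, the threaded version looked roughly like this (an illustrative sketch only; the real binning loop is more involved, and bin_chunk is a stand-in body):

import threading
import numpy as npy

def bin_chunk(field, rows, out):
    # bin a subset of grid rows (stand-in for the real density binning)
    for i in rows:
        out[i] = field[i].sum()

field = npy.random.rand(360, 180)
out = npy.empty(360)
# split the row indices across 4 worker threads
chunks = npy.array_split(npy.arange(360), 4)
threads = [threading.Thread(target=bin_chunk, args=(field, c, out))
           for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()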

Now I'm trying MPI (from mpi4py import MPI), but I have trouble using MPI and CdmsRegrid together. Currently, I load all modules and global variables on every rank and only use the first one to execute the code (if RANK == 0:).
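Schematically, the script is structured like this (an illustrative sketch; the actual work happens in densityBinMP):

from mpi4py import MPI

COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank()

# every rank executes the imports and global set-up here

if RANK == 0:
    # only the first rank runs the binning for now
    pass  # densityBinMP(...) would be called here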

If I leave the ESMP_INIT import commented out, as follows:

from mpi4py import MPI
COMM = MPI.COMM_WORLD
RANK = COMM.Get_rank()

import numpy as npy
from string import replace
import time as timc
from scipy.interpolate import interp1d
from scipy.interpolate._fitpack import _bspleval
import gc,os,resource,timeit
print 'CHECK POINT 0!'
#from ESMP import ESMP_INIT
import cdms2 as cdm
print 'CHECK POINT 001!'
from cdms2 import CdmsRegrid, mvCdmsRegrid
print 'CHECK POINT 002!'

If I run mpirun -n 2 python binDensityMP2.py I get the following message (but not with mpirun -n 1):

CHECK POINT 0!
CHECK POINT 0!
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(474).................:
MPID_Init(190)........................: channel initialization failed
MPIDI_CH3_Init(89)....................:
MPID_nem_init(272)....................:
MPIDI_CH3I_Seg_commit(366)............:
MPIU_SHMW_Hnd_deserialize(324)........:
MPIU_SHMW_Seg_open(865)...............:
MPIU_SHMW_Seg_create_attach_templ(637): open failed - No such file or directory

If I uncomment from ESMP import ESMP_INIT, I get through the module loading but then hit an error in CdmsRegrid:

<CurveGrid, id: grid_149x182, shape: (149, 182)> ,  Grid has Python id 0x7f120c360f50.
Gridtype: generic
Grid shape: (180, 360)
Order: yx
 ,  float32 ,  [  1.00e+20]
Traceback (most recent call last):
  File "binDensityMP2.py", line 1427, in <module>
    lev_thickt,timeint='1,12')
  File "binDensityMP2.py", line 365, in densityBinMP
    regridObj = CdmsRegrid(ingrid,msk.outgrid,depthBini.dtype,missing=valmask,regridMethod='distwgt',regridTool='esmf', coordSys='deg', diag = {},periodicity=1)
  File "/home/nillod/anaconda2/lib/python2.7/site-packages/cdms2/mvCdmsRegrid.py", line 424, in __init__
    **args)
  File "/home/nillod/anaconda2/lib/python2.7/site-packages/regrid2/mvGenericRegrid.py", line 135, in __init__
    **args)
  File "/home/nillod/anaconda2/lib/python2.7/site-packages/regrid2/mvESMFRegrid.py", line 159, in __init__
    hasBounds=self.hasSrcBounds)
  File "/home/nillod/anaconda2/lib/python2.7/site-packages/regrid2/esmf.py", line 192, in __init__
    staggerloc=[staggerloc], coord_sys=coordSys)
  File "/home/nillod/anaconda2/lib/python2.7/site-packages/ESMF/util/decorators.py", line 64, in new_func
    return func(*args, **kwargs)
  File "/home/nillod/anaconda2/lib/python2.7/site-packages/ESMF/api/grid.py", line 279, in __init__
    coordTypeKind=coord_typekind)
  File "/home/nillod/anaconda2/lib/python2.7/site-packages/ESMF/util/decorators.py", line 52, in new_func
    return func(*args, **kwargs)
  File "/home/nillod/anaconda2/lib/python2.7/site-packages/ESMF/interface/cbindings.py", line 340, in ESMP_GridCreate1PeriDim
    constants._errmsg)
ValueError: ESMC_GridCreate() failed with rc = 518.    Please check the log files (named "*ESMF_LogFile").
Exception AttributeError: "'NoneType' object has no attribute 'destroy'" in <bound method EsmfStructGrid.__del__ of <regrid2.esmf.EsmfStructGrid instance at 0x7f12137167a0>> ignored

It seems to be a conflict between the cdms and MPI modules. Maybe it is a known problem, but I didn't find any information about it. Do you have any ideas?

I'm using ESMF version ESMF_6_3_0rp1_ESMP_01 and cdms2 version 2.12.

Maybe Charles could help as well?

Thanks,

Nicolas

durack1 commented 6 years ago

@dnadeau4, can you assist with this? Your experience with MPI would be very useful to tap.

@doutriaux1 ping

doutriaux1 commented 6 years ago

cdms2 has the setNetcdfUseParallelFlag(1) flag.

Also, cdms2 variables have functions to set the MPI communicator, such as:

var.setMPICom(comm)
var.getMPIRank()
var.getMPISize()

etc...

@dnadeau4 can comment more on this. But see: https://github.com/UV-CDAT/cdms/blob/master/Lib/tvariable.py#L706-L730
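An untested sketch of how these pieces could fit together (method names as listed above; check the linked tvariable.py for the exact signatures, and note the file name and variable id here are hypothetical):

from mpi4py import MPI
import cdms2 as cdm

# enable parallel NetCDF I/O in cdms2
cdm.setNetcdfUseParallelFlag(1)

comm = MPI.COMM_WORLD

f = cdm.open('thetao.nc')   # hypothetical input file
var = f('thetao')           # read as a transient variable
var.setMPICom(comm)         # attach the MPI communicator
rank = var.getMPIRank()
size = var.getMPISize()
# each rank could then work on its own slice, e.g. var[rank::size]
f.close()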

lebasn commented 6 years ago

Thanks, I see the idea. I'll test it and give you feedback.