NOAA-EMC / GDASApp

Global Data Assimilation System Application
GNU Lesser General Public License v2.1

Profiling of old/new converters #677

Closed guillaumevernieres closed 11 months ago

guillaumevernieres commented 11 months ago

Description

Document the profiling of the new converters in a realistic configuration (similar number of files and observation count) that simulates what would have to be processed in real time.

For each instrument, prepare a profiling table. Here is an example for VIIRS (36 files, ~5B obs):

| Number of PEs | Num. Obs. in | Num. Obs. out | Runtime | Total Memory | Memory per Task | Python converter |
|---|---|---|---|---|---|---|
| Serial | ? | ? | 95.77 sec | 6.47 Gb | min = 6.47 Gb, max = 6.47 Gb | ? |
| 2 PEs | ? | ? | 52.82 sec | 12.93 Gb | min = 6.47 Gb, max = 6.47 Gb | ? |
| 4 PEs | ? | ? | 40.24 sec | 25.84 Gb | min = 6.46 Gb, max = 6.46 Gb | ? |

The profiling needs to be done on the compute nodes, not the login nodes. If possible, include the profiling from the old converters.
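The runtimes in the example table can be turned into speedup and parallel efficiency figures, which makes the scaling easier to judge at a glance. The following is a minimal sketch; the function name `scaling` is hypothetical and the numbers are copied from the VIIRS example above.

```python
# Runtimes in seconds, copied from the VIIRS example table (1, 2 and 4 PEs).
runtimes = {1: 95.77, 2: 52.82, 4: 40.24}


def scaling(runtimes):
    """Return {n_pes: (speedup, efficiency)} relative to the serial run.

    speedup    = T(1) / T(n)
    efficiency = speedup / n
    """
    serial = runtimes[1]
    return {
        n: (serial / t, serial / (t * n))
        for n, t in sorted(runtimes.items())
    }


if __name__ == "__main__":
    for n, (speedup, efficiency) in scaling(runtimes).items():
        print(f"{n} PE(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

For these example numbers the speedup is about 1.81x on 2 PEs and 2.38x on 4 PEs, so efficiency drops from roughly 91% to roughly 59%.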

ShastriPaturi commented 11 months ago

@apchoiCMD, here is the example shell script that runs the Python ioda-converter for SST (gds2_sst2ioda.py): /scratch1/NCEPDEV/da/Shastri.Paturi/ioda_sst_NCEP/convert_sst_l3u_ncep.sh

guillaumevernieres commented 11 months ago

@apchoiCMD, here is an example sbatch script for Hera:

#!/bin/bash
#SBATCH --account=da-cpu
#SBATCH --qos=debug
#SBATCH --output=gdas_iodaconv.out
#SBATCH --nodes=1
#SBATCH --partition=hera
#SBATCH --time=00:15:00

# Load modules
module use ....
module load GDAS/hera

# Run the converter serially
<path to bin>/gdas_obsprovider2ioda.x provider.yaml

# Or run it with MPI (here on 2 tasks)
srun -n 2 <path to bin>/gdas_obsprovider2ioda.x provider.yaml

To submit the job, run:

sbatch nameofscriptabove
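Since the profiling has to be repeated for several PE counts, it can help to generate one batch script per count rather than editing the script by hand. The sketch below only builds the script text; `render_sbatch` and the per-count output file name are hypothetical, and the `module use ....` line and `<path to bin>` placeholder are kept exactly as elided above and must be filled in before use.

```python
def render_sbatch(ntasks,
                  exe="<path to bin>/gdas_obsprovider2ioda.x",
                  config="provider.yaml"):
    """Return the text of an sbatch script that runs the converter on ntasks PEs.

    The #SBATCH values mirror the Hera example in this thread. The output file
    name is an assumption, chosen so that runs do not overwrite each other.
    """
    lines = [
        "#!/bin/bash",
        "#SBATCH --account=da-cpu",
        "#SBATCH --qos=debug",
        f"#SBATCH --output=gdas_iodaconv_{ntasks}pe.out",
        "#SBATCH --nodes=1",
        "#SBATCH --partition=hera",
        "#SBATCH --time=00:15:00",
        "",
        "# Load modules (the module path is elided in the thread)",
        "module use ....",
        "module load GDAS/hera",
        "",
    ]
    if ntasks == 1:
        # Serial run: call the executable directly.
        lines.append(f"{exe} {config}")
    else:
        # MPI run: launch ntasks tasks through srun.
        lines.append(f"srun -n {ntasks} {exe} {config}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for n in (1, 2, 4):
        with open(f"gdas_iodaconv_{n}pe.sh", "w") as handle:
            handle.write(render_sbatch(n))
```

Each generated file can then be submitted with `sbatch` in the usual way.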
apchoiCMD commented 11 months ago

36 SST files from VIIRS-N20, run on Orion

| Number of PEs | Num. Obs. in | Num. Obs. out | Runtime | Total Memory | Memory per Task | Python converter |
|---|---|---|---|---|---|---|
| Serial | 5.832B | 304,308 | 200.95 sec | 6.25 Gb | min = 6.25 Gb, max = 6.25 Gb | ~~519 sec~~ |
| 2 PEs | 5.832B | 304,308 | 485.72 sec | 12.41 Gb | min = 6.20 Gb, max = 6.20 Gb | n/a |
| 4 PEs | 5.832B | 304,308 | 781.40 sec | 24.80 Gb | min = 6.19 Gb, max = 6.21 Gb | n/a |

guillaumevernieres commented 11 months ago

Let's forget about MPI for now, @apchoiCMD. We'll debug it later.

apchoiCMD commented 11 months ago

Profiling tests were run on Orion.

| Provider | Num. Obs. in | Num. Obs. out | Runtime | Total Memory | Memory per Task |
|---|---|---|---|---|---|
| GHRSST | 5.832B | 304,308 | 200.95 sec | 6.25 Gb | min = 6.25 Gb, max = 6.25 Gb |
| RADS | 429,352 | 429,352 | 7 sec | 99.54 Mb | min = 99.54 Mb, max = 99.54 Mb |
| AMSR2 | 9.04M | 9.04M | 36.55 sec | 845.44 Mb | min = 845.44 Mb, max = 845.44 Mb |
| NASA for SMAP | 493,696 | 493,696 | 5.44 sec | 96.97 Mb | min = 96.97 Mb, max = 96.97 Mb |
| ESA for SMOS | 372,753 | 372,753 | 6.95 sec | 91.36 Mb | min = 91.36 Mb, max = 91.36 Mb |
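To compare providers on a common footing, the table can be reduced to input throughput (observations read per second) and the fraction of observations kept. This is a minimal sketch using the serial numbers above; the `summarize` helper is hypothetical, and the SMAP count assumes that "493.696" is a thousands separator, i.e. 493,696 observations.

```python
# (provider, obs_in, obs_out, runtime_sec), copied from the table above.
# Assumption: the SMAP count "493.696" means 493,696 observations.
rows = [
    ("GHRSST", 5.832e9, 304_308, 200.95),
    ("RADS", 429_352, 429_352, 7.0),
    ("AMSR2", 9.04e6, 9.04e6, 36.55),
    ("NASA for SMAP", 493_696, 493_696, 5.44),
    ("ESA for SMOS", 372_753, 372_753, 6.95),
]


def summarize(rows):
    """Return {provider: {"obs_in_per_sec": ..., "kept_fraction": ...}}."""
    summary = {}
    for provider, obs_in, obs_out, runtime_sec in rows:
        summary[provider] = {
            "obs_in_per_sec": obs_in / runtime_sec,
            "kept_fraction": obs_out / obs_in,
        }
    return summary


if __name__ == "__main__":
    for provider, stats in summarize(rows).items():
        print(f"{provider:14s} {stats['obs_in_per_sec']:14.0f} obs/s  "
              f"kept {stats['kept_fraction']:.2e}")
```

GHRSST stands out in both columns: it reads by far the most observations per second but keeps only a tiny fraction of them, while the other providers write out everything they read.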
apchoiCMD commented 11 months ago

Profiling test results for the Python IODA converter (on Orion)

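| Provider | Num. files in | Runtime | Num. Obs. out | Read | Write | Error |
|---|---|---|---|---|---|---|
| GHRSST | 1 | 27 sec | 33,489 | ✔️ | ✔️ | |
| GHRSST | 36 | ~~519 sec~~ | | ✔️ | | Concatenation Error |
| RADS | 1 | 10 sec | 44,668 | ✔️ | ✔️ | |
| RADS | 10 | Failed | | | | |
| AMSR2 | 1 | 74 sec | 573,660 | ✔️ | ✔️ | |
| AMSR2 | 10 | 362 sec | 5,687,366 | ✔️ | ✔️ | |
| NASA for SMAP | 1 | Failed | | | | |
| NASA for SMAP | 8 | Failed | | | | |
| ESA for SMOS | 1 | 18 sec | 58,996 | ✔️ | ✔️ | |
| ESA for SMOS | 7 | 45 sec | 372,753 | ✔️ | ✔️ | |

guillaumevernieres commented 11 months ago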
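For the two providers where the multi-file Python run succeeded, the runtimes can be set against the new-converter numbers reported earlier in the thread. The sketch below normalizes by observations written, because the AMSR2 runs report different output counts and may not have used the same input files; that normalization and the helper names are assumptions made for illustration.

```python
# (provider, python_runtime_sec, python_obs_out, new_runtime_sec, new_obs_out)
# Python numbers come from the multi-file rows above; new-converter numbers
# come from the provider table earlier in the thread.
cases = [
    ("AMSR2", 362.0, 5_687_366, 36.55, 9.04e6),
    ("ESA for SMOS", 45.0, 372_753, 6.95, 372_753),
]


def sec_per_million(runtime_sec, n_obs):
    """Runtime cost in seconds per million observations written."""
    return runtime_sec / (n_obs / 1e6)


def compare(cases):
    """Return {provider: ratio of Python cost to new-converter cost}."""
    ratios = {}
    for provider, py_time, py_obs, new_time, new_obs in cases:
        py_cost = sec_per_million(py_time, py_obs)
        new_cost = sec_per_million(new_time, new_obs)
        ratios[provider] = py_cost / new_cost
    return ratios


if __name__ == "__main__":
    for provider, ratio in compare(cases).items():
        print(f"{provider}: Python converter is {ratio:.1f}x slower per observation")
```

Only the SMOS pair is a like-for-like comparison (both runs write 372,753 observations); the AMSR2 ratio should be read as indicative only.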

Thanks @apchoiCMD . No need to spend time on the concatenation error. The table above is good enough.

guillaumevernieres commented 11 months ago

@apchoiCMD: the 519 s for the 36 GHRSST files is the time reported by your failed batch job; it has nothing to do with the conversion. Remove the number, it doesn't mean anything.

apchoiCMD commented 11 months ago

@guillaumevernieres Thanks! Just to let you know, I will remove it.