Profiling of old/new converters

guillaumevernieres commented 11 months ago

Description

Document the profiling of the new converters in a realistic configuration (similar number of files and observation count) that simulates what would have to be processed in real time.

For each instrument, prepare profiling tables. Here's an example for the VIIRS (36 files, ~5B obs):

Number of PE's	Num. Obs. in	Num. Obs. out	Runtime	Total Memory	Memory per Task	Python converter
Serial	?	?	95.77 sec	total: 6.47 Gb	per task: min = 6.47 Gb, max = 6.47 Gb	?
2 PEs	?	?	52.82 sec	total: 12.93 Gb	per task: min = 6.47 Gb, max = 6.47 Gb	?
4 PEs	?	?	40.24 sec	total: 25.84 Gb	per task: min = 6.46 Gb, max = 6.46 Gb	?

The profiling needs to be done on the compute nodes, not the login nodes. If possible, include the profiling from the old converters.
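When reading a table like the VIIRS example above, speedup and parallel efficiency follow directly from the runtimes. A minimal sketch, using only the runtimes from the example table; the helper name `scaling_summary` is illustrative and not part of GDASApp:

```python
def scaling_summary(runtimes):
    """Return {n_pes: (speedup, efficiency)} relative to the serial (1 PE) run.

    runtimes maps the number of PEs to wall-clock seconds and must contain key 1.
    """
    serial = runtimes[1]
    summary = {}
    for n_pes, seconds in sorted(runtimes.items()):
        speedup = serial / seconds
        # Efficiency is speedup divided by the number of PEs (1.0 is ideal).
        summary[n_pes] = (speedup, speedup / n_pes)
    return summary


# Runtimes (seconds) copied from the VIIRS example table above.
viirs_example = {1: 95.77, 2: 52.82, 4: 40.24}

for n_pes, (speedup, efficiency) in scaling_summary(viirs_example).items():
    print(f"{n_pes} PEs: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency well below 100% at 4 PEs would suggest a serial portion (for example file reading) that does not shrink as tasks are added.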

ShastriPaturi commented 11 months ago

@apchoiCMD, here is an example shell script that runs the Python ioda-converter for SST (gds2_sst2ioda.py): /scratch1/NCEPDEV/da/Shastri.Paturi/ioda_sst_NCEP/convert_sst_l3u_ncep.sh

guillaumevernieres commented 11 months ago

@apchoiCMD, here is an sbatch script example for Hera:

#!/bin/bash
#SBATCH --account=da-cpu
#SBATCH --qos=debug
#SBATCH --output=gdas_iodaconv.out
#SBATCH --nodes=1
#SBATCH --partition=hera
#SBATCH --time=00:15:00

# load modules
module use ....
module load GDAS/hera

# Run serial stuff
<path to bin>/gdas_obsprovider2ioda.x provider.yaml

# Or run mpi stuff
srun -n 2 <path to bin>/gdas_obsprovider2ioda.x provider.yaml

To submit the job, run:

sbatch nameofscriptabove
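For a serial run, wall-clock time and peak memory can also be captured from a small Python wrapper placed in the batch script. This is only a sketch using the standard library, not necessarily how the numbers in this thread were produced; `profile_command` is a hypothetical helper, and the demo runs a trivial Python child rather than the converter:

```python
import resource
import subprocess
import sys
import time


def profile_command(argv):
    """Run argv and return (return code, wall-clock seconds, peak RSS in MB).

    ru_maxrss is the largest peak resident set size among the children this
    process has waited for so far. Linux reports it in kilobytes (macOS in
    bytes), so the MB conversion below assumes Linux.
    """
    start = time.perf_counter()
    completed = subprocess.run(argv, check=False)
    elapsed = time.perf_counter() - start
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, elapsed, peak_kb / 1024.0


# Demo with a trivial child process; replace the list with the converter
# executable and provider.yaml to profile a real serial run.
code, seconds, peak_mb = profile_command([sys.executable, "-c", "pass"])
print(f"exit={code} runtime={seconds:.2f} sec peak RSS={peak_mb:.2f} MB")
```

When the command is launched through `srun`, the tasks are not direct children in the same way, so this wrapper is only meaningful for the serial case.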

apchoiCMD commented 11 months ago

36 files of SST from VIIRS-N20, run on Orion

Number of PE's	Num. Obs. in	Num. Obs. out	Runtime	Total Memory	Memory per Task	Python converter
Serial	5.832B	304,308	200.95 sec	total: 6.25 Gb	per task: min = 6.25 Gb, max = 6.25 Gb	~519 sec~
2 PEs	5.832B	304,308	485.72 sec	total: 12.41 Gb	per task: min = 6.20 Gb, max = 6.20 Gb	n/a
4 PEs	5.832B	304,308	781.40 sec	total: 24.80 Gb	per task: min = 6.19 Gb, max = 6.21 Gb	n/a
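Two things stand out in the table above: runtime grows with the number of PEs, and total memory grows roughly in proportion to the PE count while per-task memory stays near the serial value. A small sketch that makes both ratios explicit, using only the numbers in the table (`mpi_ratios` is an illustrative name):

```python
def mpi_ratios(runs):
    """Return {n_pes: (runtime ratio, total-memory ratio)} relative to serial.

    runs maps number of PEs to (runtime in sec, total memory in Gb).
    """
    serial_runtime, serial_memory = runs[1]
    return {
        n_pes: (runtime / serial_runtime, memory / serial_memory)
        for n_pes, (runtime, memory) in sorted(runs.items())
    }


# Values copied from the VIIRS-N20 table above.
viirs_n20 = {1: (200.95, 6.25), 2: (485.72, 12.41), 4: (781.40, 24.80)}

for n_pes, (time_ratio, mem_ratio) in mpi_ratios(viirs_n20).items():
    # time_ratio > 1 means slower than serial; mem_ratio close to n_pes
    # means each task holds about as much data as the serial run.
    print(f"{n_pes} PEs: runtime x{time_ratio:.2f}, total memory x{mem_ratio:.2f}")
```

A memory ratio close to the PE count is consistent with every task reading the full input, although the table alone does not confirm the cause.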

guillaumevernieres commented 11 months ago

36 files of SST from VIIRS-N20, run from Orion

Number of PE's	Num. Obs. in	Num. Obs. out	Runtime	Total Memory	Memory per Task	Python converter
Serial	5.832B	304,308	200.95 sec	total: 6.25 Gb	per task: min = 6.25 Gb, max = 6.25 Gb	519 sec
2 PEs	5.832B	304,308	485.72 sec	total: 12.41 Gb	per task: min = 6.20 Gb, max = 6.20 Gb	n/a
4 PEs	5.832B	304,308	781.40 sec	total: 24.80 Gb	per task: min = 6.19 Gb, max = 6.21 Gb	n/a

Let's forget about MPI for now, @apchoiCMD. We'll debug later.

apchoiCMD commented 11 months ago

Profiling tests were run on Orion.

Provider	Num. Obs. in	Num. Obs. out	Runtime	Total Memory	Memory per Task
GHRSST	5.832B	304,308	200.95 sec	total: 6.25 Gb	per task: min = 6.25 Gb, max = 6.25 Gb
RADS	429,352	429,352	7 sec	total: 99.54 Mb	per task: min = 99.54 Mb, max = 99.54 Mb
AMSR2	9.04M	9.04M	36.55 sec	total: 845.44 Mb	per task: min = 845.44 Mb, max = 845.44 Mb
NASA for SMAP	493,696	493,696	5.44 sec	total: 96.97 Mb	per task: min = 96.97 Mb, max = 96.97 Mb
ESA for SMOS	372,753	372,753	6.95 sec	total: 91.36 Mb	per task: min = 91.36 Mb, max = 91.36 Mb
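To compare providers of very different sizes, it can help to turn the counts in the table above into a read throughput and a kept fraction. The sketch below parses the count notation used in the table ("5.832B", "9.04M", "429,352"); `parse_count` and `throughput` are hypothetical helpers, and only three rows are copied for illustration:

```python
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_count(text):
    """Convert a count such as '5.832B', '9.04M' or '429,352' to a float."""
    cleaned = text.strip().replace(",", "")
    if cleaned and cleaned[-1].upper() in SUFFIXES:
        return float(cleaned[:-1]) * SUFFIXES[cleaned[-1].upper()]
    return float(cleaned)


def throughput(obs_in, runtime_sec):
    """Observations read per second of wall-clock time."""
    return parse_count(obs_in) / runtime_sec


# (provider, Num. Obs. in, Num. Obs. out, runtime in sec) from the table above.
rows = [
    ("GHRSST", "5.832B", "304,308", 200.95),
    ("RADS", "429,352", "429,352", 7.0),
    ("AMSR2", "9.04M", "9.04M", 36.55),
]

for provider, obs_in, obs_out, runtime in rows:
    kept = parse_count(obs_out) / parse_count(obs_in)
    print(f"{provider}: {throughput(obs_in, runtime):.3e} obs/sec read, {kept:.4%} kept")
```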

apchoiCMD commented 11 months ago

Profiling test results for the Python IODA converter (on Orion)

These tests are for informational purposes only, for comparison with the new IODA converter.
Test inputs consist of multiple files based on a 6-hour dump from each provider.
Failed tests are under investigation; the table will be updated when a test passes.

Provider	Num. files in	Runtime	Num. Obs. out	Read	Write	Error
GHRSST	1	27 sec	33,489	✔️	✔️
	36	~519 sec~		✔️		Concatenation Error
RADS	1	10 sec	44,668	✔️	✔️
	10					Failed
AMSR2	1	74 sec	573,660	✔️	✔️
	10	362 sec	5,687,366	✔️	✔️
NASA for SMAP	1					Failed
	8					Failed
ESA for SMOS	1	18 sec	58,996	✔️	✔️
	7	45 sec	372,753	✔️	✔️
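For the providers where the multi-file Python run succeeded, a rough runtime ratio against the new converter can be computed from the two tables in this thread. This is a sketch under a stated assumption: the thread does not confirm that both converters processed identical inputs, and for AMSR2 the output counts differ, so the ratio is only indicative. `runtime_ratio` is an illustrative name:

```python
def runtime_ratio(python_sec, new_sec):
    """How many times longer the Python converter ran than the new one."""
    return python_sec / new_sec


# provider: (Python runtime, Python obs out, new runtime, new obs out),
# copied from the Python-converter table above and the new-converter table
# earlier in the thread. The new AMSR2 count is reported only as 9.04M.
multi_file_runs = {
    "AMSR2": (362.0, 5_687_366, 36.55, 9_040_000),
    "ESA for SMOS": (45.0, 372_753, 6.95, 372_753),
}

for provider, (py_sec, py_obs, new_sec, new_obs) in multi_file_runs.items():
    # Matching output counts make the runtime comparison more trustworthy.
    same_output = py_obs == new_obs
    print(f"{provider}: Python/new runtime ratio "
          f"{runtime_ratio(py_sec, new_sec):.1f}, same obs out: {same_output}")
```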

guillaumevernieres commented 11 months ago

Thanks @apchoiCMD . No need to spend time on the concatenation error. The table above is good enough.

guillaumevernieres commented 11 months ago

@apchoiCMD: The 519 sec for the 36 GHRSST files is the time reported by your failed batch job; it has nothing to do with the conversion. Remove the number, it doesn't mean anything.

apchoiCMD commented 11 months ago

@guillaumevernieres Thanks for letting me know! I will remove it.

NOAA-EMC / GDASApp

Profiling of old/new converters #677