Poor execution time

srherbener commented 6 years ago

Using a sample prepbufr file (prepbufr.gdas.20160304.t06z.nr.48h, 62MB), bufr2nc.py takes 1.5 hours to convert the AIRCFT and AIRCAR data. The file contains only 2268 AIRCFT and AIRCAR subsets in total, so it seems that the conversion should run much, much faster (perhaps under a minute).

srherbener commented 6 years ago

Tests show that the underlying libbufr.a routines spend most of their time extracting data values (corresponding to a list of mnemonics) from the BUFR subsets. It is not yet clear why this step is slow.

srherbener commented 6 years ago

Further tests show that reading event data from the prepBUFR file is slower than reading individual data, which is to be expected, and that the Python/Fortran interface appears to be a bottleneck.

Four test cases were run that visit every subset in a file and read the "TOB" value out of each of those subsets. The test cases exercise the combinations of Fortran vs. Python and individual data vs. event data. The input file used was prepbufr.gdas.20160304.t06z.nr.48h which contains 806983 subsets.

Experiment Name	Top Level Language	Obs Type
FI	Fortran	Individual
FE	Fortran	Event
PI	Python	Individual
PE	Python	Event

"Top Level Language" refers to how the main program was coded. Fortran means that the main program was written in Fortran, compiled, and linked directly to libbufr.a. Python means that the main program was written in Python and used the ncepbufr Python package to access libbufr.a.

Experiment	Runtime (s)
FI	8.5
FE	45
PI	128
PE	2895

The Fortran runs are much faster than the Python runs: about 15x faster for individual data (8.5 s vs. 128 s) and about 64x faster for event data (45 s vs. 2895 s). The penalty for reading event data rather than individual data is also much larger in Python (about 23x) than in Fortran (about 5x). This suggests an inefficiency in the interface between Python and Fortran.
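The ratios behind that comparison can be reproduced directly from the runtime table. This is only arithmetic on the four reported numbers; the variable names are made up for illustration.

```python
# Runtimes in seconds, copied from the experiment table in this thread.
runtimes = {"FI": 8.5, "FE": 45.0, "PI": 128.0, "PE": 2895.0}

# Cost of reading event data relative to individual data, per language.
event_penalty_fortran = runtimes["FE"] / runtimes["FI"]
event_penalty_python = runtimes["PE"] / runtimes["PI"]

# Cost of the Python top level relative to Fortran, per observation type.
python_penalty_individual = runtimes["PI"] / runtimes["FI"]
python_penalty_event = runtimes["PE"] / runtimes["FE"]

print(f"event vs. individual, Fortran: {event_penalty_fortran:.1f}x")
print(f"event vs. individual, Python:  {event_penalty_python:.1f}x")
print(f"Python vs. Fortran, individual: {python_penalty_individual:.1f}x")
print(f"Python vs. Fortran, event:      {python_penalty_event:.1f}x")
```

If the interface added only a fixed per-run overhead, the two event-vs-individual ratios would be similar; the gap between them is what points at a per-call cost.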

srherbener commented 6 years ago

We have decided to address the performance issue through two modifications to bufr2nc.py:

Instead of calling read_subset() once per mnemonic, call read_subset() with lists of mnemonics, in order to minimize the total number of times read_subset() is called.
Add an option (-m) that specifies the maximum number of messages that will be processed. This will allow us to get enough observations to test downstream flows without having to process entire BUFR files.
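A rough sketch of the two modifications, for illustration only. It does not use the real ncepbufr package: CountingReader is a hypothetical stand-in whose read_subset() takes a space-separated mnemonic string (an assumption about the interface, not a confirmed signature), and convert(), the max_msgs name, and the POB/QOB mnemonics are likewise illustrative. The point is just that batching turns N calls per subset into one, and that a message cap stops the loop early.

```python
import argparse


class CountingReader:
    """Hypothetical stand-in for a BUFR subset reader (not the ncepbufr API)."""

    def __init__(self):
        self.calls = 0

    def read_subset(self, mnemonics):
        # Each call models one crossing of the Python/Fortran boundary,
        # regardless of how many mnemonics are requested.
        self.calls += 1
        return {name: 0.0 for name in mnemonics.split()}


def read_per_mnemonic(reader, mnemonics):
    """Old pattern: one read_subset() call for every mnemonic."""
    values = {}
    for name in mnemonics:
        values.update(reader.read_subset(name))
    return values


def read_batched(reader, mnemonics):
    """New pattern: a single read_subset() call for the whole list."""
    return reader.read_subset(" ".join(mnemonics))


def convert(messages, mnemonics, max_msgs=None):
    """Visit subsets message by message, stopping after max_msgs messages.

    `messages` is a list of messages, each a list of placeholder subsets.
    Returns (subsets visited, read_subset() calls made).
    """
    reader = CountingReader()
    n_subsets = 0
    for index, message in enumerate(messages):
        if max_msgs is not None and index >= max_msgs:
            break
        for _subset in message:
            read_batched(reader, mnemonics)
            n_subsets += 1
    return n_subsets, reader.calls


def parse_args(argv=None):
    """Parse a -m option; omitting it means no limit on messages."""
    parser = argparse.ArgumentParser(description="message-cap sketch")
    parser.add_argument("-m", dest="max_msgs", type=int, default=None,
                        help="maximum number of messages to process")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["-m", "2"])
    fake_messages = [[None, None], [None, None, None], [None]]
    print(convert(fake_messages, ["TOB", "POB", "QOB"], args.max_msgs))
```

With three mnemonics, the per-mnemonic pattern makes three calls per subset and the batched pattern makes one, so the saving grows with the length of the mnemonic list.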

These changes have been implemented and are now in the master and develop branches.

JCSDA-internal / ioda-converters

Poor execution time #1