ARPA-SIMC / dballe

Fast on-disk database for meteorological observed and forecast data.

v7d_transform not able to cumulate precipitation for large bufr files with dballe-7.32-1.fc24.x86_64 #116

Closed amontani closed 6 years ago

amontani commented 6 years ago

I have to cumulate 6-hourly precipitation for a large number of stations (about 1500) for 3-month verification of COSMO-LEPS.

The command is: v7d_transform --disable-qc '--input-format=BUFR' '--variable-list=B13011' '--output-format=BUFR' --comp-stat-proc 1:1 '--comp-step=00 06' 2018_03040506_tponly_fulldom.bufr tp06h.bufr

On radicchio (dballe-7.32-1.fc24.x86_64): I started this command this morning (10 am, 13/6/2018); it is now 3 pm and, after more than 393 minutes, the command is still running and has produced no output.

On sonia (after a downgrade, dballe-7.28-1.fc24.x86_64): It takes some time, but I get an output:

  real 47m50.710s
  user 9m48.807s
  sys 0m24.057s

Please note that radicchio has more memory than sonia. I did not change libsim, only dballe. The large input file is at /autofs/scratch-mod/amontani/2018_03040506_tponly_fulldom.bufr

brancomat commented 6 years ago

I just downgraded dballe to 7.28 on radicchio. This should allow @amontani to work (he will confirm), and lets us downgrade the bug to devel blocking.

As additional information: I did not recompile the libsim package after upgrading dballe to 7.32, but to my knowledge this should not have had an impact.

dcesari commented 6 years ago

FYI, when it takes ages to run, on-the-fly profiling with perf top typically shows:

  55,05%  libstdc++.so.6.0.22         [.] std::_Rb_tree_increment
  38,58%  libdballe.so.7.0.3          [.] dballe::db::v7::batch::MeasuredData::write_pending
   2,32%  libdballe.so.7.0.3          [.] dballe::db::v7::batch::Station::write_pending
   0,89%  libdballe.so.7.0.3          [.] _init
   0,60%  [kernel]                    [k] __lock_text_start
   0,49%  libstdc++.so.6.0.22         [.] std::_Rb_tree_increment
   0,09%  libdballe.so.7.0.3          [.] dballe::db::v7::batch::StationData::write_pending
   0,07%  [kernel]                    [k] finish_task_switch

That is, most of the time is spent in std::_Rb_tree_increment. Hope this helps.
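For context, std::_Rb_tree_increment is the libstdc++ routine that advances an iterator over a red-black tree container such as std::map or std::set, so the profile suggests that write_pending spends its time walking such a container. One pattern that would produce this profile is rescanning the whole container on every flush, which makes the total work grow quadratically with the number of cached entries. The sketch below only simulates that pattern by counting iteration steps. It is not dballe's code: the function names, the `pending` flag and the separate pending set are all hypothetical, and the thread does not say what the actual cause or fix was.

```python
def flush_by_full_scan(cache):
    """Walk every cached entry in key order and clear its pending flag.

    Returns the number of iteration steps, which stands in for the
    iterator advances that show up as std::_Rb_tree_increment.
    """
    steps = 0
    for key in sorted(cache):  # ordered walk, like iterating a std::map
        steps += 1
        if cache[key]["pending"]:
            cache[key]["pending"] = False
    return steps


def flush_by_pending_set(cache, pending_keys):
    """Visit only the entries known to be pending, then forget them."""
    steps = 0
    for key in sorted(pending_keys):
        steps += 1
        cache[key]["pending"] = False
    pending_keys.clear()
    return steps


def simulate(n_values, use_pending_set):
    """Insert n_values entries, flushing after each insert; return total steps."""
    cache = {}
    pending_keys = set()
    total_steps = 0
    for i in range(n_values):
        cache[i] = {"pending": True}
        pending_keys.add(i)
        if use_pending_set:
            total_steps += flush_by_pending_set(cache, pending_keys)
        else:
            total_steps += flush_by_full_scan(cache)
    return total_steps


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, simulate(n, use_pending_set=False), simulate(n, use_pending_set=True))
```

With a full scan per flush the step count is n(n+1)/2, while tracking pending entries separately keeps it at n. If something like this is happening, it would explain why the slowdown only becomes visible with large inputs.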

amontani commented 6 years ago

Thanks to the kind downgrade to dballe-7.28-1.fc24.x86_64 on radicchio, the v7d_transform command now runs and takes:

  real 7m50,00s
  user 7m37,96s
  sys 0m8,38s
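To compare the two dballe 7.28 runs, it can help to look at how much of the wall-clock time was actually CPU time. The sketch below parses `time` output in the form quoted in this thread (with either a decimal point or a decimal comma) and computes (user + sys) / real. The helper names are made up for illustration, and reading a low ratio as waiting on I/O or memory is an assumption, not something measured here.

```python
import re

# Matches entries such as "real 47m50.710s" or "user 7m37,96s".
_TIME_RE = re.compile(r"(real|user|sys)\s+(\d+)m(\d+(?:[.,]\d+)?)s")


def parse_time_output(text):
    """Return a dict mapping 'real', 'user' and 'sys' to seconds."""
    result = {}
    for label, minutes, seconds in _TIME_RE.findall(text):
        result[label] = int(minutes) * 60 + float(seconds.replace(",", "."))
    return result


def cpu_fraction(times):
    """Fraction of wall-clock time spent on the CPU (user + sys over real)."""
    return (times["user"] + times["sys"]) / times["real"]


if __name__ == "__main__":
    runs = {
        "sonia, dballe 7.28": "real 47m50.710s user 9m48.807s sys 0m24.057s",
        "radicchio, dballe 7.28": "real 7m50,00s user 7m37,96s sys 0m8,38s",
    }
    for name, text in runs.items():
        times = parse_time_output(text)
        print(f"{name}: real {times['real']:.1f}s, CPU fraction {cpu_fraction(times):.0%}")
```

On these figures the sonia run spent roughly a fifth of its wall-clock time on the CPU, while the radicchio run was almost entirely CPU-bound. That would be consistent with sonia having less memory, but the machines differ, so the numbers are only indicative.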

spanezz commented 6 years ago

I pushed 9694b1e6e366e7f31f0482c4b4d204d18e3c981f with an attempt at optimizing this. I'll be there in a couple of hours to do some testing with you; in the meantime, if you have an opportunity, please try it out.

brancomat commented 6 years ago

dballe v7.33-2 has been built and should address this issue; waiting for feedback.

amontani commented 6 years ago

I have just tried two commands on ventiquattro, under /autofs/scratch-mod/amontani, using dballe-7.33-2.fc24.x86_64.

first command:

  time v7d_transform --disable-qc --input-format='BUFR' --variable-list=B13011 \
    --output-format='BUFR' \
    --comp-stat-proc 1:1 --comp-step='00 06' 2018_03040506_tponly_fulldom.bufr tmp.bufr

  real 5m21,60s
  user 5m5,50s
  sys 0m4,76s

second command:

  time v7d_transform --disable-qc --input-format='BUFR' \
    --variable-list=B13011 --output-format='BUFR' \
    --comp-stat-proc 1:1 --comp-step='00 12' \
    --comp-start='2018-03-01 06:00' tmp.bufr tp12h.bufr

  real 1m23,26s
  user 1m17,98s
  sys 0m0,60s

For both commands, I am happy with the performance and the results; my feedback is positive.
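As a rough illustration of what the second command asks for, the sketch below sums 6-hourly precipitation totals into 12-hour windows aligned on a start time. It is not libsim's algorithm. The `cumulate` function is hypothetical, and the sketch assumes that each timestamp marks the end of its accumulation period and that a window is emitted only when all of its sub-periods are present. How v7d_transform handles gaps and time conventions may well differ.

```python
from datetime import datetime, timedelta


def cumulate(obs, start, step, source_step):
    """Sum shorter accumulations into longer windows aligned on `start`.

    obs maps the END time of each source period to the amount accumulated
    over `source_step`. Returns a dict mapping each window's end time to
    its total; windows with a missing sub-period are skipped.
    """
    if not obs:
        return {}
    per_window = step // source_step  # e.g. 12 h // 6 h == 2
    totals = {}
    last = max(obs)
    window_end = start + step
    while window_end <= last:
        ends = [window_end - i * source_step for i in range(per_window)]
        if all(end in obs for end in ends):
            totals[window_end] = sum(obs[end] for end in ends)
        window_end += step
    return totals


if __name__ == "__main__":
    six_hours = timedelta(hours=6)
    # Made-up amounts for one station, in kg/m2; the 2018-03-02 06:00 value is missing.
    obs = {
        datetime(2018, 3, 1, 12): 1.0,
        datetime(2018, 3, 1, 18): 2.0,
        datetime(2018, 3, 2, 0): 0.5,
        datetime(2018, 3, 2, 12): 3.0,
    }
    result = cumulate(obs, datetime(2018, 3, 1, 6), timedelta(hours=12), six_hours)
    for end, total in sorted(result.items()):
        print(end, total)
```

In this made-up data only the window ending 2018-03-01 18:00 is complete, so it is the only one reported.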