ARPA-SIMC / dballe

Fast on-disk database for meteorological observed and forecast data.

performance issue for dballe-7.30-2.fc24.x86_64 #112

Closed amontani closed 6 years ago

amontani commented 6 years ago

I have to load a BUFR file using "dbadb import -f --fast" with PostgreSQL.

With dballe-7.28-1.fc24.x86_64 I get: real 4m7.437s, user 0m21.115s, sys 0m5.891s

With dballe-7.30-2.fc24.x86_64 it takes more than three times as long: real 14m10.717s, user 0m17.833s, sys 0m6.629s
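
As a rough check of the slowdown, the two wall-clock times quoted above can be converted to seconds and divided. This is only an illustrative sketch: the helper name `parse_time` is hypothetical and not part of dballe, and it assumes the `XmY.YYYs` format printed by the shell's `time` keyword.

```python
import re


def parse_time(text):
    """Convert a value such as '4m7.437s' (shell `time` format) to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Wall-clock ("real") times reported for the two packages in this comment.
old_real = parse_time("4m7.437s")    # dballe-7.28-1
new_real = parse_time("14m10.717s")  # dballe-7.30-2

slowdown = new_real / old_real
print(f"7.30-2 takes {slowdown:.2f}x the wall-clock time of 7.28-1")
```

User and system CPU times are nearly the same in both runs, so the extra wall-clock time is presumably spent waiting on the database rather than computing locally.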

brancomat commented 6 years ago

@spanezz : the performance issue can be tested with: time dbadb import -f --fast --dsn=postgresql://test:test@radicchio/test_build /autofs/scratch-mod/amontani/20180301.bufr

At the moment, ventiquattro has the latest dballe version and pigna has 7.28 (I can downgrade ventiquattro on demand if needed).

spanezz commented 6 years ago

Can you try to build the branch issue112 and see if it improves things?

spanezz commented 6 years ago

Make sure you build from f096db6d9d9960542e36208324d3824bf68bcf46 or one of its descendants.

spanezz commented 6 years ago

Also, can you please check that everything got imported? The optimized version makes far fewer queries than before, although all tests currently pass.

brancomat commented 6 years ago

> Can you try to build the branch issue112 and see if it improves things?

Trying on ventiquattro, it stops here:

g++ -DHAVE_CONFIG_H -I. -I..  -DTABLE_DIR=\"/usr/share/wreport\" -I.. -I..    -I/usr/include/mysql      -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic -c -o tests-main.o tests-main.cc
tests-main.cc:2:38: fatal error: wreport/utils/testrunner.h: No such file or directory
 #include <wreport/utils/testrunner.h>

spanezz commented 6 years ago

Can you install a newer wreport on ventiquattro?

brancomat commented 6 years ago

Repackaged both wreport 3.11 and dballe.

Results seem fine to me (a little slower than dballe-7.28, but this could be due to a temporarily high load on the PostgreSQL server radicchio):

[dbranchini@ventiquattro ~]$ time dbadb import -f --fast --dsn=postgresql://test:test@radicchio/test_build /autofs/scratch-mod/amontani/20180301.bufr

real    4m53.371s
user    2m10.612s
sys     0m3.107s

@amontani - can you confirm that dballe 7.31-2 on ventiquattro is OK for you?

spanezz commented 6 years ago

In the meantime, I performed some finishing touches and merged to master

amontani commented 6 years ago

> @amontani - can you confirm that dballe 7.31-2 on ventiquattro is OK for you?

Dear all, I have just run the tests on 3 different machines with the following command:
"time dbadb import -f --fast --dsn=postgresql://.... /autofs/scratch-mod/amontani/20180301.bufr"

pigna (dballe-7.28-1.fc24.x86_64): real 4m11.684s
sonia (dballe-7.30-2.fc24.x86_64): real 14m46.667s
ventiquattro (dballe-7.32-1.fc24.x86_64): real 4m20.990s

The oldest version (on pigna) still remains the fastest. Performance of the newest one (on ventiquattro) is also satisfactory. Just to give you some chronological info about the full issue:

1) You had me run some tests with a pre-release of dballe-7.30; the tests were very satisfactory.
2) BUT what was released as dballe-7.30-2.fc24.x86_64 (the very slow version) was NOT what I was given to test.
3) Now, with dballe-7.32-1, the issue can be closed as far as I am concerned and we can move forward.

spanezz commented 6 years ago

For the record, I ran these commands with profiling enabled:

[ezini@pigna ~]$ DBA_PROFILE=1 time dbadb --wipe-first import -f --fast --dsn=postgresql://test:test@radicchio/test_build /autofs/scratch-mod/amontani/20180301.bufr >& old.profile
[ezini@pigna ~]$ cat old.profile
postgresql: 155303 queries
21.53user 6.27system 4:21.40elapsed 10%CPU (0avgtext+0avgdata 110448maxresident)k
36400inputs+8outputs (0major+27855minor)pagefaults 0swaps
----------
[ezini@ventiquattro ~]$ DBA_PROFILE=1 time dbadb --wipe-first import -f --fast --dsn=postgresql://test:test@radicchio/test_build /autofs/scratch-mod/amontani/20180301.bufr >& new.profile
[ezini@ventiquattro ~]$ cat new.profile
postgresql: 73784 queries
127.23user 2.93system 4:30.24elapsed 48%CPU (0avgtext+0avgdata 72616maxresident)k
35576inputs+8outputs (0major+16417minor)pagefaults 0swaps

The new version makes about half as many queries on the database, but does much more computation locally (127.23 s of user CPU time against 21.53 s): it might be possible to profile and optimize some of that.
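
To put numbers on that comparison, the two profile outputs above can be parsed and their ratios computed. This is a sketch only: `parse_profile` is a hypothetical helper, and the regular expressions assume exactly the layout shown in old.profile and new.profile (the `postgresql: N queries` line followed by the default GNU `time` summary).

```python
import re

OLD_PROFILE = """postgresql: 155303 queries
21.53user 6.27system 4:21.40elapsed 10%CPU (0avgtext+0avgdata 110448maxresident)k
"""

NEW_PROFILE = """postgresql: 73784 queries
127.23user 2.93system 4:30.24elapsed 48%CPU (0avgtext+0avgdata 72616maxresident)k
"""


def parse_profile(text):
    """Extract query count and timings (in seconds) from one profile output."""
    queries = int(re.search(r"postgresql: (\d+) queries", text).group(1))
    timing = re.search(r"([\d.]+)user ([\d.]+)system (\d+):([\d.]+)elapsed", text)
    return {
        "queries": queries,
        "user": float(timing.group(1)),
        "system": float(timing.group(2)),
        # The elapsed field is minutes:seconds in both runs shown here.
        "elapsed": int(timing.group(3)) * 60 + float(timing.group(4)),
    }


old = parse_profile(OLD_PROFILE)
new = parse_profile(NEW_PROFILE)

query_ratio = new["queries"] / old["queries"]
user_ratio = new["user"] / old["user"]
elapsed_ratio = new["elapsed"] / old["elapsed"]

print(f"queries: {query_ratio:.2f}x, user CPU: {user_ratio:.2f}x, "
      f"elapsed: {elapsed_ratio:.2f}x")
```

With these figures the query count drops to roughly 0.48x while user CPU time rises to roughly 5.9x, and elapsed time stays within a few percent.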

spanezz commented 6 years ago

Alternatively, there might be a problem with the way queries are counted.