brightway-lca / brightway2-data

Tools for the management of inventory databases and impact assessment methods. Part of the Brightway LCA framework.
https://docs.brightway.dev/
BSD 3-Clause "New" or "Revised" License
8 stars 21 forks source link

Massive sqlite query performance decrease in new projects #60

Closed aleksandra-kim closed 5 years ago

aleksandra-kim commented 5 years ago

Original report by Adrian Haas (Bitbucket: haasad, GitHub: haasad).


Issue originally raised on ActivityBrowser github: Issue 189

Old project with cutoff 3.3 (database created in mid February 2018):

%timeit bw.Database('cutoff33').search('water')
42.3 ms ± 174 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit bw.Database('cutoff33').get(random_code)
404 µs ± 3.99 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

A current project created today:

 %timeit bw.Database('cutoff33').search('water')
476 ms ± 1.66 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit bw.Database('cutoff33').get(random_code)
17.9 ms ± 76 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

All performance tests were run in the same conda enviornment:

conda list | grep -E 'br|bw|pee' 
brightway2                2.3                        py_0    cmutel
bw2analyzer               0.9.4                      py_0    cmutel
bw2calc                   1.7.2                      py_0    cmutel
bw2data                   3.4.2                      py_0    cmutel
bw2io                     0.7                        py_0    cmutel
bw2parameters             0.6.5                      py_0    cmutel
peewee                    3.6.4            py36h65ede16_0    conda-forge

There must have been some change in the way the databases are stored/created/indexed that causes this performance decrease, but could also be caused by changes upstream of brightway, eg. peewee. I don't understand this part of the brightway codebase well enough to "diagnose" this.

One major difference that I noticed is this:

...


It appears that all the additional time is caused by calls to ` {method 'execute' of 'sqlite3.Cursor' objects}`, hope this information is useful for you, because I don't understand what it means :smile:
aleksandra-kim commented 5 years ago

Original comment by Adrian Haas (Bitbucket: haasad, GitHub: haasad).


I just tested setting up a new project with downgraded versions of several packages and this indeed makes the queries fast again:

# packages in environment at /home/adrian/miniconda3/envs/bwspeed:
brightway2                2.1.2                      py_0    cmutel
bw2analyzer               0.9.4                      py_0    cmutel
bw2calc                   1.7.2                      py_0    cmutel
bw2data                   3.1.1                      py_0    cmutel
bw2io                     0.6.RC5                    py_1    cmutel
bw2parameters             0.6.5                      py_0    cmutel
peewee                    2.10.2                   py36_0    conda-forge

Only the versions at the time of writing the database are crucial, search speed is the same for old and new versions.

aleksandra-kim commented 5 years ago

Original comment by Chris Mutel (Bitbucket: cmutel, GitHub: cmutel).


I don't think this is related to old or new projects, but rather to the SQLite indices being present or not. Can you use something like http://sqlitebrowser.org/ to open the database in the project that you are having trouble with (projects.dir > lci > databases.db), and check whether the database has indices? When I can reproduce the difference on my machine, the slow project doesn't have any indices on the SQLite database.

aleksandra-kim commented 5 years ago

Original comment by Adrian Haas (Bitbucket: haasad, GitHub: haasad).


Yes, can confirm, indices are missing in all of my newer projects and present in my old fast project and the one created with the downgraded packages. It appears that the indices aren't created anymore since (I suspect) the upgrade of peewee to version 3 and the associated syntax changes in bw.

aleksandra-kim commented 5 years ago

Original comment by Chris Mutel (Bitbucket: cmutel, GitHub: cmutel).


I think the answer is actually simpler - probably I commented out the indexing code when importing many versions of ecoinvent and accidentally committed it, or something like that. So instead of weird interaction effects, just me being dumb.

aleksandra-kim commented 5 years ago

Original comment by Chris Mutel (Bitbucket: cmutel, GitHub: cmutel).


Fix #60. Switch sqlite3_lci_db object when switching projects

aleksandra-kim commented 5 years ago

Original comment by Chris Mutel (Bitbucket: cmutel, GitHub: cmutel).


Should be fixed in 3.4.3 release. You can add indices instead of re-importing a database (in 3.4.3) with Database(name)._add_indices().

aleksandra-kim commented 5 years ago

Original comment by Adrian Haas (Bitbucket: haasad, GitHub: haasad).


Thanks for fixing this! I added a function that makes sure the indices are present in the AB. Could also be useful to have this directly in brightway: https://github.com/LCA-ActivityBrowser/activity-browser/blob/dca8b0b4763a4d5a1d5b10e90d4e7f1d54789e8b/activity_browser/app/controller.py#L111-L118

On the other hand, it took a few months until somebody noticed the difference, so it might not be super relevant.