brightway-lca / brightway2-data

Tools for the management of inventory databases and impact assessment methods. Part of the Brightway LCA framework.
https://docs.brightway.dev/
BSD 3-Clause "New" or "Revised" License

Implement cached method Database.to_dataframe #91

Closed: BenPortner closed this pull request 2 years ago

BenPortner commented 2 years ago

Implements a cached helper method to_dataframe() for SQLiteBackend (and hence, through inheritance, for the IOTable backend as well). It uses the standard-library functools.lru_cache with maxsize=5. It would be neat to have this for https://github.com/brightway-lca/brightway2-analyzer/pull/16.

Requesting review by @cmutel.

Speed improvement

On my machine, the improvement is approx. 8-fold (3.07 s on the first call vs. 0.39 s on the second), using get_labeled_inventory from the corresponding bw2analyzer branch:

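```python
from time import time

from bw2calc import LCA
# get_labeled_inventory comes from the bw2analyzer branch in
# brightway2-analyzer PR #16; import it from wherever that branch defines it.

method = ('EF v3.0', 'climate change', 'global warming potential (GWP100)')
act = ("apos371", "2008044abc9469af9dee29707db7f8fb")  # market for waste packaging paper
lca = LCA({act: 1}, method)
lca.lci()

# First call: cold cache.
t_start = time()
df = get_labeled_inventory(lca, wide_format=True)
print(f"Elapsed time: {time() - t_start}")  # Elapsed time: 3.068533182144165

# Second call: warm cache.
t_start = time()
df = get_labeled_inventory(lca, wide_format=True)
print(f"Elapsed time: {time() - t_start}")  # Elapsed time: 0.39063262939453125
```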
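The same cold-versus-warm comparison can be reproduced without Brightway. In this sketch, `slow_table` is a hypothetical stand-in whose `time.sleep` mimics the expensive conversion. It uses `time.perf_counter`, the standard clock for measuring short intervals, and `cache_info()` to confirm that the second call was served from the cache.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=5)
def slow_table(name: str) -> tuple:
    """Hypothetical expensive conversion; the sleep stands in for real work."""
    time.sleep(0.2)
    return (name, "rows")


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


_, cold = timed(slow_table, "apos371")  # cache miss: pays for the sleep
_, warm = timed(slow_table, "apos371")  # cache hit: returns the stored tuple
print(f"cold={cold:.3f}s warm={warm:.6f}s")
print(slow_table.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=5, currsize=1)
```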

Memory usage

Once converted, ecoinvent 3.7.1 APOS takes approximately 60 MB of RAM. If a user called Database(name).to_dataframe() on five different ecoinvent versions, the total cache size would be 5 × 60 MB = 300 MB. A call on a sixth database would evict the least recently used entry (the first one, if each database was used only once), so memory usage would stay capped at roughly 300 MB.
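The eviction behaviour can be checked with `cache_info()`. In this sketch, `load_db` is a hypothetical loader and the six names are placeholders for database versions. With `maxsize=5`, a sixth distinct name evicts the least recently used entry, so the number of cached entries never exceeds five. Note that reading an entry refreshes it, so "least recently used" is not always "first inserted".

```python
from functools import lru_cache

LOADS = []  # records every real (uncached) load, in order


@lru_cache(maxsize=5)
def load_db(name: str) -> str:
    """Hypothetical loader standing in for the ~60 MB DataFrame conversion."""
    LOADS.append(name)
    return f"table for {name}"


for name in ["v1", "v2", "v3", "v4", "v5"]:
    load_db(name)
load_db("v6")  # sixth distinct database: "v1" is least recently used and is evicted
print(load_db.cache_info().currsize)  # 5

load_db("v2")  # still cached, so no reload; "v2" becomes most recently used
load_db("v1")  # evicted earlier, so it is loaded again (this evicts "v3")
print(LOADS)  # ['v1', 'v2', 'v3', 'v4', 'v5', 'v6', 'v1']
```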

cmutel commented 2 years ago

I would prefer to drop this size to 2 or 3: if you have 5 different giant databases, then do all operations on them one at a time. This cache will live for as long as the process runs, so we should be conservative.
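Because a module-level `lru_cache` lives as long as the interpreter, a smaller bound and an explicit way to release memory both help. This sketch assumes `maxsize=3`, the upper end of the suggested range, and uses the standard `cache_clear()` method. `convert` is a hypothetical stand-in, and exposing `cache_clear()` to users is an assumption, not something proposed in this thread.

```python
from functools import lru_cache


@lru_cache(maxsize=3)  # conservative bound: at most three large tables in memory
def convert(name: str) -> list:
    """Hypothetical conversion of one database into a large table."""
    return [name] * 1_000


for name in ["db1", "db2", "db3", "db4"]:
    convert(name)  # process the databases one after another
print(convert.cache_info().currsize)  # 3: only the three most recent are held

convert.cache_clear()  # drop every cached table without restarting the process
print(convert.cache_info().currsize)  # 0
```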

Note that LRU caching will break some tests. We use the pytest plugin antilru, but this still produced problems on Windows (I think). We used to cache some database lookups but stopped because I couldn't keep the caching from breaking tests.
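One way caching breaks tests is by returning stale results after a test changes the underlying data. This sketch uses a hypothetical dictionary `STORE` in place of a project database and shows that a cached read keeps returning old data until the cache is cleared, which is the kind of reset a per-test fixture would have to perform.

```python
from functools import lru_cache

STORE = {"db": ["a", "b"]}  # hypothetical stand-in for a project database


@lru_cache(maxsize=2)
def rows(name: str) -> tuple:
    """Cached read; it does not see later changes to STORE."""
    return tuple(STORE[name])


print(rows("db"))  # ('a', 'b'): a first "test" reads the data
STORE["db"] = ["c"]  # a second "test" writes new data
print(rows("db"))  # ('a', 'b'): stale, still served from the cache

rows.cache_clear()  # the reset a per-test fixture would have to perform
print(rows("db"))  # ('c',)
```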

cmutel commented 2 years ago

This is a fantastic effort, but I am not going to merge it. Instead, I will take the magic of pd.DataFrame(self) (I still can't believe that this actually works) and add some options, like limiting fields and sorting. We also want two methods, one for activities and the other for exchanges (see https://github.com/brightway-lca/brightway2-data/issues/106). We also want to be compatible with, or at least in line with, the work to convert premise to use DataFrames.
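The direction described here, with options to limit fields and sort and with separate methods for activities and exchanges, can be sketched in plain Python. Everything below is hypothetical: the in-memory `ACTIVITIES` list, the `activity_records`/`exchange_records` names and their `fields`/`sort_by` parameters are illustrations, not the API that brightway2-data adopted. Each function returns a list of dicts, a shape that `pandas.DataFrame` turns into one row per record.

```python
from typing import Iterable, Optional, Sequence

# Hypothetical in-memory stand-in for a database: activities with nested exchanges.
ACTIVITIES = [
    {"code": "b", "name": "paper", "unit": "kg",
     "exchanges": [{"input": "a", "amount": 1.2, "type": "technosphere"}]},
    {"code": "a", "name": "pulp", "unit": "kg", "exchanges": []},
]


def _select(rows: Iterable[dict], fields: Optional[Sequence[str]],
            sort_by: Optional[str]) -> list:
    """Optionally sort by one key, then keep only the requested fields."""
    rows = list(rows)
    if sort_by is not None:
        rows.sort(key=lambda r: r[sort_by])  # sorting first allows unselected keys
    if fields is None:
        return [dict(r) for r in rows]
    return [{f: r.get(f) for f in fields} for r in rows]


def activity_records(fields: Optional[Sequence[str]] = None,
                     sort_by: Optional[str] = None) -> list:
    """One record per activity; nested exchanges are left out."""
    rows = ({k: v for k, v in act.items() if k != "exchanges"} for act in ACTIVITIES)
    return _select(rows, fields, sort_by)


def exchange_records(fields: Optional[Sequence[str]] = None,
                     sort_by: Optional[str] = None) -> list:
    """One record per exchange, tagged with the code of the activity it belongs to."""
    rows = ({"output": act["code"], **exc}
            for act in ACTIVITIES for exc in act["exchanges"])
    return _select(rows, fields, sort_by)


print(activity_records(fields=["code", "name"], sort_by="code"))
# [{'code': 'a', 'name': 'pulp'}, {'code': 'b', 'name': 'paper'}]
print(exchange_records())
# [{'output': 'b', 'input': 'a', 'amount': 1.2, 'type': 'technosphere'}]
```

Keeping activities and exchanges in separate functions gives each table a flat, predictable set of columns instead of nesting exchange lists inside activity rows.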