bulk update of exchanges is slow with peewee

brightway-lca / brightway2-data

Tools for the management of inventory databases and impact assessment methods. Part of the Brightway LCA framework.

https://docs.brightway.dev/

BSD 3-Clause "New" or "Revised" License


bulk update of exchanges is slow with peewee #46

Open aleksandra-kim opened 8 years ago

aleksandra-kim commented 8 years ago

Original report by Tomas Navarrete Gutierrez (Bitbucket: tomas_navarrete).

If I want to update the amounts of all the inputs of an activity (with different values), I have to iterate over the technosphere of the activity, update the amount field of each exchange, and call "save()" on each exchange. With n technosphere exchanges, this requires n transactions in the current implementation.

I would like to group the updates so that there is only one transaction and, hopefully, less I/O to the hard drive.

The idea comes from: http://docs.peewee-orm.com/en/latest/peewee/querying.html#atomic-updates

Something along the lines of:

def bulk_update_exchanges(activity, exchanges):
    """exchanges is a dictionary of oldExchange: newExchange."""
    # atomic() must be called to get the transaction context manager
    with db.atomic():
        for old, new in exchanges.items():
            old.update(data=new.data)
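
To make the difference between n transactions and one transaction concrete, here is a minimal sketch using only Python's standard-library sqlite3 module rather than peewee or bw2data. The `exchange` table, its columns, and all function names are hypothetical stand-ins for illustration; they are not the Brightway schema or API.

```python
import sqlite3


def make_db(n):
    """Create an in-memory table with n toy exchange rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE exchange (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO exchange (id, amount) VALUES (?, ?)",
        [(i, 0.0) for i in range(n)],
    )
    conn.commit()
    return conn


def update_one_by_one(conn, new_amounts):
    """n updates with n commits, similar in spirit to saving each exchange separately."""
    for exc_id, amount in new_amounts.items():
        conn.execute(
            "UPDATE exchange SET amount = ? WHERE id = ?", (amount, exc_id)
        )
        conn.commit()


def update_in_one_transaction(conn, new_amounts):
    """n updates with a single commit; everything is rolled back if one update fails."""
    with conn:  # commits on success, rolls back on an exception
        conn.executemany(
            "UPDATE exchange SET amount = ? WHERE id = ?",
            [(amount, exc_id) for exc_id, amount in new_amounts.items()],
        )
```

Both functions leave the table in the same state; the second one simply defers the commit until all rows have been updated, which is the behaviour requested above.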

aleksandra-kim commented 8 years ago

Original comment by Tomas Navarrete Gutierrez (Bitbucket: tomas_navarrete).

Something like the private method _efficient_write_many_data from the SQLite backend (https://bitbucket.org/cmutel/brightway2-data/src/aa5e4a8377aef097be0e694ead2a149ec04dec84/bw2data/backends/peewee/database.py?at=default&fileviewer=file-view-default#database.py-147).

aleksandra-kim commented 8 years ago

Original comment by Tomas Navarrete Gutierrez (Bitbucket: tomas_navarrete).

So, I found a quick hack for what I wanted to achieve, but I am not sure this is the right way to go, especially since it is not "generic" at all:

from brightway2 import *
from bw2data.backends.peewee import sqlite3_lci_db as db
import random

# ... project, db, activity finding
act = Database('my_db').get('myActivity')

# Wrap all the saves in a single transaction
with db.atomic():
    for e in act.technosphere():
        v = random.random()
        # Update the underlying ExchangeDataset document directly
        e_ds = e._document
        e_ds.data.update(amount=v)
        e.save()

Of course, the raw import of sqlite3_lci_db could be done in a more elegant way, depending on the type of backend.

My main doubt is about the need to retrieve the underlying document (ExchangeDataset) of the object (Exchange).

It seems to work for now, but my request remains. ;)

aleksandra-kim commented 8 years ago

Original comment by Chris Mutel (Bitbucket: cmutel, GitHub: cmutel).

Yes, this is a weakness of the current model of abstraction layer cake. Actually, I think your approach is quite reasonable, though you could do something directly with ActivityDataset objects using normal Peewee methods, e.g. what actually happens when you call .get().

The problem with ActivityDataset and ExchangeDataset is that there actually aren't any foreign keys or other automatic relationships between them. So you will have to manage these yourself, and make sure you don't create mismatches between the tables. As you have seen, you can also gain some speed by dropping down to straight SQL from Python, but this only really makes sense in special circumstances.
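
Since nothing in the database enforces the link between the two tables, one way to catch mismatches after a bulk edit is to look for exchanges that point at activities that do not exist. The sketch below uses the standard-library sqlite3 module and a deliberately simplified two-table schema; the table names, column names, and helper functions are assumptions made for illustration and should not be read as the actual bw2data tables.

```python
import sqlite3


def make_toy_db():
    """Build two unrelated tables with no foreign keys (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE activity (database TEXT, code TEXT);
        CREATE TABLE exchange (
            input_database TEXT, input_code TEXT,
            output_database TEXT, output_code TEXT
        );
        """
    )
    return conn


def dangling_exchanges(conn):
    """Return exchanges whose input or output has no matching activity row."""
    query = """
        SELECT e.rowid, e.input_code, e.output_code
        FROM exchange AS e
        LEFT JOIN activity AS i
            ON i.database = e.input_database AND i.code = e.input_code
        LEFT JOIN activity AS o
            ON o.database = e.output_database AND o.code = e.output_code
        WHERE i.code IS NULL OR o.code IS NULL
        ORDER BY e.rowid
    """
    return conn.execute(query).fetchall()
```

Running a check like `dangling_exchanges(conn)` after writing directly to the tables gives an empty list when every exchange still resolves to an activity on both ends.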