gams / openkongqi

Outdoor air quality data
Apache License 2.0

Optimize `write_records` method for speed and to ensure records are saved to the DB #15

Closed: eddowh closed this issue 8 years ago

eddowh commented 8 years ago

Highly recommended to bump version after merging.

eddowh commented 8 years ago

I was thinking: when we want to write a list of records into the DB, why don't we check the list for duplicates and remove them before committing, like so:

  1. We're given a list of tuples [(datetime, uuid, key, value), ...] corresponding to records.
  2. Remove the duplicates in the list of tuples by matching against datetime, uuid, and key (which together form the composite primary key when we define the Record schema in SQLAlchemy).
  3. Map the list of record tuples to a list of Record objects [Record<ts, uuid, key, value>, ...].
  4. At this point, the list of Record objects can simply be added and committed to the database:

    session.add_all(list_of_Record_objects)
    session.commit()

This way, we don't have to add and commit each record individually. It'd be faster, and I suppose the code would also be more readable.
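
For reference, here is a minimal sketch of the above. The `Record` model definition (column names, types, table name) is an assumption made up for illustration; the project's actual schema may differ. Steps 2–4 are the substance:

    from datetime import datetime

    from sqlalchemy import Column, DateTime, Float, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()


    class Record(Base):
        """Hypothetical model; columns mirror the tuples in step 1."""
        __tablename__ = 'records'
        ts = Column(DateTime, primary_key=True)
        uuid = Column(String(32), primary_key=True)
        key = Column(String(32), primary_key=True)
        value = Column(Float)


    def write_records(session, rows):
        # Step 2: drop in-batch duplicates, keyed on the composite
        # primary key; the last value seen for a triple wins.
        unique = {(ts, uuid, key): value for ts, uuid, key, value in rows}
        # Step 3: map the tuples to Record objects.
        records = [Record(ts=ts, uuid=uuid, key=key, value=value)
                   for (ts, uuid, key), value in unique.items()]
        # Step 4: add and commit in one shot.
        session.add_all(records)
        session.commit()


    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    write_records(session, [(datetime(2016, 1, 1), 'abcd', 'pm25', 12.0)])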

hrbonz commented 8 years ago

You can't know that all the duplicates are within the data about to be pushed; some may collide with rows already in the DB. So let the DB deal with that and handle the situation from there.
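
For what it's worth, one way to let the DB handle duplicates (whether within the batch or against rows already stored) is a dialect-level upsert. A sketch assuming PostgreSQL and SQLAlchemy 1.1+'s `on_conflict_do_nothing()`, reusing the hypothetical `Record` model sketched above; this is not necessarily what the project ended up doing:

    from sqlalchemy.dialects.postgresql import insert


    def write_records(session, rows):
        # Build a multi-row INSERT from the record tuples.
        stmt = insert(Record.__table__).values(
            [{'ts': ts, 'uuid': uuid, 'key': key, 'value': value}
             for ts, uuid, key, value in rows]
        )
        # ON CONFLICT DO NOTHING on the composite primary key makes the
        # DB itself skip duplicates, instead of the application guessing
        # which rows already exist.
        stmt = stmt.on_conflict_do_nothing(
            index_elements=['ts', 'uuid', 'key'])
        session.execute(stmt)
        session.commit()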

eddowh commented 8 years ago

The method implementation is now outdated due to an inconsistency with the input data format. See #30.

Closing.