elminster-aom / homeworks

Entertainment exercise for a basic web monitor
The Unlicense
1 stars 0 forks source link

Uses COPY command rather than parametrized queries for DB insertion #8

Closed elminster-aom closed 3 years ago

elminster-aom commented 3 years ago

Uses COPY command rather than parametrized queries for DB insertion. A link is provided explaining the performance reasons behind this but for this case it's a huge premature optimization without a compelling justification.

elminster-aom commented 3 years ago

I have a dilemma here: From the comment, I understand that in this scenario, the recommended option is inserting the registers one by one, e.g.:

sql_insert_string = f"""INSERT INTO {self.db_table} VALUES (
        %(time)s,
        %(web_url)s,
        %(http_status)s,
        %(resp_time)s,
        %(regex_match)s
    );
"""
self.connect()
with self.db_connect.cursor() as db_cursor:
    db_cursor.execute(sql_insert_string, metric)

However, usually in our case, more than one metric can be consumed from Kafka at once; which it means that code would need to iterate over those metrics and do multiple inserts in a row. Therefore, based on psycopg2.extras – execute_batch:

... Execute sql several times, against all parameters set (sequences or mappings) found in argslist.

The function is semantically similar to cur.executemany(sql, argslist) but has a different implementation: Psycopg will join the statements into fewer multi-statement commands, each one containing at most page_size statements, resulting in a reduced number of server roundtrips. ...

We better considered that psycopg2.extras.execute_batch may be a more efficient method because it will reduce our code and the number of commands against DB.