I have a dilemma here: from the comment, I understand that in this scenario the recommended option is inserting the records one by one, e.g.:
sql_insert_string = f"""INSERT INTO {self.db_table} VALUES (
    %(time)s,
    %(web_url)s,
    %(http_status)s,
    %(resp_time)s,
    %(regex_match)s
);
"""
self.connect()
with self.db_connect.cursor() as db_cursor:
    # `metric` is a mapping whose keys match the named placeholders above
    db_cursor.execute(sql_insert_string, metric)
However, in our case more than one metric can usually be consumed from Kafka at once, which means the code would need to iterate over those metrics and do multiple inserts in a row. Therefore, quoting the psycopg2.extras documentation for execute_batch:
... Execute sql several times, against all parameters set (sequences or mappings) found in argslist.
The function is semantically similar to
cur.executemany(sql, argslist)
but has a different implementation: Psycopg will join the statements into fewer multi-statement commands, each one containing at most page_size statements, resulting in a reduced number of server roundtrips. ...
We therefore considered that psycopg2.extras.execute_batch may be the more efficient option, since it reduces both our code and the number of commands sent to the DB.
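For illustration, here is a minimal sketch of that batched variant as a standalone function. The function name store_metrics, the conn and metrics parameters, and the commit placement are assumptions for the sketch, not the project's actual code; metrics is taken to be a list of dicts keyed by the same named placeholders as above:

import psycopg2.extras

def store_metrics(conn, db_table, metrics):
    """Insert a list of metric dicts with few server round trips."""
    sql_insert_string = f"""INSERT INTO {db_table} VALUES (
        %(time)s,
        %(web_url)s,
        %(http_status)s,
        %(resp_time)s,
        %(regex_match)s
    )"""
    with conn.cursor() as db_cursor:
        # execute_batch joins the INSERTs into multi-statement commands
        # of at most page_size statements each, instead of issuing one
        # execute() per metric as a plain loop would.
        psycopg2.extras.execute_batch(
            db_cursor, sql_insert_string, metrics, page_size=100
        )
    conn.commit()

Compared with executing the insert once per metric, this keeps a single SQL string and one call site while cutting the number of server round trips.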
On using the COPY command rather than parameterized queries for DB insertion: a link was provided explaining the performance reasons behind it, but for this case it is a huge premature optimization without a compelling justification.
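For comparison only, a COPY-based version (the approach being weighed as premature here) might look roughly like the sketch below. It reuses the same illustrative names as the sketch above and assumes no value contains tabs, newlines, or backslashes, which the text COPY format would otherwise require escaping:

import io

def store_metrics_copy(conn, db_table, metrics):
    """Bulk-load metric dicts via COPY FROM STDIN instead of INSERTs."""
    # Serialize the rows into the tab-separated text format COPY expects.
    buffer = io.StringIO()
    for m in metrics:
        fields = (m["time"], m["web_url"], m["http_status"],
                  m["resp_time"], m["regex_match"])
        buffer.write("\t".join(str(f) for f in fields) + "\n")
    buffer.seek(0)
    with conn.cursor() as db_cursor:
        db_cursor.copy_expert(f"COPY {db_table} FROM STDIN", buffer)
    conn.commit()

The extra serialization step and the escaping caveats are part of why COPY is being treated here as an optimization that needs a stronger justification.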