MeltanoLabs / target-postgres

MIT License
11 stars 18 forks source link

Performance Testing #10

Open visch opened 2 years ago

visch commented 2 years ago

Setup a test for how long it takes to load

  1. 10,000 records
  2. 100,000 records
  3. 1,000,00 records

Compare these to:

  1. current target-postgres
  2. loading directly to postgres with a psql import directly

Programmatically would be best here

sebastianswms commented 10 months ago

I have run the specified tests. The first few rows from the dummy database I used are provided below.

last_name|first_name |street_address                |email                      |index|_sdc_file_name                   |_sdc_line_number|_sdc_last_modified     |_sdc_extracted_at      |_sdc_received_at       |_sdc_batched_at        |_sdc_deleted_at|_sdc_sequence|_sdc_table_version|_sdc_sync_started_at|
---------+-----------+------------------------------+---------------------------+-----+---------------------------------+----------------+-----------------------+-----------------------+-----------------------+-----------------------+---------------+-------------+------------------+--------------------+
Butler   |Chelsea    |914 Davis Street              |laura47@example.com        |0    |/git/jsonl2postgres/data/1M.jsonl|               1|2023-11-24 21:44:54.000|2023-11-24 21:46:52.687|2023-11-24 21:46:52.904|2023-11-24 21:46:52.904|               |1700862412905|                  |       1700862412850|
Garrison |Raymond    |66496 Pierce Burgs Apt. 300   |peterwoods@example.net     |1    |/git/jsonl2postgres/data/1M.jsonl|               2|2023-11-24 21:44:54.000|2023-11-24 21:46:52.687|2023-11-24 21:46:52.905|2023-11-24 21:46:52.904|               |1700862412906|                  |       1700862412850|
Daniel   |Nancy      |276 Michael Oval              |williamsmichele@example.net|2    |/git/jsonl2postgres/data/1M.jsonl|               3|2023-11-24 21:44:54.000|2023-11-24 21:46:52.687|2023-11-24 21:46:52.906|2023-11-24 21:46:52.904|               |1700862412906|                  |       1700862412850|
Combs    |Christopher|314 Griffin Loaf              |bharding@example.com       |3    |/git/jsonl2postgres/data/1M.jsonl|               4|2023-11-24 21:44:54.000|2023-11-24 21:46:52.687|2023-11-24 21:46:52.906|2023-11-24 21:46:52.904|               |1700862412907|                  |       1700862412850|
Perry    |Richard    |9258 Dawson Manors Apt. 575   |xmoore@example.com         |4    |/git/jsonl2postgres/data/1M.jsonl|               5|2023-11-24 21:44:54.000|2023-11-24 21:46:52.687|2023-11-24 21:46:52.907|2023-11-24 21:46:52.904|               |1700862412907|                  |       1700862412850|
Brennan  |Erica      |94620 Nicholas Avenue Apt. 600|amy59@example.net          |5    |/git/jsonl2postgres/data/1M.jsonl|               6|2023-11-24 21:44:54.000|2023-11-24 21:46:52.687|2023-11-24 21:46:52.907|2023-11-24 21:46:52.904|               |1700862412907|                  |       1700862412850|
Molina   |Brandon    |9311 Juan Mission Suite 872   |veronica29@example.com     |6    |/git/jsonl2postgres/data/1M.jsonl|               7|2023-11-24 21:44:54.000|2023-11-24 21:46:52.687|2023-11-24 21:46:52.907|2023-11-24 21:46:52.904|               |1700862412908|                  |       1700862412850|
Pratt    |Alyssa     |7118 Michael Hollow Suite 103 |snydertiffany@example.net  |7    |/git/jsonl2postgres/data/1M.jsonl|               8|2023-11-24 21:44:54.000|2023-11-24 21:46:52.687|2023-11-24 21:46:52.907|2023-11-24 21:46:52.904|               |1700862412908|                  |       1700862412850|
Wood     |Robert     |884 Gaines Estates            |suzanne29@example.org      |8    |/git/jsonl2postgres/data/1M.jsonl|               9|2023-11-24 21:44:54.000|2023-11-24 21:46:52.687|2023-11-24 21:46:52.908|2023-11-24 21:46:52.904|               |1700862412908|                  |       1700862412850|
Berry    |Kara       |20916 Paul Lights Apt. 284    |millergina@example.com     |9    |/git/jsonl2postgres/data/1M.jsonl|              10|2023-11-24 21:44:54.000|2023-11-24 21:46:52.688|2023-11-24 21:46:52.908|2023-11-24 21:46:52.904|               |1700862412909|                  |       1700862412850|

pg_dump and psql

time pg_dump -U postgres -t melty.users | psql -U postgres postgres2

10,000

real    0m0.081s
user    0m0.025s
sys     0m0.029s

100,000

real    0m0.294s
user    0m0.064s
sys     0m0.021s

1,000,000

real    0m2.242s
user    0m0.110s
sys     0m0.270s

MeltanoLabs/tap-postgres and MeltanoLabs/target-postgres:

10,000

real    0m6.899s
user    0m6.167s
sys     0m0.392s

100,000

real    0m37.223s
user    0m42.277s
sys     0m1.705s

1,000,000

real    5m43.368s
user    6m57.207s
sys     0m13.646s

transferwise/pipelinewise-tap-postgres and transferwise/pipelinewise-target-postgres:

10,000

real    0m3.147s
user    0m1.524s
sys     0m0.099s

100,000

real    0m12.502s
user    0m15.688s
sys     0m1.751s

1,000,000

real    1m38.532s
user    2m24.671s
sys     0m16.911s
visch commented 10 months ago

I like what you did here, I think we should make some kind of script that we can run to test this. Ideally we'd probably make a whole performance infrastructure to test over time and track but that seems a bit overkill.

Could you make a script python or something, drop it in the ./scripts dir here maybe performance_testing.py or something that we can run that does all of these tests for us and outputs the results in a similar format.

I want to see what happens with the new performance update you made, and then it'd be nice to be able to "objectively" have folks run the test to see if we improved things or not.