cockroachdb / movr

A fictional ride sharing company.
Apache License 2.0

Convert movr.add_* bulk methods to use txn retries #51

Closed rmloveland closed 5 years ago

rmloveland commented 5 years ago

Fixes #42.

Summary of changes:

Converted movr.add_{users,vehicles,rides} to use txn retries. This required adding some internal helper functions and changing the way some loops were written (flipping the logic "inside out" so that we iterate over chunks and call run_transaction on the internal helper functions on a per-chunk basis).
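For readers who want the shape of that "inside out" change, here is a minimal sketch of the pattern, not the actual diff. Everything in it is an assumption made for illustration: `run_transaction` here is a simplified stand-in written for this example (not the helper the repo calls), `RetryableError` stands in for a serialization failure (SQLSTATE 40001), and `chunks`, `add_users_helper`, and the in-memory `table` list are hypothetical names.

```python
import itertools
import time


class RetryableError(Exception):
    """Hypothetical stand-in for a retryable serialization failure (SQLSTATE 40001)."""


def run_transaction(callback, max_retries=5):
    """Simplified stand-in: run callback, retrying when it raises RetryableError."""
    for attempt in range(max_retries):
        try:
            return callback()
        except RetryableError:
            # Short exponential backoff before retrying the whole callback.
            time.sleep(0.001 * 2 ** attempt)
    raise RuntimeError("transaction did not succeed after %d attempts" % max_retries)


def chunks(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk


def add_users_helper(table, chunk):
    """Hypothetical internal helper: write one chunk as a single unit of work.

    With a real database this body would run inside one transaction, so a
    failed attempt is rolled back before the retry. The list used here has
    no rollback; it only illustrates the control flow.
    """
    table.extend(chunk)


def add_users(table, users, chunk_size=100):
    """Iterate over chunks and retry each chunk independently."""
    for chunk in chunks(users, chunk_size):
        # Bind the current chunk as a default argument so each callback
        # keeps its own chunk.
        run_transaction(lambda c=chunk: add_users_helper(table, c))


if __name__ == "__main__":
    table = []
    add_users(table, range(250), chunk_size=100)
    print(len(table), "rows written in chunks of at most 100")
```

The point of retrying per chunk rather than around the whole bulk load is that a retry only redoes one chunk's worth of work.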

It appears to work, e.g.,

$ time python3 loadmovr.py load --num-users=5000 --num-vehicles=1000 --num-rides=50000
[INFO] (MainThread) Connected to movr database @ postgres://root@localhost:26257/movr?sslmode=disable
[INFO] (MainThread) initializing tables
[INFO] (MainThread) loading cities ['new york', 'boston', 'washington dc', 'san francisco', 'seattle', 'los angeles', 'amsterdam', 'paris', 'rome']
[INFO] (MainThread) loading movr data with ~5000 users, ~1000 vehicles, and ~50000 rides
[INFO] (Thread-1  ) Generating data for new york...
[INFO] (Thread-5  ) Generating data for rome...
[INFO] (Thread-2  ) Generating data for washington dc...
[INFO] (Thread-4  ) Generating data for amsterdam...
[INFO] (Thread-3  ) Generating data for seattle...
[INFO] (Thread-2  ) populated washington dc in 64.991887 seconds
[INFO] (Thread-2  ) Generating data for san francisco...
[INFO] (Thread-4  ) populated amsterdam in 65.866611 seconds
[INFO] (Thread-4  ) Generating data for paris...
[INFO] (Thread-5  ) populated rome in 66.972107 seconds
[INFO] (Thread-1  ) populated new york in 67.171513 seconds
[INFO] (Thread-1  ) Generating data for boston...
[INFO] (Thread-3  ) populated seattle in 67.387184 seconds
[INFO] (Thread-3  ) Generating data for los angeles...
[INFO] (Thread-4  ) populated paris in 111.951555 seconds
[INFO] (Thread-1  ) populated boston in 112.543100 seconds
[INFO] (Thread-2  ) populated san francisco in 112.580423 seconds
[INFO] (Thread-3  ) populated los angeles in 112.922049 seconds
[INFO] (MainThread) populated 9 cities in 112.982776 seconds
[INFO] (MainThread) - 44.289937 users/second
[INFO] (MainThread) - 442.580734 rides/second
[INFO] (MainThread) - 8.921714 vehicles/second
    1m53.34s real     1m07.94s user     0m06.59s system

Using SQL to verify object counts:

movr=# select count(*) from vehicles;
 count
-------
  1008
(1 row)

movr=# select count(*) from users;
 count
-------
  5004
(1 row)

movr=# select count(*) from rides;
 count
-------
 50004
(1 row)
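
As a cross-check, the throughput figures in the log look like the final row counts divided by the total elapsed time (112.982776 seconds). That relationship is inferred from the numbers above, not confirmed from the script. A quick sketch of the arithmetic:

```python
# Values copied from the log output and the SQL counts above.
elapsed_seconds = 112.982776
row_counts = {"users": 5004, "rides": 50004, "vehicles": 1008}

# Assumed relationship: rate = final row count / total elapsed time.
rates = {name: count / elapsed_seconds for name, count in row_counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.6f} per second")
```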

rmloveland commented 5 years ago

Note that I have not gone through the Docker / roachprod steps in the README; these numbers come from running the script directly against a local 3-node cluster. Also, I have no idea how robust this is. :-)