catapult-project / catapult

Deprecated Catapult GitHub. Please instead use the http://crbug.com "Speed>Benchmarks" component for bugs and https://chromium.googlesource.com/catapult for downloading and editing source code.
https://chromium.googlesource.com/catapult
BSD 3-Clause "New" or "Revised" License

Soundwave frequently fails to fetch timeseries due to 'database is locked' sqlite3 errors #4520

Closed by zeptonaut 6 years ago

zeptonaut commented 6 years ago

When fetching 70 days' worth of data for the 3392 timeseries in system_health.common_desktop, this error seems to happen about a third of the time.

Here's an example exception that I got:

$ bin/soundwave -d 70 timeseries -b system_health.common_desktop
3392 test paths found!
Fetching data of 3392 timeseries: ......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................ERROR:root:Worker failed with exception
Traceback (most recent call last):
  File "./soundwave/worker_pool.py", line 71, in _Worker
    Process(item)
  File "./soundwave/commands.py", line 97, in Process
    pandas_sqlite.InsertOrReplaceRecords(con, 'timeseries', timeseries)
  File "./soundwave/pandas_sqlite.py", line 58, in InsertOrReplaceRecords
    c.executemany(insert_statement, zip(*data))
OperationalError: database is locked
Traceback (most recent call last):
  File "bin/soundwave", line 104, in <module>
    Main()
  File "bin/soundwave", line 98, in Main
    commands.FetchTimeseriesData(args)
  File "./soundwave/commands.py", line 140, in FetchTimeseriesData
    _FetchTimeseriesWorker, args, test_paths)
  File "./soundwave/worker_pool.py", line 64, in Run
    ProgressIndicator(label, pool.imap_unordered(_Worker, items), stream=stream)
  File "./soundwave/worker_pool.py", line 39, in ProgressIndicator
    for _ in iterable:
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 668, in next
    raise value
sqlite3.OperationalError: database is locked

Based on this Stack Overflow answer, it seems that telling SQLite to use a write-ahead log (WAL) should drastically reduce these errors.

I have a local CL with this change. In my local testing, the fetch timeseries command succeeded 8 times in a row with the change applied. As a bonus, it seems to boost the rate at which we fetch timeseries from about 35 test paths per second to about 80 test paths per second.
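
For readers unfamiliar with the setting, here is a minimal sketch of enabling a write-ahead log through Python's sqlite3 module. It is not the code from the CL: the helper name connect_with_wal and the 30 second timeout are assumptions made for illustration only.

```python
import sqlite3


def connect_with_wal(db_path, timeout_seconds=30.0):
    """Hypothetical helper: open db_path and switch it to WAL journaling.

    Returns the connection and the journal mode SQLite reports in effect.
    """
    # timeout is how long this connection waits on a lock before raising
    # "database is locked". The 30 second value is illustrative.
    con = sqlite3.connect(db_path, timeout=timeout_seconds)
    # The pragma returns the resulting mode: 'wal' on success, or the
    # mode still in effect if WAL is unavailable (e.g. 'memory' for
    # in-memory databases).
    mode = con.execute('PRAGMA journal_mode=WAL').fetchone()[0]
    return con, mode
```

The journal mode is stored in the database file, so connections opened later by other worker processes also use WAL. Writers are still serialized in WAL mode, but readers no longer block the writer, which is probably why lock contention drops.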

@perezju

perezju commented 6 years ago

Nice! Awesome find!

zeptonaut commented 6 years ago

Doh - I used the wrong bug number in the CL. This was fixed in https://chromium-review.googlesource.com/c/catapult/+/1108563