catapult-project / catapult

Deprecated Catapult GitHub mirror. Please file bugs under the "Speed>Benchmarks" component at http://crbug.com instead, and use https://chromium.googlesource.com/catapult to download and edit the source code.
https://chromium.googlesource.com/catapult
BSD 3-Clause "New" or "Revised" License

Soundwave only writing one data point per timeseries into database #4521

Closed zeptonaut closed 6 years ago

zeptonaut commented 6 years ago

This is a really strange bug that I'm encountering. I'm trying to fetch 70 days' worth of data for the system health benchmarks, but only one data point per timeseries ever gets written into soundwave.db.

I've verified this with the following modifications to commands.py:

def _FetchTimeseriesWorker(args):
  api = dashboard_api.PerfDashboardCommunicator(args)
  con = sqlite3.connect(args.database_file, timeout=10)

  def Process(test_path):
    data = api.GetTimeseries(test_path, days=args.days)
    if data:
      timeseries = tables.timeseries.DataFrameFromJson(data)
      print 'ROWS TO APPEND TO DATAFRAME: {}'.format(len(timeseries))
      pandas_sqlite.InsertOrReplaceRecords(con, 'timeseries', timeseries)
      ts_df = pandas.read_sql('SELECT * FROM timeseries', con, parse_dates=['timestamp'])
      print 'LENGTH OF NEW DATAFRAME AFTER DB WRITE: {}'.format(len(ts_df))

  worker_pool.Process = Process

If I do this and run soundwave with:

bin/soundwave -d 70 --processes 1 timeseries -b system_health.common_desktop

I get:

3392 test paths found!
ChromiumPerf/chromium-rel-win10/system_health.common_desktop/cpu_time_percentage_avg/browse_accessibility_tech/browse_accessibility_tech_codesearch
Fetching data of 3392 timeseries: ROWS TO APPEND TO DATAFRAME: 249
LENGTH OF NEW DATAFRAME AFTER DB WRITE: 1
.ROWS TO APPEND TO DATAFRAME: 250
LENGTH OF NEW DATAFRAME AFTER DB WRITE: 2
.ROWS TO APPEND TO DATAFRAME: 246
LENGTH OF NEW DATAFRAME AFTER DB WRITE: 3
.ROWS TO APPEND TO DATAFRAME: 244
LENGTH OF NEW DATAFRAME AFTER DB WRITE: 4
.ROWS TO APPEND TO DATAFRAME: 248
LENGTH OF NEW DATAFRAME AFTER DB WRITE: 5
.ROWS TO APPEND TO DATAFRAME: 242
LENGTH OF NEW DATAFRAME AFTER DB WRITE: 6
.ROWS TO APPEND TO DATAFRAME: 248
LENGTH OF NEW DATAFRAME AFTER DB WRITE: 7
.ROWS TO APPEND TO DATAFRAME: 243
LENGTH OF NEW DATAFRAME AFTER DB WRITE: 8
.ROWS TO APPEND TO DATAFRAME: 248
LENGTH OF NEW DATAFRAME AFTER DB WRITE: 9
.ROWS TO APPEND TO DATAFRAME: 245
LENGTH OF NEW DATAFRAME AFTER DB WRITE: 10
.ROWS TO APPEND TO DATAFRAME: 220
LENGTH OF NEW DATAFRAME AFTER DB WRITE: 11

(Note that the expected behavior is that "length of new dataframe after DB write" increases by "rows to append to dataframe" each time.)

I suspect that the problem lies somewhere in pandas_sqlite.InsertOrReplaceRecords, but I'm not sure exactly what it is.
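For what it's worth, here is a minimal sketch of one mechanism that would produce exactly this symptom: if the DataFrame coming out of DataFrameFromJson has a non-unique index (or a non-unique key column) and InsertOrReplaceRecords uses that as the SQLite primary key, then INSERT OR REPLACE makes every row in a batch overwrite the same record, leaving one surviving row per batch. This is only a hypothesis, not the confirmed cause; the table schema and column names below are made up for illustration.

```python
import sqlite3
import pandas as pd

# Hypothetical frame whose index values collide (all zero), mimicking what
# DataFrameFromJson could produce if it never assigns a unique index.
df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=[0, 0, 0])
print(df.index.is_unique)  # False

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE timeseries (id INTEGER PRIMARY KEY, value REAL)")

# INSERT OR REPLACE keyed on the index: every row targets id=0, so the
# three inserts collapse into a single surviving row.
rows = list(df.itertuples())
con.executemany("INSERT OR REPLACE INTO timeseries VALUES (?, ?)", rows)
count = con.execute("SELECT COUNT(*) FROM timeseries").fetchone()[0]
print(count)  # 1
```

If that's what's happening, checking `timeseries.index.is_unique` (or the uniqueness of whatever key columns the insert uses) right before the InsertOrReplaceRecords call should confirm it.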

@perezju

perezju commented 6 years ago

Strange. Looking at this now.