cmu-db / ottertune

The automatic DBMS configuration tool

Sort aggregated matrix columns by knob/metric name #388

Closed dvanaken closed 4 years ago

dvanaken commented 4 years ago

The aggregate_data method assumes that the knob/metric JSON data stored in the [Knob|Metric|Pipeline]Data.data field is sorted by name. This assumption is sometimes violated, which causes the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/app/website/website/tasks/async_tasks.py", line 939, in configuration_recommendation
    pipeline_metrics = combine_workload(target_data)
  File "/app/website/website/tasks/async_tasks.py", line 775, in combine_workload
    y_columnlabels, target_data['y_columnlabels'])
Exception: ('The workload and target data should have identical y columnlabels (sorted metric names)', array(['global.awr_begin_snap', 'global.awr_dbid', 'global.awr_end_snap',
       ..., 'raw_db_time', 'transaction_counter', 'elapsed_time'],
      dtype='<U106'), ['global.awr_begin_snap', 'global.awr_dbid', 'global.awr_end_snap', 'global.capture_id',

This assumption is not necessary - it's better to just sort the matrix columns in the aggregate_data method.
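
A minimal sketch of the idea, not the actual aggregate_data code: reorder the matrix columns together with their labels so that the labels always come out sorted, regardless of the order stored in the JSON. The helper name sort_columns_by_label and the sample values are hypothetical; only the metric names are taken from the traceback above. Plain lists are used so the example runs without dependencies.

```python
def sort_columns_by_label(rows, columnlabels):
    """Reorder the columns of `rows` so that `columnlabels` ends up sorted.

    `rows` is a list of equal-length rows and `columnlabels` holds one name
    per column. Returns (sorted_rows, sorted_labels); inputs are not modified.
    Hypothetical helper for illustration, not part of the OtterTune code base.
    """
    for row in rows:
        if len(row) != len(columnlabels):
            raise ValueError("every row needs exactly one value per column label")
    # Column indices in the order that sorts the labels alphabetically.
    order = sorted(range(len(columnlabels)), key=lambda i: columnlabels[i])
    sorted_labels = [columnlabels[i] for i in order]
    # Apply the same permutation to every row so values stay with their label.
    sorted_rows = [[row[i] for i in order] for row in rows]
    return sorted_rows, sorted_labels


# Made-up values; the labels arrive in unsorted order.
y_columnlabels = ["transaction_counter", "elapsed_time", "raw_db_time"]
y_matrix = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

y_sorted, y_labels_sorted = sort_columns_by_label(y_matrix, y_columnlabels)
```

Doing this once where the matrix is built means the later equality check on the column labels no longer depends on how the JSON happened to be ordered.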

dvanaken commented 4 years ago

Yes, that's what was causing the error.