Closed anniesullie closed 5 years ago
I don't remember anything very useful --
Both code paths call find_change_points.FindChangePoints(), but find_anomalies.py does some other things and only gets called if ProcessTest gets called from the add_point_queue handler when the new point is added.
Right, it's the call to find_change_points.FindChangePoints()
with the different x values that's the difference, but it's not clear to me why that method would behave differently depending on x values.
Have you verified that the x values make the difference by trying adding test methods to find_change_points_test? If having different x values influences whether a change point is found by that function, I that would be good news, because then I think that means there's a bug in find_change_points that we could find and fix.
My initial thought was that maybe there was an error in somewhere in a request to the add_point_queue handler.
I haven't had time to update the unit tests. I verified by poking around for that specific test in /_ah/stats/shell:
test_key = utils.TestKey('ClankInternal/health-plan-clankium-phone/memory.top_10_mobile/memory_allocator_gpu_gpu_process/foreground')
test = test_key.get()
rows = find_anomalies.GetRowsToAnalyze(test, 300)
config = {} # This is the actual value
data_series = [(row.revision, row.value) for row in rows]
change_points = find_change_points.FindChangePoints(data_series, **config)
print change_points
chart_series = debug_alert._ChartSeries(rows)
change_points = debug_alert.SimulateAlertProcessing(chart_series, **config)
print change_points
If I try that and change debug_alert.SimulateAlertProcessing
to find_change_points.FindChangePoints
then both behave the same -- SimulateAlertProcessing
calls FindChangePoints
multiple times with different sub-series, to simulate the way that FindChangePoints is supposed to be called repeatedly every time a new point is added.
I believe that FindChangePoints
doesn't find anything with the series with the last 300 rows because there are multiple steps in that range, and FindChangePoints doesn't really handle the case of multiple change points in the input series -- IIRC, usually it's given the series starting from after the last alert up until the latest point, up to a certain number of points.
Just did another little experiment:
from dashboard import debug_alert
from dashboard import find_anomalies
from dashboard import find_change_points
from dashboard import utils
test_key = utils.TestKey(
'ClankInternal/health-plan-clankium-phone/memory.top_10_mobile'
'/memory_allocator_gpu_gpu_process/foreground')
test = test_key.get()
rows = find_anomalies.GetRowsToAnalyze(test, 150)
print find_change_points.FindChangePoints([(row.revision, row.value) for row in rows])
print find_change_points.FindChangePoints([(i, row.value) for i, row in enumerate(rows)])
print debug_alert.SimulateAlertProcessing([(row.revision, row.value) for row in rows])
print debug_alert.SimulateAlertProcessing([(i, row.value) for i, row in enumerate(rows)])
Output:
[]
[]
[ChangePoint(x_value=398731L, ...)]
[ChangePoint(x_value=73, ...)]
This suggests that x value doesn't affect whether a change point is found when calling FindChangePoints. Also, calling FindChangePoints with the last 150 points of that series (which contains exactly one change point!) yields nothing, which may be a bug.
Let's archive this and let @dave-2 evaluate the E-divisive algorithm and other alternatives.
I was investigating why we did not get an improvement alert here: https://chromeperf.appspot.com/report?sid=dd45f838005395dd0a5806734194fe8bf4d66bec31f78e9b92d49c6949a7d136&start_rev=397075&end_rev=400154
The debug page says that we should get an alert. I dug into the differences between the two codepaths since both call
find_change_points.FindChangePoints()
with the same config ({} in this case). It seems that thefind_anomalies
version sends in the timeseries as:[(row.revision, row.value) for row in rows]
And the
debug_alert
version sends in the timeseries as:[(i, r.value) for i, r in enumerate(rows)]
Anyone want to take a look? @wangzhen127 who poked at this code 1.5 years ago, @dave-2 @Apeliotes @eakuefner who may be interested. @qyearsley in case he happens to remember any place to check.