databricks-demos / dbdemos

Demos to implement your Databricks Lakehouse
Other
255 stars 80 forks source link

uc-04-system-tables - are [cost_upper] & [cost_lower] mixed up? #57

Closed evogelpohl closed 10 months ago

evogelpohl commented 11 months ago

Note the view creation code in the [02-forecast-billing-tables] notebook. I think you criss-crossed the upper and lower columns as it plots the upper as lower, and lower as upper.

If not, I may be mistaken in my understanding of how forecasted upper & lower boundaries work.

select
  `date`,
  workspace_id,
  dbus,
  sku,
  y as past_list_cost, -- real
  y is null as is_prediction ,
  coalesce(y, yhat) as list_cost, -- contains real and forecast
  yhat as forecast_list_cost,  -- forecast
  **yhat_lower as forecast_list_cost_upper,
  yhat_upper as forecast_list_cost_lower,**
  -- smooth upper and lower bands for better visualization
  avg(yhat) OVER (PARTITION BY sku, workspace_id ORDER BY date ROWS BETWEEN 7 PRECEDING AND 7 FOLLOWING) AS forecast_list_cost_ma, 
  avg(yhat_lower) OVER (PARTITION BY sku, workspace_id ORDER BY date ROWS BETWEEN 7 PRECEDING AND 7 FOLLOWING) AS forecast_list_cost_upper_ma,
  avg(yhat_upper) OVER (PARTITION BY sku, workspace_id ORDER BY date ROWS BETWEEN 7 PRECEDING AND 7 FOLLOWING) AS forecast_list_cost_lower_ma,
  upper_anomaly_alert,
  lower_anomaly_alert,
  on_trend,
  training_date
from
  forecasts;
QuentinAmbard commented 10 months ago

ahaha good catch thanks! I think I never checked the names because it works "visually". I'm updating the query, it'll be packaged in the upcoming release, thanks for reporting that!