GoogleCloudPlatform / cortex-data-foundation

Data Foundation - Google Cloud Cortex Framework
https://cloud.google.com/solutions/cortex
Apache License 2.0

Data Discrepancy in Model Training and Resulting Predictions #49

Open mazv-arshad opened 8 months ago

mazv-arshad commented 8 months ago

In the model details provided, there's a noticeable inconsistency between the data used for training and the resulting predictions. Specifically, when examining the tables K9_PROCESSING.preprocess_2024_02_12T10_14_25_836Z and K9_PROCESSING.predictions_2024_02_12T02_15_56_827Z_479, it becomes apparent that the training data spans from 02/01/2017 to 19/12/2022, while the resulting predictions only cover the period from 07/03/2022 to 30/05/2022.

Ideally, the forecast should extend beyond the end of the training data, since predictions that fall entirely inside the training period say nothing about future demand. This discrepancy raises concerns about the reliability and completeness of the model's predictions.

Furthermore, the table identified as "Demand Forecast" in the Demand Sensing user guide also exhibits the same limited date range of 07/03/2022 to 30/05/2022. This consistency across multiple tables underscores the need for further investigation into the data processing pipeline or model training methodology.
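To make the comparison concrete, here is a minimal sketch of the check I am describing. It uses the dates quoted above, read as DD/MM/YYYY; the function name and the three labels are my own and are not part of Cortex.

```python
from datetime import date


def classify_forecast_window(train_end, forecast_start, forecast_end):
    """Describe where a forecast window sits relative to the end of training data.

    Hypothetical helper for illustration only; not a Cortex API.
    """
    if forecast_start > train_end:
        return "beyond training range"
    if forecast_end > train_end:
        return "overlaps end of training range"
    return "inside training range"


# Dates quoted in this issue, interpreted as DD/MM/YYYY.
train_end = date(2022, 12, 19)
forecast_start = date(2022, 3, 7)
forecast_end = date(2022, 5, 30)

status = classify_forecast_window(train_end, forecast_start, forecast_end)
print(status)  # prints "inside training range"
```

I would expect the first label for a usable forecast; with the current tables the result is the last one.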

PFA

[Five screenshots attached, taken 2024-02-20 at 12:52:59, 12:56:10, 12:58:14, 13:02:35 and 13:02:57]

vladkol commented 8 months ago

Hi @mazv-arshad, from what I can see, you deployed Demand Sensing with the test data option. It appears that the Demand_Plan table in our test data is too old (you can find it in the CDC dataset). While we are working on a fix, you can adjust the dates in that table so that they match the correct period for prediction. Thank you!
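
As a rough illustration of the date adjustment, the sketch below computes shifted dates in plain Python. Everything in it is an assumption: the sample dates, the target date, and the choice to shift by whole weeks (so each date keeps its weekday) are placeholders, and the actual column names in Demand_Plan are not shown here. You would still need to apply the resulting offset to the table yourself.

```python
from datetime import date, timedelta


def shift_dates_by_whole_weeks(dates, target_latest):
    """Shift all dates by the same whole number of weeks.

    The latest shifted date lands on or before target_latest, less than
    7 days away from it. Weekdays and gaps between dates are preserved.
    Hypothetical helper for illustration; not part of Cortex.
    """
    gap_days = (target_latest - max(dates)).days
    offset = timedelta(weeks=gap_days // 7)  # floor to whole weeks
    return [d + offset for d in dates], offset


# Placeholder values: replace with dates read from your own table and
# the period you actually want predictions for.
sample_dates = [date(2022, 3, 7), date(2022, 4, 4), date(2022, 5, 30)]
target_latest = date(2024, 6, 3)

shifted, offset = shift_dates_by_whole_weeks(sample_dates, target_latest)
print("offset in days:", offset.days)
for old, new in zip(sample_dates, shifted):
    print(old.isoformat(), "->", new.isoformat())
```

Shifting every row by one common offset keeps the spacing between plan periods intact, which is why the sketch does not rewrite dates individually.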