GoogleCloudPlatform / cortex-data-foundation

Data Foundation - Google Cloud Cortex Framework
https://cloud.google.com/solutions/cortex
Apache License 2.0

No forecast data after running Demand Sensing pipeline #22

Closed: m-naude closed this issue 1 year ago

m-naude commented 1 year ago

Hi, I've gone through the user guide of Google Cloud Cortex Demand Sensing to fill the Demand Forecast, Planning and Promotion Calendar table with test data. I've had no errors during the process, the Cloud build and Vertex AI ran successfully, but none of the tables are populated. The settings used are the default ones. The _demand_sensingevaluations and preprocess tables have data. Also, the _errorsvalidation tables has entries but all the errors are same, except for the id value: No rows with null targets were found for the time series with id: C0000038__1000054. Forecasts start on the first row of a time series (ordered by time) with a null value in the target column....

Any advice would be appreciated.

Please note that we have successfully implemented Cortex Foundation with test data.

benschuler commented 1 year ago

Hi @m-naude, Thanks for bringing this to our attention and for providing some details on the steps you took. Can you please let us know if during your Data Foundation deployment you used client 900 or the default client 100 for the test data? Or did you create and load your own set of test data into the tables? You mentioned that you have implemented Data Foundation with test data, but when you deployed the Demand Sensing pipeline, did you choose to do so with or without test data (Step 3.1 in the Demand Sensing deployment user guide)? Best regards, Benjamin

m-naude commented 1 year ago

Hi there, Thanks for the response. We ran the pipelines with test data. The initial run was with client 100, and then another run was with client 900, just to be on the safe side. Regards, Meurant

benschuler commented 1 year ago

Hi @m-naude, I was trying to reproduce the situation in a clean environment internally to clarify where exactly you see your issue.

Here are the steps I took:

  1. Deploy Cortex Data Foundation with test data for client 900 --> After this step, I can see the different BQ datasets and tables/views, but no tables or views related to demand sensing, which is expected.
  2. Deploy Cortex Demand Sensing via the Google Cloud Marketplace, indicating that I want to use test data here as well --> Once the deployment is done, I can see additional tables created in the CDC dataset for Demand_Plan, Demand_Forecast and Promotion_Calendar.
  3. The Vertex AI pipeline executes successfully --> In BigQuery, in the dataset VERTEX_AI_DATA I can see a table starting with predictions_ which has 65 entries, in line with the expected results for a 13-week forecast for the 5 products being processed.
  4. I also checked the errors_validation table and have 5 records in there, all for customer IDs ending in 55 and with an error message similar to the one that you shared.

Can you confirm that those are the same steps that you took?

If yes, can you specify how many records you see in the errors_validation table? If you have 10 records in there, that could be an indication that some input data is missing in your deployment. In that case, can you please let me know if the tables Demand_Plan, Demand_Forecast and Promotion_Calendar exist in your CDC dataset and how many records they each have?
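The sanity checks above can be reduced to simple arithmetic: the predictions table should contain one row per forecast week per time series, so a 13-week horizon over 5 products yields 65 rows. A tiny sketch of that check (the function is illustrative, not part of the deployment):

```python
def expected_prediction_rows(horizon_weeks, n_series):
    """One forecast row per week per (customer, product) time series."""
    return horizon_weeks * n_series

# Numbers from this thread: a 13-week horizon over 5 sample products.
assert expected_prediction_rows(13, 5) == 65
```

Comparing the actual row count of the predictions_ table against this number, alongside the count in errors_validation, quickly shows whether some series failed validation instead of being forecast.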

m-naude commented 1 year ago

Hi there, I thought I'd start fresh as well and followed the same steps as you. Here are the steps I took:

  1. Deploy Cortex data foundation with test data for client 900 --> After this step, I can see the different BQ datasets and tables/views, but no tables or views related to demand sensing, which is expected. I can see client 900 data
  2. Deploy Cortex Demand Sensing via Google Cloud marketplace, indicating that I want to use test data here as well --> Once the deployment is done, I can see additional tables created in the CDC dataset for Demand_Plan, Demand_Forecast and Promotion_Calendar but they are empty.
  3. The Vertex AI pipeline executes successfully --> In BigQuery, in the dataset VERTEX_AI_DATA there is no table starting with predictions_
  4. I also checked the errors_validation table --> 10 records in there, all for customer IDs ending in 45 and 55, with the same error.

Questions:

- Is my assumption correct that because the tables Demand_Plan, Demand_Forecast and Promotion_Calendar are empty, I get no data?
- In your Cortex Foundation deployment, did you use one or two projects?
- With Demand Sensing, did you just use the default parameters?
- Can you confirm which tables are being used for demand sensing? I could be missing data.
- My location is australia-southeast1. Would that be a problem?

Thanks for your help thus far. It is very much appreciated.

benschuler commented 1 year ago

The steps that you took make sense and are comparable to what I did, but with a different outcome, so let's figure out what might have gone wrong.

> My assumption then is that because tables Demand_Plan, Demand_Forecast and Promotion_Calendar are empty, I get no data. Correct?

Yes, the pipeline can only produce results for periods where all data is available. So if there is no record in Demand_Plan and Promotion_Calendar, the results will be empty.
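The behavior described above is what you would expect from a pipeline that joins its inputs: a period missing from any one table produces no output at all, so a single empty input table empties the entire result. An illustrative sketch of that intuition (not the actual pipeline code):

```python
def weeks_with_complete_inputs(*week_sets):
    """Only periods present in every input table can yield a forecast row."""
    complete = set(week_sets[0])
    for s in week_sets[1:]:
        complete &= set(s)
    return complete

# Hypothetical week numbers covered by each input table.
demand_plan = {1, 2, 3, 4}
sales_history = {1, 2, 3, 4}
promotion_calendar = set()  # empty table, as in this thread
```

With an empty Promotion_Calendar, the intersection is empty, so no forecast periods survive, regardless of how complete the other inputs are.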

> In your Cortex Foundation deployment, did you use one or two projects?

During my testing I only used one project for everything. I will try to replicate this scenario with two different projects, to be sure that this is not causing any issues. Just to confirm your setup in case you had two projects: The demand sensing pipeline would be deployed to the target project, so the one that has the BigQuery reporting views in it. Does that match your setup?

> And with Demand Sensing did you just use the default parameters?

Yes, I used all default parameters for the demand sensing deployment.

> Can you please confirm which tables are being used for demand sensing? I could be missing data.

The tables that we are using for the pipeline are:

[dataset].[table]
CDC_PROCESSED.Demand_Forecast
CDC_PROCESSED.Demand_Plan
CDC_PROCESSED.holiday_calendar
CDC_PROCESSED.Promotion_Calendar
CDC_PROCESSED.trends
CDC_PROCESSED.weather_weekly
+ a combination of reporting views around SalesOrders, CustomersMD, MaterialsMD etc. to get historical sales numbers
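Given the table list above, one quick diagnostic is to count the rows in each input table in a single query to spot which ones are empty. A hedged sketch that generates such a query (the project id is a placeholder; the dataset name CDC_PROCESSED matches this thread but may differ in your deployment):

```python
# Demand-sensing input tables named earlier in this thread.
INPUT_TABLES = [
    "Demand_Forecast",
    "Demand_Plan",
    "holiday_calendar",
    "Promotion_Calendar",
    "trends",
    "weather_weekly",
]

def row_count_query(project, dataset, tables):
    """Build one UNION ALL query returning (table_name, row_count) per table."""
    parts = [
        f"SELECT '{t}' AS table_name, COUNT(*) AS row_count "
        f"FROM `{project}.{dataset}.{t}`"
        for t in tables
    ]
    return "\nUNION ALL\n".join(parts)

sql = row_count_query("my-project", "CDC_PROCESSED", INPUT_TABLES)
```

Running the generated query in the BigQuery console shows at a glance which inputs (for example Demand_Plan or Promotion_Calendar) came up empty after deployment.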

> My location is australia-southeast1. Would that be a problem?

:mag: I think this might be the clue we needed. Let me confirm with the team whether my suspicion is correct and, if so, how we can fix it for this specific location. In the meantime, can you try running through the process once more, but in a different location that is suitable for you? If you deploy demand sensing with sample data in a different location and the tables Demand_Plan, Promotion_Calendar and Demand_Forecast are populated, you might also be able to copy the data from these tables into your tables in the australia-southeast1 location and then run the pipeline in the location you originally planned to use.

I'll reply back to this thread once I get confirmation from the team about the location-specific issue and provide some next steps.

m-naude commented 1 year ago

Hi there, Any feedback on this issue yet? Is location the problem? Thanks

benschuler commented 1 year ago

Hi @m-naude, Yes, it seems the deployment script needs a fix for this specific location. I have asked the team for an update on the progress and will share an ETA for the fix once I can. In the meantime, have you tried running the deployment in a different location, like australia-southeast2? As far as I can tell, that location should not be affected by this bug, so it might be an option for you to test the deployment with sample data.

benschuler commented 1 year ago

Hi @m-naude, I just got confirmation that the updated container for the demand sensing pipeline is available on the marketplace, which means that you can run another deployment in australia-southeast1 and check if the sample data gets created as expected. Please let us know if this latest version has resolved your issue and you were able to successfully deploy the demand sensing pipeline.

m-naude commented 1 year ago

Hi there, I redeployed the new version. Happy to let you know that the tables Demand_Plan, Demand_Forecast and Promotion_Calendar now have entries. Also, no errors_validation table was generated for the new deployment. Thank you for your assistance.