Open luiztauffer opened 3 years ago
Hi @luiztauffer - thanks for the feedback!
If you want to see a production-ready implementation of a similar pipeline for Amazon Forecast in Python, you can look at the "Improving Forecast Accuracy with Machine Learning" solution, which you can deploy in one click from here.
The solution itself is written in Python, open source, and hosted on GitHub.
If you want to launch the solution directly into your AWS account, follow the first link above and click "Launch in AWS Console".
Hi @pwrmiller, thank you for sharing this info. I've been reading through the code this morning and have already found some interesting information there! I'm not interested in the full implementation just yet: the service I'm trying to build has some particularities that I would like more control over, so I feel it would be better to build the small blocks myself, little by little.
I am using Step Functions and Lambda functions to, for example, create DatasetGroups/Datasets/ImportJobs/Predictors/Forecasts/ExportJobs. Some of these take a long time to complete, and the next steps need to wait. That's simple to implement with just a loop and sleep in a Python script, but I would rather not use any constantly running resource (which costs money).
Ideally, I would have (1) an Event triggered whenever, for example, the DATASET_IMPORT_JOB creation is finished, but I haven't found a way of doing that so far. Right now, I am trying to (2) build a loop in Step Functions between a Wait task (say, 5 min) and a check_status Lambda function, which returns the current state of the given ARN.
Between solutions (1) and (2), would you have any recommendations? I am not very familiar with CloudFormation, so I couldn't figure out whether something similar is implemented in the repo you shared.
Thanks for the clarification!
As you've noted, one of the important things to remember is that an AWS Lambda function can run for at most 15 minutes (and is currently charged for runtime in 1 ms increments). This means that you can't have a Python function that loops and sleeps until a long-running job finishes. Instead, you need a function that can be called periodically to check the state of a resource, moving to the next step only once the desired state is reached (or abandoning the workflow if it is not, for example because of a timeout or a service error).
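As a rough illustration of that kind of periodically called check, here is a minimal sketch of a status-checking Lambda handler. The event shape (`{"arn": ...}`), the three-way `state` value it returns, and the `describe` parameter (there only so the logic can be exercised without AWS) are assumptions made for this example, not something taken from the solution. It only handles dataset import jobs; other resource types would need their own `describe_*` call.

```python
def classify(status):
    """Map an Amazon Forecast resource status onto a three-way workflow outcome."""
    if status == "ACTIVE":
        return "SUCCEEDED"
    if status.endswith("FAILED"):  # e.g. CREATE_FAILED
        return "FAILED"
    return "PENDING"  # e.g. CREATE_PENDING, CREATE_IN_PROGRESS


def lambda_handler(event, context, describe=None):
    """Return the current state of the dataset import job named in the event.

    Assumed input:  {"arn": "<dataset import job ARN>"}
    Assumed output: {"arn": "...", "state": "SUCCEEDED" | "FAILED" | "PENDING"}
    """
    if describe is None:
        # Imported lazily so classify() and the handler can be tested locally.
        import boto3

        client = boto3.client("forecast")

        def describe(arn):
            return client.describe_dataset_import_job(DatasetImportJobArn=arn)

    response = describe(event["arn"])
    # The ARN is echoed back so the next call in a polling loop still has it.
    return {"arn": event["arn"], "state": classify(response["Status"])}
```

The function returns immediately instead of sleeping, so each invocation is billed for only the short time the single API call takes.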
It also sounds like you're considering whether an event-driven approach to monitoring forecasts is possible. As of today, Amazon Forecast publishes CloudWatch events (which record API call information), but these do not tell you when a resource transitions state (or even which resource was accessed; it's more in line with "this API was called at this time"). So, for now, the only way to go is to have functions that check for the desired state run inside a step function, which is what the solution I linked you to implements.
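For reference, the Wait/check loop described in option (2) could be sketched as an Amazon States Language definition, built here as a Python dict. This is a sketch under assumptions, not the definition used by the solution: the Lambda ARN is a placeholder, the state names and the six-hour timeout are made up, and the check function is assumed to return a `state` field of `SUCCEEDED`, `FAILED`, or `PENDING` while echoing its input ARN back.

```python
import json

# Hypothetical placeholder; replace with the ARN of your own status-check Lambda.
CHECK_STATUS_LAMBDA_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:check_status"

definition = {
    "Comment": "Poll an Amazon Forecast resource until it is ready",
    "StartAt": "CheckStatus",
    "TimeoutSeconds": 6 * 60 * 60,  # assumed upper bound before abandoning the workflow
    "States": {
        "CheckStatus": {
            "Type": "Task",
            "Resource": CHECK_STATUS_LAMBDA_ARN,
            "Next": "IsReady",
        },
        "IsReady": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.state", "StringEquals": "SUCCEEDED", "Next": "Ready"},
                {"Variable": "$.state", "StringEquals": "FAILED", "Next": "GiveUp"},
            ],
            # Anything else is treated as still pending: wait, then check again.
            "Default": "WaitFiveMinutes",
        },
        "WaitFiveMinutes": {"Type": "Wait", "Seconds": 300, "Next": "CheckStatus"},
        "Ready": {"Type": "Succeed"},
        "GiveUp": {
            "Type": "Fail",
            "Error": "ResourceFailed",
            "Cause": "The Forecast resource reported a failed status",
        },
    },
}


def next_states(state):
    """List every state name that a given state can transition to."""
    names = [state[key] for key in ("Next", "Default") if key in state]
    names += [choice["Next"] for choice in state.get("Choices", [])]
    return names


# Sanity check: every transition must point at a state that exists.
for name, state in definition["States"].items():
    for target in next_states(state):
        assert target in definition["States"], f"{name} -> {target} is undefined"

print(json.dumps(definition, indent=2))
```

The printed JSON is what you would supply as the state machine definition, for example through the console or the `definition` argument of the boto3 Step Functions `create_state_machine` call. Nothing runs during the `Wait` state, so the only compute time used is the short status-check invocations.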
If you want to see a different implementation of the same sort of workflow, the solution is based off of some earlier work published here that might be more illustrative if you want to build your own customized pipeline.
thanks a lot! I'll check it out
Hi, thanks for this work! I'm trying to build a similar structure, but with Python code. Creating some objects on Forecast (training models, creating import jobs, etc.) can take quite a long time, and to automate this pipeline, later functions will have to wait for earlier ones to finish. My main question is: how exactly do you make the next Lambda function trigger only once a specific prior step has completed?