argoproj / argo-rollouts

Progressive Delivery for Kubernetes
https://argo-rollouts.readthedocs.io/
Apache License 2.0

Experiment schedule for regression testing #2415

Open alexef opened 1 year ago

alexef commented 1 year ago

Summary

Allow a standalone Experiment to have a schedule (same as a CronJob) so that the same experiment could run over and over again.

Use Cases

We would use it for regression testing of production instances. At times, even though there is no code change (and so no deployment), changes in data can lead to the application misbehaving.

Being able to run the same golden path experiment and learn from its AnalysisRun results would increase confidence in our systems.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

zachaller commented 1 year ago

I have to think about this a little bit more, but I think we at Intuit could also possibly use something like this. We have played around with PoCs of running analysis queries on schedules as well. This would be one step further, spinning up pods for the experiment, so yeah... still thinking.

zachaller commented 1 year ago

I have just been randomly thinking about this in the context of implementation. Take this all with a grain of salt; I am just dumping some thoughts. If we decided to add this feature, one thing I am not a huge fan of is having to reimplement the feature set of upstream k8s CronJobs. With that in mind, I was playing around with some ideas on how to do this.

I think it would need to be a new controller, something along the lines of ExperimentCronJob. That could in theory inherit the upstream k8s CronJob spec along with the ExperimentSpec. I would then play around with the idea of actually just creating a native k8s CronJob that pulls the Experiment spec from its parent. This would mean we would have to create a new Docker image and code that would call the k8s API to create the experiments. This is just one idea.
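As a rough illustration of the first idea, here is a minimal sketch of what the code inside such a CronJob container might do on each tick: copy a stored Experiment spec and give it a unique, timestamped name. Everything here is an assumption for illustration only: `EXPERIMENT_TEMPLATE`, `build_experiment`, the image, and the `regression-checks` AnalysisTemplate name are all hypothetical, and the call that would actually submit the manifest to the k8s API is left out.

```python
import copy
import json
from datetime import datetime, timezone

# Hypothetical golden-path Experiment. The names, image, and
# AnalysisTemplate reference are placeholders, not from this issue.
EXPERIMENT_TEMPLATE = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Experiment",
    "metadata": {"namespace": "default"},
    "spec": {
        "duration": "10m",
        "templates": [
            {
                "name": "golden-path",
                "replicas": 1,
                "selector": {"matchLabels": {"app": "golden-path"}},
                "template": {
                    "metadata": {"labels": {"app": "golden-path"}},
                    "spec": {
                        "containers": [
                            {"name": "app", "image": "example.com/app:stable"}
                        ]
                    },
                },
            }
        ],
        "analyses": [{"name": "regression", "templateName": "regression-checks"}],
    },
}


def build_experiment(template, prefix, now):
    """Return a copy of the template with a unique name for this scheduled run."""
    manifest = copy.deepcopy(template)  # never mutate the shared template
    manifest["metadata"]["name"] = f"{prefix}-{now.strftime('%Y%m%d%H%M')}"
    return manifest


if __name__ == "__main__":
    # A real job would submit this manifest to the cluster instead of printing it.
    manifest = build_experiment(
        EXPERIMENT_TEMPLATE, "golden-path", datetime.now(timezone.utc)
    )
    print(json.dumps(manifest, indent=2))
```

Naming each run after its scheduled minute keeps runs distinct, so each run's AnalysisRun results can be looked up separately afterwards.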

The other idea is to couple the two more tightly, which would mean reimplementing upstream k8s features to a degree. We would not have to cover all the features, but some things come to mind, like concurrencyPolicy and time zones. I am still open to other thoughts on how this would be implemented.
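To give a feel for how much reimplementation the tightly coupled option implies, here is a minimal sketch of just the concurrencyPolicy decision, using the three values the upstream CronJob defines (Allow, Forbid, Replace). The `plan_run` function and its return shape are hypothetical, and this is not how the upstream controller is written; a real controller would also need to handle missed schedules, time zones, and history limits.

```python
from enum import Enum


class ConcurrencyPolicy(Enum):
    # Values mirror the upstream CronJob concurrencyPolicy field.
    ALLOW = "Allow"
    FORBID = "Forbid"
    REPLACE = "Replace"


def plan_run(policy, active_runs):
    """Decide what to do when a scheduled tick fires.

    Returns a tuple (start_new, runs_to_terminate).
    """
    if not active_runs or policy is ConcurrencyPolicy.ALLOW:
        # Nothing is running, or overlapping runs are allowed.
        return True, []
    if policy is ConcurrencyPolicy.FORBID:
        # Skip this tick while a previous run is still active.
        return False, []
    # REPLACE: stop the active runs and start a fresh one.
    return True, list(active_runs)


if __name__ == "__main__":
    for policy in ConcurrencyPolicy:
        print(policy.value, plan_run(policy, ["golden-path-202301020304"]))
```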

I also think it is worth bringing up at a community meeting as well.

zachaller commented 1 year ago

One other question: do you purposely want to use experiments, or would being able to run analysis runs on, say, the current production setup also solve your use case?

I suppose one differentiator with experiments is that you could have an analysis job that actually hits the experiment's endpoints, versus just querying side effects of production traffic.

alexef commented 1 year ago

Thank you for the insights. Thinking a bit more about it, for regression testing a recurring AnalysisRun would be enough.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity.