Open epa095 opened 4 years ago
If we were to use another service for aggregating logs, I found these while poking around: fluentd and ELK on Kubernetes. I like the idea of having a Kibana dashboard with all the essential deets on broadcast.
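As a rough sketch of what the fluentd side could look like (purely illustrative; the paths, host name, and tags are assumptions, and the Elasticsearch output needs the `fluent-plugin-elasticsearch` plugin):

```
# Tail container logs on each node (fluentd typically runs as a DaemonSet)
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  <parse>
    @type json
  </parse>
</source>

# Ship everything to the Elasticsearch instance backing Kibana
<match kubernetes.**>
  @type elasticsearch
  host elasticsearch.logging.svc
  port 9200
  logstash_format true
</match>
```

From there the Kibana dashboard could filter on pod labels to show only model-build pods.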
As for the core question, I like the idea of getting desired models and then grepping, etc.
Maybe @milesgranger has thoughts on this.
**Problem**

We build some models, and we fail others. But to figure out why a model failed, you must find the Argo workflow containing that model and look at it. Often the exit code is enough; other times you must look at the log of the pod. How can we expose this information out of the cluster? Thoughts?
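For reference, the manual debugging flow described above looks roughly like this (illustrative only; the workflow/pod names and the model-name label are made up):

```shell
# Find the Argo workflow that built the model (hypothetical label)
kubectl get workflows -l gordo.equinor.com/model-name=my-model

# Check the workflow status and the container exit code
kubectl describe workflow my-model-build-abc12

# If the exit code is not enough, dig into the pod log
kubectl logs my-model-build-abc12-pod-1234567
```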
**Thoughts**
If we agree on the above point, where in k8s should this information be?
- Make a `model` object for failed models as well, containing the status (`Failed`) and the exit code of the container. This means that `kubectl get models` doesn't give working models, but rather desired models, and it can be filtered on the status. gordo-controller can still write some summary statistics into the gordo (e.g. number of failed models per exit code), but "the truth" is in the `model` objects.
- Make a separate `failed-model` object. But this seems quite weird compared to how other k8s objects are handled.
- Put it in the gordo's `config` dictionary, or maybe better: add another map (for example in the `status` field) from model name to exit code / status. Then the gordo functions as a kind of log. Problems with this: the gordo is already pressed for size, and this will increase it a bit.

I guess a core question is: does `kubectl get models` give `desired` models or `successful` models?
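To make the summary-statistic idea concrete, here is a minimal sketch (the shape of the status map is an assumption, not gordo-controller's actual schema) of computing "number of failed models per exit code" from a model-name → status map:

```python
# Sketch: aggregate a hypothetical status map (model name -> build result)
# into the per-exit-code summary the gordo object could carry.
from collections import Counter

def failed_models_per_exit_code(model_status: dict) -> Counter:
    """model_status maps model name -> {"status": ..., "exit_code": ...}."""
    return Counter(
        info["exit_code"]
        for info in model_status.values()
        if info["status"] == "Failed"
    )

# Hypothetical statuses, as they might appear in a gordo `status` field.
statuses = {
    "model-a": {"status": "Succeeded", "exit_code": 0},
    "model-b": {"status": "Failed", "exit_code": 42},
    "model-c": {"status": "Failed", "exit_code": 42},
    "model-d": {"status": "Failed", "exit_code": 1},
}
print(failed_models_per_exit_code(statuses))  # Counter({42: 2, 1: 1})
```

The same aggregation works regardless of whether the per-model truth lives in `model` objects or in a map on the gordo; the summary stays small even when the per-model data would bloat the gordo.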