Closed arthurdm closed 4 years ago
I would suggest having single Task / TaskRun CRDs; inside, we could have different definitions. For example, readinessProbe in Kubernetes can have a completely different set of values depending on the kind of probe.
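To make the readinessProbe analogy concrete, here is a sketch (path, port, and command values are illustrative) of how the same field takes differently-shaped definitions depending on the kind of probe:

```yaml
# The same readinessProbe field, two different shapes:
readinessProbe:
  httpGet:            # HTTP-based probe
    path: /health
    port: 9080
---
readinessProbe:
  exec:               # command-based probe with entirely different fields
    command: ["cat", "/tmp/ready"]
```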
There's probably pros / cons both ways.
At first a single LibertyOperation CRD seems attractive because it's simple to use / discover, but as we scale the number of operations we have (10+) there may be some operations that are not applicable or available in certain cluster environments, and then it would be hard to tell the user the conditions in which the embedded actions are applicable.
Multiple CRDs make it more complex to know which CR to compose, but allows for better specialization and environment-dependent installation / availability.
I think we could perhaps merge the two trace related operations into a single CRD, so we take a hybrid approach here.
```yaml
kind: LibertyTraceOperation
traceEnabled: true | false
traceSpecification: ...
```
That was exactly what we were discussing with @leochr on Friday. One of my proposals was:
```yaml
kind: LibertyAction
spec:
  podName:
  otherCommonFields: ...
  action:
    trace:
      enable: true
      traceSpecification:
      maxFileSize:
      maxFiles:
    serverDump:
      - heap
      - core
status:
```
operator-sdk makes it almost impossible to listen to multiple CRDs in a single controller, so it would require a controller per new CRD. A single CRD is the simpler overall solution.
The Action/ActionRun however might be useful if we wanted to have automatic action discoverability for tools.
I don't recommend we mix stateful actions (such as trace) with stateless / one-time actions (such as server dump). It creates confusion in the usage scenarios.
It's ok to have multiple controllers for these - actually, it becomes even more pluggable in terms of having controllers / CRDs that are environment specific - for example, a CRD that binds to a particular cloud provider (AWS, IBM, Google) storage, etc.
Another common need for support teams is to get the logs from when the server started. At server startup information about the Liberty and Java versions and any startup-time issues are logged.
Perhaps getting that would be just a matter of getting oc logs for the pod. Could there be an action for that as well?
Thanks, Don, we'll give some thought to the startup scenario.
Update: I've got a prototype for trace operation working. Next, we need to add error handling and report the status of the operation as well as optimize/clean-up the code.
Each day-2 operation will report its status (started/completed/failed). Such information will be held inside the `status` field of the CR. Some of that information can be output for `oc get openlibertytrace my-app-trace`.
All events of the day-2 operation will be logged (e.g. "Enabled trace for pod abc" ... "Stopped trace for pod abc"). Those events will be shown when `oc describe openlibertytrace my-app-trace` is run.
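As a rough sketch (the condition and field names here are illustrative, not the final API), the reported status might surface in the CR like this:

```yaml
apiVersion: openliberty.io/v1beta1   # assumed API group/version
kind: OpenLibertyTrace
metadata:
  name: my-app-trace
status:
  conditions:
    - type: Enabled                  # illustrative condition type
      status: "True"
      lastTransitionTime: "2019-12-02T15:04:05Z"
```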
We can also add annotations/labels into the respective resources that the day-2 operation is processing. For example, when the trace is enabled, we can add an annotation to the pod itself to reflect just that (e.g. `trace.openliberty.io/status` or `trace.openliberty.io/enabled`). Such information can be used by kAppNav to show the appropriate options (e.g. stop trace) in its console.
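For illustration, the pod-level annotation idea might look like this (annotation key per the examples above; the pod name and value format are assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app-7d9f6c5b8-abcde             # illustrative pod name
  annotations:
    trace.openliberty.io/status: enabled   # set by the operator while trace is active
```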
Thanks for the input @leochr - Once we have the status field and event logging implemented in the prototype, it's probably worth posting them here to give a better view of the direction we're going.
That's an interesting thought about the kAppNav integration. One question for @cvignola is whether the kAppNav dashboard has the ability to drill down into an individual pod for a replicated microservice (e.g. a microservice portfolio with 3 replica sets), since the day-2 operations are for a pod, not necessarily the entire replica set.
@arthurdm I believe the link that kAppNav generates for Kibana includes all of the pods in the deployment -- but the Kibana dashboards certainly have the ability to narrow in to just see the logs from one pod.
@arthurdm So yes, kappnav has the ability to show individual pods if you add Pod to the componentKind list. We also have a podlist function we use at Deployment scope, which could be used to populate a pick list.
@arthurdm @donbourne Yes, the query kAppNav generates for the Kibana dashboard URL for Liberty enumerates all pods belonging to the Deployment. Then, as Don said previously, the Kibana dashboard enables you to move around and narrow your view to a specific pod.
@arthurdm @leochr @arturdzm As I shared with Arthur via slack, we have an opportunity for industry leadership if we solve a problem facing d2ops. The problem is maturity: d2ops don't have it.
Sorely lacking is the ability for d2ops to be discovered and introspected. It must be possible for higher order tools to be created that raise the abstraction level beyond a yaml interface.
Specifically, a tool (e.g. a UI) should be able to:
1) discover installed d2ops
2) be notified when d2ops are installed/uninstalled
3) be able to know which Kind the d2op applies to
4) be able to initiate a d2op against a specific instance
5) be able to determine the input parameters, types, optionality, and defaults
6) be able to process user-supplied input against provided validation rules
8) be able to specify optional "layout hints" for a UI
9) be able to determine when a d2op has completed
10) be able to know whether a d2op completed successfully or in failure
11) be able to find and access a 'd2op log' if applicable to the d2op to reveal details of its operation
Toward addressing those requirements ...
```yaml
kind: Liberty
spec:
  d2ops:
    - kind: LibertyAction
```

```yaml
kind: LibertyAction
spec:
  optarget:
    - name: <instance name>
      kind: <instance kind>
```

```yaml
kind: LibertyAction
spec:
  interaction:
    - parameter: <parameter name from openapi spec>
      optional: true | false
      default: <default value>
      validation-rule: <regular expression?>
      layout-hint: <I'm still thinking about this one ...>
```

```yaml
kind: LibertyAction
spec:
status:
  completion: <time stamp>
  success: true | false
```
@arthurdm @arturdzm @leochr Do any of you guys know where/if CRD OpenAPI is documented? I have found examples of it being used, but no documentation for the schema anywhere.
@cvignola Some information is documented here: https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/#validation and https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.0.md#schemaObject
thanks for the feedback @cvignola
```yaml
metadata:
  name: openlibertyapplications.openliberty.io
  annotations:
    day2operations: OpenLibertyTrace, OpenLibertyDump
```
For item 4 we could similarly add annotations to the CRDs of the actions, so the user (working with a CR) doesn't have to specify it.
```yaml
metadata:
  name: openlibertytraces.openliberty.io
  annotations:
    targetKinds: Pod
```
For items 5 & 6 we can add these things to the OAS3 Schema of the CRDs.
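For example, optionality, defaults, and validation rules (items 5 & 6) could be carried in the CRD's OpenAPI v3 schema. This is a sketch with assumed property names, not the operator's actual schema:

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: openlibertytraces.openliberty.io
spec:
  validation:
    openAPIV3Schema:
      type: object
      properties:
        spec:
          type: object
          required:
            - podName            # tools can derive required vs. optional from here
          properties:
            podName:
              type: string
            maxFiles:
              type: integer
              default: 2         # defaults discoverable by tools
            traceSpecification:
              type: string
```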
I believe a good goal to have is: bake as much "tools helper" information / metadata / schema as we can into the CRDs, and keep the CR (for users) short and optimized.
@arthurdm I concur with your points in https://github.com/OpenLiberty/open-liberty-operator/issues/47#issuecomment-559875437
Delivered dump and trace day-2 operations. Documentation is here (including the operation discovery mechanisms discussed above): https://github.com/OpenLiberty/open-liberty-operator/blob/master/doc/user-guide.md#day-2-operations
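For reference, per the linked user guide, a trace CR looks roughly like this (see the guide for the authoritative field list; values below are illustrative):

```yaml
apiVersion: openliberty.io/v1beta1
kind: OpenLibertyTrace
metadata:
  name: example-trace
spec:
  podName: Specify_Pod_Name_Here   # pod to trace
  traceSpecification: "*=info:com.ibm.ws.webcontainer*=all"
  maxFileSize: 20                  # maximum size per trace file
  maxFiles: 5
```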
As a value-add of the OL Operator (versus the Appsody Operator) we should investigate the use of specialized day-2 operations. Here are some examples. The names / kind may change, but there should be enough information below to start prototypes.
ActionLibertyTraceStart
input from user:
what the operator does: (important: we NEED TO increase the memory of the container, otherwise it will hit a JVM OOM depending on the trace; maybe this ought to be a user input)
The customer would then use the app, which will generate trace, and stop it when sufficient trace has been gathered, with the action below:
ActionLibertyTraceStop
input from user:
what the operator does:
ActionLibertyJVMDump
input from user:
what the operator does: