cloud-native-toolkit / multi-tenancy-gitops

Provides our opinionated point of view on how GitOps can be used to manage the infrastructure, services and application layers of K8s based systems
https://cloudnativetoolkit.dev/adopting/use-cases/gitops/gitops-ibm-cloud-paks/
Apache License 2.0
113 stars 729 forks

Fixing the Process Mining setup #242

Closed rikgig closed 2 years ago

rikgig commented 2 years ago

Process Mining is having issues with DB2 and Task Mining.

With these version updates to the catalog, the operator and the instance (now on the latest version), everything seems to work. On my cluster everything is now up and running, including the Task Mining -> DB2 connection, which I had never seen working before.

hollisc commented 2 years ago

@jesusmah, from our conversation earlier, you were also working on some updates to the process-mining recipe due to the issue with DB2. Can you please review this and see whether it aligns with what you have done to get it working?

jesusmah commented 2 years ago

Hey @rikgig, thanks a lot for taking the time to get PM to work, and sorry for not updating it myself due to lack of time. I was about to update it by the end of this week. However, let me comment on your PR, as I believe it mixes the fix with the workaround for the previous PM version...

The previous PM version (1.12.0.3) shipped with an embedded DB2 version that does not work well on OpenShift (or at least on ROKS). As a result, the workaround was to create a standalone DB2 instance from the DB2 Operator that gets installed with PM (or that you install manually) and then configure Task Mining to connect to and use that external DB. That allows you to successfully install PM+TM on ROKS. In your PR I can see you have implemented that customization on the PM instance that gets created. However, this is not needed in the newer PM 1.12.0.4 version that you have also updated your PM instance to (the embedded DB2 version now works on ROKS). Also, customizing your PM instance so that TM works with an external DB instead of the embedded DB means that you need to create a DB2 instance beforehand, and that is not implemented in this PR as far as I can see.

I see you have also updated the catalog versions that PM requires to run the latest 1.12.0.4 version, which is good. That was what I did first to test the new 1.12.0.4 PM version. However, I have recently figured out that all those individual catalogs for PM and DB2 (and really for any IBM operator, such as IAF, which PM also needs) should be merged into and included in the main IBM catalog:

image: icr.io/cpopen/ibm-operator-catalog:latest
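
(For context, a minimal sketch of how that image is usually wrapped in a CatalogSource on OpenShift. Only the image comes from this thread; the name, namespace, display name and poll interval are assumptions based on the commonly documented IBM Operator Catalog definition, not something taken from this PR.)

```yaml
# Sketch only: every value except `image` is an assumption, not from this PR.
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-operator-catalog          # assumed name
  namespace: openshift-marketplace    # assumed: the usual global catalog namespace
spec:
  displayName: IBM Operator Catalog   # assumed
  publisher: IBM
  sourceType: grpc
  image: icr.io/cpopen/ibm-operator-catalog:latest
  updateStrategy:
    registryPoll:
      interval: 45m                   # assumed polling interval
```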

I understand the intention behind the PM team's instructions on the KC: they have people define an individual catalog for each of PM's dependencies, as opposed to the main general IBM Operator Catalog, so that any PM installation is tied to a very specific version of each dependency. Those are the versions the PM team has thoroughly tested and knows their product works with. However, I believe that operators and catalogs should be backwards compatible, so that if I have the latest version of the main general IBM Operator Catalog on my system, I get the latest PM Operator installed and that Operator lets me install PM 1.12.0.4, 1.12.0.3, etc. (to a certain extent at least), instead of requiring very specific catalogs (down to their specific sha). That is my expectation as a potential IBM client and what I believe we should work towards for simple, manageable and gitops-able systems. So this is the other piece of work I want to do with @hollisc: simplifying the catalogs and versions so that all IBM capabilities come from the same catalog (just like IBM agreed to place all containerized software under the same image registry --> icr.io) and users only need to decide what version of PM, MQ or APIC they want installed when they create such instances, as opposed to going through a complex engineering process to make sure they have the correct, very specific versions of the catalogs. Does what I'm saying make sense? This requires some refactoring of the actual GitOps framework.

For reference, here you can see the work I've done to make sure of all of the above in one QuickStart environment I requested from techzone --> https://github.com/test-pm-db2-7/multi-tenancy-gitops/commit/22716e90cadb1bf754561d1ad9e825b0387d497e

There are other pieces to consider in this work, such as installing the IAF operator namespaced and using the main general IBM Operator Catalog for Common Services as well, following the same thinking explained above. These two things might very well introduce collateral issues with other capabilities of the framework, which is the reason why we have not yet delivered those changes.

Finally, I've seen that the latest PM version introduces a new dependency on the IBM Redis Operator, so we might also need to do some work to onboard that operator into the GitOps framework.

@rikgig feel free to ping @hollisc and me on Slack if you want to further discuss or comment on what I think is required to get the PM tutorial working. That is what I explained above and what I'm trying to get done this week or early next week, along with updating the lab and quickstart instructions at the same time as we release the updates to the GitOps Framework.

Thanks

rikgig commented 2 years ago

Hi @jesusmah, very in-depth analysis, excellent work! I didn't do that. You're probably right, I've applied a mix of both without knowing it. I got some instructions from a PM guy and applied the fixes one at a time in GitOps to get it working.

In fact, I patched DB2 first because it was plainly not working. But then PM didn't start, so it had to be patched too. And yes, I did create a DB2 cluster using another yaml that wasn't part of the PR, since that PM tech guy (Gicacomo) told me it wasn't necessary with the new version. I haven't tested from scratch since then, my apologies for that.

About the "customization" for the DB, I would rather call this an installation-specific configuration. I guess (maybe wrongly) that PM supports databases other than DB2? If so, I fail to see how a system installed on OCP could work with this config missing. Defaults are good, don't get me wrong, but in my opinion they should not be too deeply hidden. The entry point of the config must be visible, or at least very well documented, so that "we" know!

I totally agree with you concerning the catalogs and versioning, to some extent. Versioning of cloud-native system components is not simple. And with the Cloud Pak structure containing a common layer, that can be a very serious upgrade challenge. Nobody will ever keep all Cloud Paks updated to matching versions simultaneously. At that point, versions of the "common" stuff may collide. So targeted versions of certain components will be necessary, I'm sure, but pinning down to the sha level is excessive.

So yes, what you say makes complete sense and yes, the GitOps framework needs some refactoring at the version management level to expose it in a clean way, so that live tests become possible by actually pulling specific versions of operators that may be shared or could break other functions.

We can leave this PR open for now and wait for your obviously more thorough work to get this working. Or it could be merged temporarily to have the setup working. Your call, since I consider this part to be under your guidance ;).

Thanks a lot for those explanations!

@hollisc, we definitely need your advice and point of view on the matter too.

hollisc commented 2 years ago

Discussed with Jesus and Eric. Will close this PR as the fix is in a separate one.