IBM / cp4d-monitors

Additional monitors for IBM Cloud Pak for Data, which can be deployed either using the IBM Cloud Pak Deployer, or using regular command line commands. The monitors adhere to the zen-watchdog interface and will be added to the Events and Alerts of Cloud Pak for Data. These monitors focus more on the functional usage of IBM Cloud Pak for Data and can be used to provide more insights into how the Cloud Pak for Data platform is used by its business users.
https://ibm.github.io/cp4d-monitors
Apache License 2.0

[CPD4.6.2] Deploy global connection monitor failed #22

Closed chenja2000 closed 1 year ago

chenja2000 commented 1 year ago

TS012386351: https://ibmsf.lightning.force.com/lightning/r/Case/5003p00002m2zCGAAY/view

I followed the doc at https://ibm.github.io/cp4d-monitors/monitors/cp4d-platform-global-connections/manual to deploy the monitor. The image was pushed successfully and the cronjob (cj) was created and runs on schedule, but the cronjob pod exited with errors.

[root@api.cpd4jc.cp.fyre.ibm.com ~]# oc get cm |grep connection
config-wdp-connect-connection                                  5   12d
cp4d-platform-global-connections-1-ca                          1   7m33s
cp4d-platform-global-connections-1-global-ca                   1   7m33s
cp4d-platform-global-connections-1-sys-config                  1   7m33s
zen-alert-cp4d-platform-global-connections-monitor-extension   1   36s
zen-connection-extension-points                                1   12d
[root@api.cpd4jc.cp.fyre.ibm.com ~]#

[root@api.cpd4jc.cp.fyre.ibm.com ~]# oc logs zen-watcher-6df9859cf5-6gq6m |tail -n10
time="2023-03-15 19:37:03" level=info msg="AddonController.runWorker: processing next item"
time="2023-03-15T19:37:03Z" level=info msg="Dropping \"zen/jobs-api\" out of the queue: no pod found for service jobs-api"
time="2023-03-15T19:37:03Z" level=info msg="AddonController.processNextWorkItem: start"
time="2023-03-20 15:37:39" level=info msg=processConfigData event="adding extensions from zen-alert-cp4d-platform-global-connections-monitor-extension to the database"
time="2023-03-20 15:37:48" level=info msg=watchConfigMap event="config zen-alert-cp4d-platform-global-connections-monitor-extension added"
time="2023-03-20 17:39:59" level=info msg=DeleteConfigMap event="received delete event for configMap zen-alert-cp4d-platform-global-connections-monitor-extension"
time="2023-03-20 17:39:59" level=info msg=processHandlerForMultipleExtensions arguments="[zen-alert-cp4d-platform-global-connections-monitor-extension]" event="processing action: delete for extensions retrieved by condition:source = ?"
time="2023-03-20 17:39:59" level=info msg=processHandlerForMultipleExtensions event="processing action: delete for extension:zen_alert_monitor_cp4d-platform-global-connections"
time="2023-03-20 17:49:47" level=info msg=processConfigData event="adding extensions from zen-alert-cp4d-platform-global-connections-monitor-extension to the database"
time="2023-03-20 17:49:55" level=info msg=watchConfigMap event="config zen-alert-cp4d-platform-global-connections-monitor-extension added"
[root@api.cpd4jc.cp.fyre.ibm.com ~]

```
[root@api.cpd4jc.cp.fyre.ibm.com ~]# oc get bc
NAME                               TYPE     FROM   LATEST
cp4d-platform-global-connections   Docker   Git    1

[root@api.cpd4jc.cp.fyre.ibm.com ~]# oc get cj |grep global
cp4d-platform-global-connections-cronjob   */15 * * * *   False     0        7m50s           117m
```

The cronjob pod failed on my cluster:

oc logs -f cp4d-platform-global-connections-cronjob-27988955-lrqdx
Starting Cloud Pak for Data Platform connections monitor....

Executing CP4D Monitor python file: /cp4d-monitoring-scripts/cp4d_platform_global_connections.py
Got error for the command: "/opt/app-root/bin/cpdctl config context set default --username=admin --password=WSyIqucvo03p --url https://ibm-nginx-svc --output json"
Please refer to the response for details: Command "set" is deprecated, Functionality of contexts is replaced by profiles. More information: https://github.com/IBM/cpdctl#configuration
Got error to create cpd context.
Exiting Cloud Pak for Data Platform connections monitor....

My customer followed the same doc but got different errors:

>oc logs cp4d-platform-global-connections-cronjob-pjngp
...
Starting Cloud Pak for Data Platform connections monitor....
Executing CP4D Monitor python file: /cp4d-monitoring-scripts/cp4d_platform_global_connections.py
Found Cloud Pak for Data Platform Assets Catalog Id: 5f62d9a9-84c0-49a2-b648-8c21a2ad9d1e
Found 2 Global Platform Connections.
Testing Resource Postgres_DB with asset_id: 3aabc3b7-7d58-4e9a-9996-7cd85f9ca821
Testing Result code: 400
Testing Result content: {"trace":"ewc6oljlne3ttqfkwk9dynf3m","errors":[{"code":"validation_failed","message":"The connection requires personal credentials which have not been set for this user.","more_info":"Set the credential properties and resubmit the request.","extra":{"environment_name":"icp4data","http_status":400,"id":"CDICO9032E","source_cluster":"NULL","source_component":"wdp-connect-connection","timestamp":"2023-03-17T12:34:23.056Z","user":"1000330999"}}]}
{"trace":"ewc6oljlne3ttqfkwk9dynf3m","errors":[{"code":"validation_failed","message":"The connection requires personal credentials which have not been set for this user.","more_info":"Set the credential properties and resubmit the request.","extra":{"environment_name":"icp4data","http_status":400,"id":"CDICO9032E","source_cluster":"NULL","source_component":"wdp-connect-connection","timestamp":"2023-03-17T12:34:23.056Z","user":"1000330999"}}]}
Testing Resource Watson Query with asset_id: acc5fa36-2f23-4754-a57d-f21ffcd63115
Testing Result code: 400
Testing Result content: {"trace":"dp0axroeocvwjrdrbyubutujc","errors":[{"code":"validation_failed","message":"One of the properties [access_token, api_key, password, username] is required.","more_info":"null","extra":{"environment_name":"icp4data","http_status":400,"id":"CDICO9007E","source_cluster":"NULL","source_component":"wdp-connect-connection","timestamp":"2023-03-17T12:34:24.264Z","user":"1000330999"}}]}
{"trace":"dp0axroeocvwjrdrbyubutujc","errors":[{"code":"validation_failed","message":"One of the properties [access_token, api_key, password, username] is required.","more_info":"null","extra":{"environment_name":"icp4data","http_status":400,"id":"CDICO9007E","source_cluster":"NULL","source_component":"wdp-connect-connection","timestamp":"2023-03-17T12:34:24.264Z","user":"1000330999"}}]}
Sending events to zen-watchdog:
[{"monitor_type": "cp4d_platform_global_connections", "event_type": "global_connections_count", "metadata": "global_connections_count=2", "severity": "info", "reference": "Cloud Pak for Data Global Connections Count"}, {"monitor_type": "cp4d_platform_global_connections", "event_type": "global_connection_valid", "metadata": "global_connection_valid=0", "severity": "warning", "reference": "Global Connection - Postgres_DB"}, {"monitor_type": "cp4d_platform_global_connections", "event_type": "global_connection_valid", "metadata": "global_connection_valid=0", "severity": "warning", "reference": "Global Connection - Watson Query"}]
Response status_code: 400
Response content: {"message":"invalid event type or monitor type. error retrieving alert type for event with monitor type cp4d_platform_global_connections and event type global_connections_count","status":400}
-------------------------------------------------------------
Exiting Cloud Pak for Data Platform connections monitor....

chenja2000 commented 1 year ago

@arthurlaimbock is helping on this issue.

arthurlaimbock commented 1 year ago

In the end it was quite a change. Up to CP4D 4.5, new monitor types were always accepted, allowing you to post new events. Since 4.6, however, an additional check is made and new events are rejected by default unless they are registered. This is where CP4D development made a crucial mistake: they "misuse" the monitor type for both the name of the generated cronjob and the monitor_type property of the event. Unfortunately, these two items have very different restrictions on their naming:

So the current workaround is to allow only a-z characters in the monitor type.
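To make the a-z restriction concrete, here is a minimal Python sketch of a check a monitor script could run before registering or posting events. The function names are hypothetical and not part of cp4d-monitors, and reading "only a-z" as "one or more lowercase ASCII letters, nothing else" is an assumption.

```python
import re

# Assumption: "only a-z" means one or more lowercase ASCII letters,
# with no digits, dashes, or underscores.
_MONITOR_TYPE_RE = re.compile(r"[a-z]+")


def is_valid_monitor_type(name: str) -> bool:
    """Return True if the whole name consists of a-z characters only."""
    return _MONITOR_TYPE_RE.fullmatch(name) is not None


def sanitize_monitor_type(name: str) -> str:
    """Lowercase the name and drop every character outside a-z.

    Hypothetical helper: raises ValueError if nothing is left.
    """
    cleaned = re.sub(r"[^a-z]", "", name.lower())
    if not cleaned:
        raise ValueError(f"no a-z characters in monitor type {name!r}")
    return cleaned


if __name__ == "__main__":
    old_name = "cp4d_platform_global_connections"
    print(is_valid_monitor_type(old_name))
    print(sanitize_monitor_type(old_name))
```

With such a check, a name like `cp4d_platform_global_connections` (the monitor type in the rejected payload above) would be flagged before the event is posted.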

Then the cpdctl tool has an extremely fast deprecation timeline. Where cpdctl context was still valid in 4.5, in 4.6 you must use cpdctl profile, and any use of context immediately throws an error.
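As a rough illustration, a monitor script could recognize this specific failure from the cpdctl output and report it clearly instead of only logging "Got error to create cpd context". The sketch below matches on the wording of the deprecation message quoted in the log earlier in this issue. The function name is hypothetical, and the assumption is that the message text stays the same across cpdctl versions.

```python
CPDCTL_CONFIG_DOC = "https://github.com/IBM/cpdctl#configuration"


def explain_cpdctl_failure(output: str) -> str:
    """Turn raw cpdctl error output into a short diagnosis.

    Assumption: the deprecation message contains the words "deprecated"
    and "profiles", as in the log quoted in this issue.
    """
    if "deprecated" in output and "profiles" in output:
        return (
            "cpdctl rejected a deprecated 'context' command; "
            "use 'cpdctl config profile' instead (see "
            + CPDCTL_CONFIG_DOC
            + ")."
        )
    return "cpdctl failed: " + output.strip()


if __name__ == "__main__":
    sample = (
        'Command "set" is deprecated, Functionality of contexts is '
        "replaced by profiles. More information: " + CPDCTL_CONFIG_DOC
    )
    print(explain_cpdctl_failure(sample))
```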