HariSekhon / Nagios-Plugins

450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
https://www.linkedin.com/in/HariSekhon

Cloudera plugins in Check_MK #224

Closed: marbaa closed this issue 4 years ago

marbaa commented 5 years ago

Hi,

This issue is hard to explain because we use a highly customized solution. I'm running the check_clouderahealth and check_clouderastatus Perl plugins in Check_MK as MRPE checks. An example check line from mrpe.cfg: `/location/of/check_mk/plugins/cloudera-checks/check_cloudera_manager_health.pl -H hostname -u user -p password -C "TEST CLUSTER" -S hive --tls-noverify`

It works fine, and I get this output: `OK: cluster 'TEST CLUSTER' service 'hue' health=GOOD`

Next, we have a Perl script that creates entries in the Oracle database of an HP ticketing tool. The script takes many arguments, one of which is the output of the Cloudera plugin. It might be a simple mistake somewhere, but it looks to me as if the plugin's output contains some hidden character or something of that nature. Our Perl script gives this error:

`DBD::Oracle::st execute failed: ORA-00917: missing comma (DBD ERROR: error possibly near <*> indicator at char 1040 in Additional Info: CRIT - CRITICAL: cluster '<*>TEST CLUSTER' service 'hive' health=BAD')`

From somewhere, it is loading `<*>` before the cluster name.

Here is the part of the Perl script that loads the arguments:

# Builds the INSERT by splicing each value directly into the SQL text.
my $ticketServicemanagerInsertServiceAlarm = $dbhServiceMananger->prepare("
    INSERT INTO ".$SERVICEMANAGER_DATABASE_TABLE_EVENT_OUT_NAME."
        (EVSTATUS, EVTYPE, BRIEF_DESCRIPTION, ASSIGNMENT, CATEGORY, SUBCATEGORY1, PROBLEM_SHORTNAME, PRIORITY_CODE, DOWNTIME_START,  REPORTED_BY, REPORTED_LASTNAME,REPORTED_FIRSTNAME, CONTACT_NAME, CONTACT_LASTNAME, CONTACT_FIRSTNAME, CONTACT_PHONE, REMEDY_NO, RESOLUTION_CODE,LOGICAL_NAME, OWNERSHIP, EVENTREG_NAME, OPENED_BY, CAUSE_CODE, CRITICALITY, SERVICE_RESTRICTION, ACTION)
    VALUES
        ('new','".$SERVICEMANAGER_EVENT_TYPE."','MONITORING/ALERT/ ".$nagioshostgroup." - ".$nagioshostname." ".$nagiosservicename." is ".$notificationtype."','".$assignmentgroup."','SOFTWARE','OPEN SYSTEMS','".$customerlocation."','".$prioritycode."', to_date('".$oracledate."', 'YY/MM/DD HH24:MI:SS'),'','', '', '', '', '', '', '".$REMEDY_NO."', '', '".$nagioshostname."', '1', '".$SERVICEMANAGER_INTERFACE_USER."','".$SERVICEMANAGER_INTERFACE_USER."', '', 'medium','none','***** Monitoring alert *****|Service: ".$nagiosservicename."|Host: ".$nagioshostname."|Projekt: ".$nagioshostgroup."|Address: ".$nagiosipaddress."|State: ".$notificationtype."|Additional Info: ".$nagiosserviceoutput."')");
# DBI exposes the error text via errstr on the statement handle.
$ticketServicemanagerInsertServiceAlarm->execute or die $ticketServicemanagerInsertServiceAlarm->errstr;

The script dies at the `execute` call because, apparently, the parsed output from the Cloudera plugin is not just plain "TEST CLUSTER" but "<*>TEST CLUSTER".
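One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: DBD::Oracle marks the position where it thinks a statement went wrong with a `<*>` indicator in its error text, so the marker may come from the driver rather than from the plugin. The plugin output itself contains single quotes around the cluster name, and the script interpolates that output straight into a single-quoted SQL literal, so the literal would end just before `TEST CLUSTER`, which fits ORA-00917 (missing comma). This shell sketch uses a hypothetical `output` variable holding the text from the error message and shows the naive interpolation next to standard SQL escaping (doubling each single quote):

```shell
# Hypothetical sample: the plugin output as quoted in the ORA-00917 error.
output="CRITICAL: cluster 'TEST CLUSTER' service 'hive' health=BAD"

# Naive interpolation, mirroring how the Perl script builds its SQL:
# the quote before TEST CLUSTER closes the literal early, leaving
# TEST CLUSTER as bare tokens where Oracle expects a comma.
naive="VALUES ('Additional Info: ${output}')"

# Standard SQL escaping: double every single quote inside the literal.
escaped=$(printf '%s' "$output" | sed "s/'/''/g")
safe="VALUES ('Additional Info: ${escaped}')"

printf '%s\n' "$naive" "$safe"
```

Inside the Perl script, DBI bind placeholders (a `?` for each value in the SQL, with the values passed to `execute`) would avoid hand-escaping altogether, because the values are bound rather than spliced into the statement text.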

Maybe the problem is newlines in the output of this command:

# /.../cus_plugins/cloudera-checks/check_cloudera_manager_health.pl -H hostname -u user -p password --list-clusters --tls-noverify
CM clusters available:

cluster name         => CDH version

TEST CLUSTER            => CDH5
HariSekhon commented 5 years ago

You can try re-running the plugin with the same args on the command line and piping it through cat -A to see any non-printable characters it may be outputting.
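A minimal sketch of that check. `plugin_output` is a hypothetical stand-in that prints the OK line quoted earlier; in practice you would pipe the real plugin command line instead. GNU `cat -A` marks each line end with `$`, tabs with `^I`, and other control bytes with `^` or `M-` notation; the hypothetical `has_nonprintable` helper does a similar check automatically with `grep`:

```shell
# Hypothetical stand-in for the real plugin invocation.
plugin_output() {
  printf '%s\n' "OK: cluster 'TEST CLUSTER' service 'hue' health=GOOD"
}

# GNU cat -A: '$' marks each line end, '^I' a tab, '^X'/'M-' other bytes.
plugin_output | cat -A

# Succeeds (exit 0) if any line contains a byte outside printable ASCII;
# tabs and carriage returns count as non-printable here.
has_nonprintable() {
  LC_ALL=C grep -q '[^[:print:]]'
}

if plugin_output | has_nonprintable; then
  echo "non-printable characters found"
else
  echo "plain text only"
fi
```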

marbaa commented 5 years ago

No non-printable characters are shown, just plain text.

HariSekhon commented 5 years ago

I think this is an environment issue or something related to your bespoke process: I haven't come across this before, and without access to your environment I cannot reproduce it.

marbaa commented 5 years ago

[hostname] # /.../cloudera-checks/check_cloudera_manager_health.pl -H server -u user -p password --list-clusters --tls-noverify | cat -A
CM clusters available:$
$
cluster name => CDH version$
$
TEST CLUSTER => CDH5$
Key Trustee Server Cluster => CDH5$
[hostname] #

Yes, I understand. We will try to investigate further.

marbaa commented 4 years ago

I forgot to post an update here, but my colleague fixed this with a workaround in our environment scripts: by using sed to remove the '<*>' part, everything works. Unfortunately, I can't find that part of the code.
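The colleague's actual sed command was not posted, so the following is only a hypothetical sketch of stripping a literal `<*>` sequence. The detail worth noting is escaping the asterisk: in a basic regular expression `<*>` means "zero or more `<` followed by `>`", so an unescaped pattern would delete every `>` in the text (including the `=>` in the cluster listing above) rather than the marker:

```shell
# Hypothetical filter: remove every literal "<*>" from stdin.
strip_marker() {
  # '\*' matches a literal asterisk; an unescaped '<*' would mean
  # "zero or more '<'" and the pattern would match any lone '>'.
  sed 's/<\*>//g'
}

printf '%s\n' "CRITICAL: cluster '<*>TEST CLUSTER' service 'hive' health=BAD" | strip_marker
```

This prints `CRITICAL: cluster 'TEST CLUSTER' service 'hive' health=BAD`, whereas `sed 's/<*>//g'` applied to `TEST CLUSTER => CDH5` would print `TEST CLUSTER = CDH5`.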

HariSekhon commented 4 years ago

OK, thanks for letting me know. I'm going to leave this marked as an environment issue, as I don't believe the plugin code is causing this.