DominoMeter / DominoMeterAddin-HCLMSP

Collects usage of Domino servers running in the HCL MSP program
GNU General Public License v3.0

JEDI views #18

Open dpastov opened 3 years ago

dpastov commented 3 years ago

Sure - what I mean is that you should parse the line stored by DominoMeter that lists the server name. If it has "running" in it, treat that as a "running" server, which is what we want because it means Jedi has been able to verify the HTTP stack is up using netmonitor.nsf/test locally to Domino.

Otherwise, if it shows "starting up", then we have some problem with the Domino web site document, Internet site document, or "force HTTP authentication" is set for everything. And we need to have some checkboxes in columns that we will develop to help further debug these servers.
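
The check described above could be sketched roughly as follows. This is only an illustration: the class name `JediStatus`, the enum, and the sample lines in `main` are hypothetical, and the exact format of the line that DominoMeter stores is an assumption (only the keywords "running" and "starting up" come from this thread).

```java
import java.util.Locale;

public class JediStatus {
    public enum State { RUNNING, STARTING_UP, UNKNOWN }

    // Classifies a stored status line by keyword.
    // Assumption: the line contains the literal text "running" or "starting up".
    public static State classify(String line) {
        if (line == null) {
            return State.UNKNOWN;
        }
        String s = line.toLowerCase(Locale.ROOT);
        if (s.contains("running")) {
            return State.RUNNING;       // Jedi verified the HTTP stack
        }
        if (s.contains("starting up")) {
            return State.STARTING_UP;   // candidate for the debug views
        }
        return State.UNKNOWN;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not real DominoMeter output.
        System.out.println(classify("domino-45 running"));
        System.out.println(classify("domino-45 starting up"));
    }
}
```

Servers that land in `STARTING_UP` are the ones the extra debug columns are meant for.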

dpastov commented 3 years ago

Hi Dmytro,

To help us debug why so many servers are in the "starting up" state (which is a big problem since Jedi then doesn't know whether Domino's HTTP stack was ever operational), please add two new views under the Jedi main category, called

HTTP test to internal IP - to the property in blue below, followed by the URL in the red property

HTTP test to hostname - same as above logic, but for the green + red one below

This will be a new collection point in DominoMeter that attempts to do a very simple TCP open to the above 2 endpoints.

These tests can / should ideally be a very simple TCP open with exceptions caught and suppressed... just like you are doing for testing whether Jedi is working on the specific TCP port (usually 1910).

If the test works, please store the result. Otherwise, please store the error (404 or TCP open error or whatever) -- just suppress exceptions on the console please.

You can run the same test manually using telnet, for example "telnet 172.17.1.45 80":

https://www.thomas-krenn.com/en/wiki/Check_TCP_Port_80_(http)_with_telnet

note: I don't see anywhere in the partitions.xml where the TCP port is defined. I guess assume TCP 80 until we figure out otherwise where Jedi is storing this. I hope it is not hard-coded in Jedi.
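
A minimal sketch of such a TCP open test, assuming plain `java.net.Socket` is acceptable inside the add-in. The class name `TcpProbe`, the result strings, and the timeout value are hypothetical; port 80 is the assumption stated in the note above. A bare TCP open only shows that the port accepts connections, so an HTTP status such as 404 would need an actual HTTP request on top of this.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class TcpProbe {
    // Tries to open a TCP connection and returns a result string to store.
    // All exceptions are caught here so nothing is printed to the console.
    public static String probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return "OK";
        } catch (IOException | RuntimeException e) {
            // e.g. connection refused, timeout, unknown host, invalid port
            return "ERROR: " + e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Values taken from the hostinfo example in this issue; port 80 is assumed.
        System.out.println(probe("172.17.1.45", 80, 3000));
        System.out.println(probe("domino-45.prominic.net", 80, 3000));
    }
}
```

Returning a string instead of throwing keeps the collection point simple: the same value can be stored whether the test passed or failed.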

Please also add columns in these views that show the "monitors."-related entries, one per column. The most important ones are recovery.softrestarts and recovery.restart_if_down, but we need columns for all of these.

from partitions.xml example:

<hostinfo>
   <property name="ip">
     <property-value>172.17.1.45</property-value>
   </property>
   <property name="hostname">
     <property-value>domino-45.prominic.net</property-value>
   </property>
</hostinfo>
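
Reading the `ip` and `hostname` values out of a fragment like the one above might look like this. It is a sketch using the JDK's built-in DOM parser; the class name `PartitionProperties` is hypothetical, and it assumes every `property` element carries a `name` attribute and a single `property-value` child, as in the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PartitionProperties {
    // Returns property name -> value for every <property> element found,
    // in document order. Properties without a <property-value> are skipped.
    public static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            NodeList values = property.getElementsByTagName("property-value");
            if (values.getLength() > 0) {
                props.put(property.getAttribute("name"),
                          values.item(0).getTextContent().trim());
            }
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<hostinfo>"
                + "<property name=\"ip\"><property-value>172.17.1.45</property-value></property>"
                + "<property name=\"hostname\"><property-value>domino-45.prominic.net</property-value></property>"
                + "</hostinfo>";
        Map<String, String> props = parse(xml);
        System.out.println(props.get("ip"));
        System.out.println(props.get("hostname"));
    }
}
```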

Please check other partitions.xml results and, if I missed any properties below, add them so that the columns in the view contain all the ones you find:

   <property name="monitors.http.period">
     <property-value>60</property-value>
   </property>
   <property name="monitors.timer">
     <property-value>360</property-value>
   </property>
   <property name="notifications.email.enable">
     <property-value>false</property-value>
   </property>
   <property name="monitors.rpc.startupwait">
     <property-value>120</property-value>
   </property>
   <property name="monitors.schedules.enable">
     <property-value>false</property-value>
   </property>
   <property name="recovery.softrestarts">
     <property-value>true</property-value>
   </property>
   <property name="monitors.activity.startupwait">
     <property-value>120</property-value>
   </property>
   <property name="recovery.restart_if_down">
     <property-value>false</property-value>
   </property>
   <property name="monitors.http.startupwait">
     <property-value>180</property-value>
   </property>
   <property name="monitors.http.testpage">
     <property-value>/netmonitor.nsf/test</property-value>
   </property>
   <property name="monitors.http.recoverywait">
     <property-value>60</property-value>
   </property>
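
One way to make sure the view columns cover every property found across servers is to take the union of property names. The sketch below assumes each server's properties have already been read into a name-to-value map; the class name `ViewColumns` and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ViewColumns {
    // Union of property names across all servers. Order follows the
    // iteration order of the input maps (use LinkedHashMap to keep file order).
    public static Set<String> columnNames(List<Map<String, String>> servers) {
        Set<String> names = new LinkedHashSet<>();
        for (Map<String, String> props : servers) {
            names.addAll(props.keySet());
        }
        return names;
    }

    // One cell per column for a single server; empty string when the
    // property is missing from that server's partitions.xml.
    public static List<String> row(Set<String> columns, Map<String, String> props) {
        List<String> cells = new ArrayList<>();
        for (String column : columns) {
            cells.add(props.getOrDefault(column, ""));
        }
        return cells;
    }
}
```

A property that appears in only one server's file still gets its own column; the other servers simply show an empty cell there.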

dpastov commented 3 years ago

I have to follow up - probably already done.