OCSInventory-NG / OCSInventory-Server

Communication server of OCS Inventory
http://www.ocsinventory-ng.org/
GNU General Public License v2.0

IPDISCOVER_BETTER_THRESHOLD ignored during IP discover elections #154

Closed StCyr closed 5 years ago

StCyr commented 5 years ago

General information

PHP Version: 7.2.10
Web Server: Apache/2.4.29 (Ubu
Database Server: (Ubuntu) version 5.7.24-0ubuntu0.18.04.1-log
Version OCSReports: 2.5

Problem's description

I've often asked myself why new computers seem to be preferred in IP discovery elections.

This time I took the time to investigate the OCSI server's IP discovery election mechanism, and things look weird at first sight:

The election seems to happen in sub _ipdiscover_evaluate.

There, the process seems to be the following:

First, if an elected agent hasn't been seen by the communication server for more than IPDISCOVER_MAX_ALIVE, it is directly replaced by the agent currently being processed:

        # If we find an ipdiscover that is older than IP_MAX_ALIVE, we replace it with the current
        if( (($time - $row->{'LAST'}) > $max_age) and $max_age){
          @worth = ($row->{'ID'}, $row->{'QUALITY'} );
          $over = 1;
          last;
        }

So far, so good. But when that condition isn't met, things look weird: the communication server looks for the elected agent with the highest quality:

@worth = ( $row->{'ID'}, $row->{'QUALITY'} ) if $worth[1] < $row->{'QUALITY'};

This is already strange: why is it looking for the elected agent with the highest quality? Shouldn't it be looking for the worst quality? Isn't the goal to replace a badly behaving elected agent with a better one?

Finally, the elected agent with the highest quality is replaced by the currently processed agent if the latter's quality is lower than that of the elected agent (taking into account the OCS_OPT_IPDISCOVER_BETTER_THRESHOLD environment setting):

if(($quality < $worth[1] and (($worth[1]-$quality)>$ENV{'OCS_OPT_IPDISCOVER_BETTER_THRESHOLD'}))

That seems completely wrong to me!?!

I believe we should be trying to replace the elected agent with the lowest quality.

So, it should be:

@worth = ( $row->{'ID'}, $row->{'QUALITY'} ) if $worth[1] < $row->{'QUALITY'};

and

if(($quality > $worth[1] and (($quality-$worth[1])>$ENV{'OCS_OPT_IPDISCOVER_BETTER_THRESHOLD'}))

Am I totally wrong?

StCyr commented 5 years ago

OK, it doesn't seem that bad.

$quality is computed as follows:

    if($info->{'FIDELITY'} > 1){
      $quality = ((($now-$info->{'LCOME'})/86400) + ($info->{'QUALITY'}*$info->{'FIDELITY'}))/(($info->{'FIDELITY'})+1);
    }else{
      # We increment the number of visits
      $quality = (($now-$info->{'LCOME'})/86400);
    }

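So, the comparisons I criticised above make sense in the end: quality is a running average of the number of days between an agent's contacts, so a lower value is better.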
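To make the formula concrete, here is a small Python transliteration of the Perl snippet above (the server itself is Perl; the function name `updated_quality` and the sample numbers are made up for illustration). It is only a sketch of the arithmetic, not the server's code path.

```python
SECONDS_PER_DAY = 86400

def updated_quality(now, lcome, quality, fidelity):
    """Mirror the Perl snippet: a running average of days between contacts."""
    days_since_last = (now - lcome) / SECONDS_PER_DAY
    if fidelity > 1:
        # Fold the newest interval into the stored average.
        return (days_since_last + quality * fidelity) / (fidelity + 1)
    # Too few visits recorded: quality is just the latest interval.
    return days_since_last

now = 1_543_420_000  # arbitrary epoch second, for illustration only

# Hypothetical long-lived agent: 1000 visits, average 0.5 day, last seen 1 day ago.
veteran = updated_quality(now, now - SECONDS_PER_DAY, 0.5, 1000)

# Hypothetical new agent launched twice, 60 seconds apart.
newcomer = updated_quality(now, now - 60, 0.0, 1)

print(f"veteran={veteran:.4f} newcomer={newcomer:.4f}")
```

With these made-up inputs the newcomer ends up with a far lower (better) quality than the veteran, which is the effect discussed in this thread.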

The problem is different.

I've an idea, but I'm still digging....

StCyr commented 5 years ago

Here are the server's IP discover logs for the subnet "10.4.0.0":

support@houdini:/var/log/ocsinventory-server$ for i in $(seq 1 3);do zgrep ipdiscover activity.log.$i.gz | egrep -e "(better|over)\(10.4.0.0\)";done
Mon Nov 26 16:39:54 2018;7623;1001;PC-MARIANNE-2011-05-04-18-07-21;10.4.6.51;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;over(10.4.0.0)
Mon Nov 26 16:42:10 2018;8354;1001;PC-NATHALIE-2017-08-25-10-49-53;10.4.6.2;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)
Mon Nov 26 16:43:36 2018;8355;1001;EFFICYWIN10-2018-03-19-15-44-32;10.4.1.233;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)
Mon Nov 26 17:09:35 2018;7623;1001;ITSM-DASHBOARD-2018-11-26-15-23-36;10.4.3.67;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)
Mon Nov 26 18:31:56 2018;8355;1001;VM10POWERBI-2018-10-25-11-35-07;10.4.1.11;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.4.0.0)
Wed Nov 28 11:41:20 2018;22630;1001;VMDESKTOP-05-2014-09-17-15-30-42;10.4.6.240;OCS-NG_WINDOWS_AGENT_v2.1.0.1;ipdiscover;over(10.4.0.0)
Wed Nov 28 11:49:37 2018;22629;1001;VMDESKTOP-11-2018-03-20-09-24-15;10.4.6.242;OCS-NG_WINDOWS_AGENT_v2.1.0.1;ipdiscover;better(10.4.0.0)
Wed Nov 28 11:51:09 2018;19764;1001;ITSM-DASHBOARD-2018-11-26-15-23-36;10.4.3.67;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)
Wed Nov 28 16:30:09 2018;19751;1001;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)

And here are the quality and fidelity of these agents as of now:

AGENT           QUALITY  FIDELITY
LAPTOP-EVELYNE  0,0346   10
ITSM-DASHBOARD  NA       NA  (agent cannot be found in the 'devices' table anymore)
VMDESKTOP-11    0,0827   2967
VMDESKTOP-05    0,0838   11020
VM10POWERBI     0,0682   355
EFFICYWIN10     0,0899   6825
PC-NATHALIE     0,0973   4439
PC-MARIANNE     0,1694   17625

StCyr commented 5 years ago

I don't really understand what has happened: The logs don't show enough information.

All I can see is that the computer LAPTOP-EVELYNE (the one that triggered my investigation) probably got a very good quality because it connected many times right after it first appeared:

support@houdini:/var/log/ocsinventory-server$ for i in $(seq 1 3);do zgrep LAPTOP-EVELYNE activity.log.$i.gz | grep prolog;done
Wed Nov 28 16:05:30 2018;22630;103;LAPTOP-EVELYNE-2018-11-28-16-05-30;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;new_deviceid
Wed Nov 28 16:05:30 2018;22630;100;LAPTOP-EVELYNE-2018-11-28-16-05-30;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Wed Nov 28 16:06:27 2018;19751;100;LAPTOP-EVELYNE-2018-11-28-16-05-30;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Wed Nov 28 16:28:33 2018;22631;103;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;new_deviceid
Wed Nov 28 16:28:33 2018;22631;100;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Wed Nov 28 16:29:59 2018;19763;100;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Wed Nov 28 16:34:00 2018;22634;100;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Wed Nov 28 16:34:08 2018;19763;100;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Wed Nov 28 20:33:44 2018;19764;100;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Thu Nov 29 00:33:13 2018;24090;100;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Thu Nov 29 04:32:43 2018;19764;100;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted

This is because quality is calculated as follows:

   if($info->{'FIDELITY'} > 1){
      $quality = ((($now-$info->{'LCOME'})/86400) + ($info->{'QUALITY'}*$info->{'FIDELITY'}))/(($info->{'FIDELITY'})+1);
    }else{
      # We increment the number of visits
      $quality = (($now-$info->{'LCOME'})/86400);
    }

This means a new machine that connects several times in quick succession starts out with a very low, and therefore very good, quality value.

But the election process should still protect against such "putsches", since the difference in quality with the already elected agent must also be greater than OCS_OPT_IPDISCOVER_BETTER_THRESHOLD for a new agent to be elected.

I'm guessing this protection is not good enough in my case (I'm using the default of "1 day").
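As a sanity check on that guess, the comparison from _ipdiscover_evaluate can be re-expressed in Python (the server itself is Perl; `should_replace` is a hypothetical name). The qualities come from the table above, which was sampled after the elections, so the result is only indicative.

```python
def should_replace(candidate_quality, elected_quality, threshold, over=False):
    """Replace when the candidate's quality is lower (better) than the elected
    agent's by more than the threshold, or when the elected agent is 'over'
    (not seen for longer than IPDISCOVER_MAX_ALIVE)."""
    better = (candidate_quality < elected_quality
              and (elected_quality - candidate_quality) > threshold)
    return better or over

# Qualities from the table above, decimal commas written as points.
evelyne, marianne = 0.0346, 0.1694

print(should_replace(evelyne, marianne, threshold=1))  # gap of about 0.13
print(should_replace(evelyne, marianne, threshold=0))
```

With a threshold of 1, a gap of roughly 0.13 does not qualify as "better", so if the threshold were really applied, none of the agents listed in the table could displace another through a "better" election.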

StCyr commented 5 years ago

By the way, the documentation is not correct: IPDISCOVER_BETTER_THRESHOLD should not be expressed in days: It's a difference of agent quality, not a difference in days.

Configuration options        Meaning
IPDISCOVER_BETTER_THRESHOLD  Specify the minimum difference in days to replace an ipdiscover agent.

Should become

Configuration options        Meaning
IPDISCOVER_BETTER_THRESHOLD  Specify the minimum difference of quality to replace an ipdiscover elected agent.

StCyr commented 5 years ago

I also suggest improving the communication server's logging:

from:

&_log(1001,'ipdiscover',($over?'over':'better')."($_->{IPSUBNET})") if $ENV{'OCS_OPT_LOGLEVEL'};

to

&_log(1001,'ipdiscover',($over?'over':'better')."($_->{IPSUBNET})"."(OLD=($worth[0],$worth[1]),NEW=($DeviceID,$quality))") if $ENV{'OCS_OPT_LOGLEVEL'};

StCyr commented 5 years ago

Hmmm, I've updated my communication server's logging as described in my previous comment, and it seems it doesn't take into account the IPDISCOVER_BETTER_THRESHOLD setting:

Thu Nov 29 17:19:22 2018;31560;1001;LAPTOP-ARNAUDD-2018-11-29-13-39-13;10.4.6.244;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.4.0.0)(OLD=(7814,0.0536),NEW=(7822,0.0268))
Mon Dec  3 10:31:33 2018;21828;1001;WIN7-AD-TEST-2013-11-27-11-07-24;10.4.1.230;OCS-NG_WINDOWS_AGENT_v2.1.0.1;ipdiscover;better(10.4.0.0)(OLD=(7830,0.1763),NEW=(1245,0.1748))
Mon Dec  3 10:49:05 2018;21825;1001;VMDESKTOP-STUDE-2018-06-19-17-54-19;10.4.6.241;OCS-NG_WINDOWS_AGENT_v2.1.0.1;ipdiscover;better(10.4.0.0)(OLD=(1245,0.1748),NEW=(7096,0.0819))
Mon Dec  3 11:06:20 2018;21826;1001;VMDESKTOP10CONS-2018-03-16-13-40-42;10.4.3.130;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)(OLD=(7096,0.0819),NEW=(6722,0.0794))
Mon Dec  3 11:46:32 2018;21827;1001;ITSM-DASHBOARD-2018-11-26-15-23-36;10.4.3.67;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)(OLD=(6722,0.0794),NEW=(7792,0.0785))
Mon Dec  3 12:28:52 2018;23358;1001;VM10POWERBI-2018-10-25-11-35-07;10.4.1.11;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.4.0.0)(OLD=(7792,0.0785),NEW=(7620,0.0669)

I also wonder if the initial quality of an agent wouldn't be better computed based on the PROLOG_FREQ setting rather than the interval between 2 connections. So, something like:

$quality = $ENV{'OCS_OPT_PROLOG_FREQ'}*2;

rather than:

$quality = (($now-$info->{'LCOME'})/86400);

That would prevent new agents from getting an overly optimistic quality when someone launches the agent manually several times within a short interval.

StCyr commented 5 years ago

Ok, so, it looks like the check against OCS_OPT_IPDISCOVER_BETTER_THRESHOLD isn't working:

Here are my logs:

support@houdini:/var/log/ocsinventory-server$ zcat /var/log/ocsinventory-server/activity.log.1 | grep OLD
Thu Dec  6 10:45:15 2018;12862;1001;LAPTOP-EVELYNE-2018-11-29-17-02-59;10.50.6.120;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.11.0.0)(OLD=(7897,0.7002),NEW=(7830,0.1335))()
Thu Dec  6 11:20:49 2018;13005;1001;LAPTOP-GREGORY-2018-12-06-09-37-28;10.4.2.92;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(172.30.125.208)(OLD=(7887,0.0398),NEW=(7896,0.0338))()
Thu Dec  6 11:40:10 2018;12864;1001;LAPTOP-RAPHAEL-2018-10-18-14-52-13;10.50.6.123;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.11.0.0)(OLD=(7803,0.2660),NEW=(7588,0.1825))()
Thu Dec  6 11:57:13 2018;13005;1001;LAPTOP-GERTD-2018-12-06-11-55-26;10.4.3.200;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(172.30.125.208)(OLD=(7890,0.0397),NEW=(7903,0.0005))()
Thu Dec  6 12:06:19 2018;14202;1001;LAPTOP-NICOLASK-2018-12-06-12-05-08;10.4.5.152;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(172.30.125.208)(OLD=(7896,0.0338),NEW=(7904,0.0029))()
Thu Dec  6 12:07:20 2018;12864;1001;LAPTOP-NICOLASL-2018-12-06-12-05-48;10.4.5.141;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(172.30.125.208)(OLD=(7904,0.0029),NEW=(7905,0.0005))()
Thu Dec  6 12:08:30 2018;12865;1001;LAPTOP-NICOLASK-2018-12-06-12-05-08;10.4.5.152;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)(OLD=(7889,0.0359),NEW=(7904,0.0025))()
Thu Dec  6 12:49:29 2018;12861;1001;LAPTOP-YLIEN-2018-10-12-09-36-02;10.50.6.124;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.5.0.0)(OLD=(7756,0.2687),NEW=(7552,0.2266))()
Thu Dec  6 13:50:38 2018;14196;1001;LAPTOP-GERT-2018-11-13-13-12-55;10.50.1.103;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.5.0.0)(OLD=(7552,0.2266),NEW=(7710,0.1232))()
Thu Dec  6 14:53:56 2018;14196;1001;LAPTOP-YLIEN-2018-10-12-09-36-02;10.50.6.124;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.5.0.0)(OLD=(7906,0.2647),NEW=(7552,0.2260))()

The interesting part is the empty parentheses "()" at the end of each line: they should contain the value of the OCS_OPT_IPDISCOVER_BETTER_THRESHOLD environment variable, but they don't.

So, for some reason the communication server doesn't seem to know this environment variable.
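An unset variable would explain the elections in the logs: in Perl, undef or an empty string used in a numeric comparison counts as 0, so the threshold check degenerates to "any lower quality wins". The Python sketch below imitates that behaviour; `perl_num` and `is_better` are hypothetical helpers, and `perl_num` only covers the empty/unset case, not Perl's full string-to-number rules.

```python
def perl_num(value):
    """Rough stand-in for Perl numeric context: undef and '' count as 0."""
    if value is None or value == "":
        return 0.0
    return float(value)

def is_better(candidate_quality, elected_quality, raw_threshold):
    """The 'better' half of the election test, with a Perl-like threshold."""
    gap = elected_quality - candidate_quality
    return candidate_quality < elected_quality and gap > perl_num(raw_threshold)

# OLD/NEW qualities from the "Thu Dec  6 12:07:20" log line above.
old_q, new_q = 0.0029, 0.0005

print(is_better(new_q, old_q, None))  # variable unset
print(is_better(new_q, old_q, "2"))   # variable set to the configured value
```

With the variable unset, a gap of 0.0024 is enough to win the election; with the configured value of 2 it would not be.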

Here's my config and the part of the code I've modified to show the OCS_OPT_IPDISCOVER_BETTER_THRESHOLD environment variable in the logs:

mysql> select * from config where NAME='IPDISCOVER_BETTER_THRESHOLD';
+-----------------------------+--------+--------+---------------------------------------------------------------+
| NAME                        | IVALUE | TVALUE | COMMENTS                                                      |
+-----------------------------+--------+--------+---------------------------------------------------------------+
| IPDISCOVER_BETTER_THRESHOLD |      2 |        | Specify the minimal difference to replace an ipdiscover agent |
+-----------------------------+--------+--------+---------------------------------------------------------------+
1 row in set (0.00 sec)

      # If not over, we compare our quality with the one of the worth on this subnet.
      # If it is better more than one, we replace it
      if(@worth){
        if(($quality < $worth[1] and (($worth[1]-$quality)>$ENV{'OCS_OPT_IPDISCOVER_BETTER_THRESHOLD'})) or $over){
          # Compare to the current and replace it if needed
          if(!$dbh->do('UPDATE devices SET HARDWARE_ID=? WHERE HARDWARE_ID=? AND NAME="IPDISCOVER"', {}, $DeviceID, $worth[0])){
            return 1;
          }
          $dbh->commit;
          &_log(1001,'ipdiscover',($over?'over':'better')."($_->{IPSUBNET})"."(OLD=($worth[0],$worth[1]),NEW=($DeviceID,$quality))($ENV{'OCS_OPT_IPDISCOVER_BETTER_THRESHOLD'})") if $ENV{'OCS_OPT_LOGLEVEL'};
          return 0;
        }
      }

StCyr commented 5 years ago

Continuing my investigation, I've found that $ENV{'OCS_OPT_IPDISCOVER_BETTER_THRESHOLD'} is empty.

Here's a sample of my customized logging, which prints each environment variable followed by its value:

Mon Dec 17 14:48:51 2018;26737;1001;LAPTOP-JONI-2018-12-17-14-45-36;10.4.6.162;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;OCS_OPT_TRACE_DELETED 0
Mon Dec 17 14:48:51 2018;26737;1001;LAPTOP-JONI-2018-12-17-14-45-36;10.4.6.162;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;REQUEST_URI /ocsinventory
Mon Dec 17 14:48:51 2018;26737;1001;LAPTOP-JONI-2018-12-17-14-45-36;10.4.6.162;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;OCS_OPT_IPDISCOVER_BETTER_THRESHOLD
Mon Dec 17 14:48:51 2018;26737;1001;LAPTOP-JONI-2018-12-17-14-45-36;10.4.6.162;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;GATEWAY_INTERFACE CGI/1.1
Mon Dec 17 14:48:51 2018;26737;1001;LAPTOP-JONI-2018-12-17-14-45-36;10.4.6.162;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;OCS_OPT_DOWNLOAD_TIMEOUT 30

StCyr commented 5 years ago

Ok, so, the problem is due to IPDISCOVER_BETTER_THRESHOLD being defined as a TVALUE in Server/System/Config.pm:

  IPDISCOVER_BETTER_THRESHOLD => {
    type => 'TVALUE',
    default => 1,
    unit => 'day',
    description => 'Specify the minimal difference to replace an ipdiscover agent',
    level => IMPORTANT,
    filter => qr '^(\d+(?:,\d+)?)$'
  },

while it's defined as an IVALUE in ocsreports/files/ocsbase.sql:

('IPDISCOVER_BETTER_THRESHOLD',1,'','Specify the minimal difference to replace an ipdiscover agent')

I don't know which one is the correct one :-/
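The effect of the mismatch can be pictured with a toy lookup in Python. How the communication server actually loads its options is not shown in this thread, so `read_option` is purely hypothetical; the row is the one returned by the SELECT earlier in this thread.

```python
# Row as stored in the 'config' table: the value sits in IVALUE, TVALUE is empty.
row = {"NAME": "IPDISCOVER_BETTER_THRESHOLD", "IVALUE": 2, "TVALUE": ""}

def read_option(row, declared_type):
    """Hypothetical loader: return the column named by the declared type."""
    return row[declared_type]

print(repr(read_option(row, "TVALUE")))  # what a TVALUE declaration would read
print(repr(read_option(row, "IVALUE")))  # where the value is actually stored
```

Under that assumption, a TVALUE declaration reads the empty column, which matches the empty environment variable seen in the logs.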

StCyr commented 5 years ago

Hi @charleneauger, @guimard,

Could you please tell me what's the correct fix for this issue?

Is IPDISCOVER_BETTER_THRESHOLD a TVALUE or an IVALUE?

(See my previous comment)

BR,

gillesdubois commented 5 years ago

Hi,

Thanks to @StCyr for his contribution, which resolves this problem.

Regards, Gilles Dubois.