OK, it doesn't seem that bad after all.
$quality is computed as follows:
if($info->{'FIDELITY'} > 1){
$quality = ((($now-$info->{'LCOME'})/86400) + ($info->{'QUALITY'}*$info->{'FIDELITY'}))/(($info->{'FIDELITY'})+1);
}else{
# We increment the number of visits
$quality = (($now-$info->{'LCOME'})/86400);
}
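So the comparisons I criticised in my initial report make sense after all: QUALITY is the average number of days between two contacts, so a lower value means a better (more assiduous) agent.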
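To make the formula concrete, here is a minimal Python transcription of the Perl snippet above. It is a sketch, not the server's code, and the function name next_quality is hypothetical. It shows that QUALITY behaves like a running mean of the number of days between contacts, weighted by FIDELITY.

```python
def next_quality(quality: float, fidelity: int, days_since_last: float) -> float:
    """Python transcription of the Perl QUALITY update.

    quality         -- current QUALITY (mean days between contacts)
    fidelity        -- current FIDELITY (number of contacts so far)
    days_since_last -- ($now - LCOME) / 86400
    """
    if fidelity > 1:
        # Weighted running mean: the old mean counts `fidelity` times,
        # the newest interval counts once.
        return (days_since_last + quality * fidelity) / (fidelity + 1)
    # Early contacts: quality is simply the last interval.
    return days_since_last


# An agent that has contacted the server 4 times with a mean interval of
# 1 day, and now comes back after 2 days:
print(next_quality(1.0, 4, 2.0))  # 1.2
```

Because the newest interval only weighs 1/(FIDELITY+1), an agent with a long history changes its quality slowly, while a new agent's quality is dominated by its first few intervals.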
The problem is different.
I have an idea, but I'm still digging...
Here are the server's IP discovery log entries for the subnet "10.4.0.0":
support@houdini:/var/log/ocsinventory-server$ for i in $(seq 1 3);do zgrep ipdiscover activity.log.$i.gz | egrep -e "(better|over)\(10.4.0.0\)";done
Mon Nov 26 16:39:54 2018;7623;1001;PC-MARIANNE-2011-05-04-18-07-21;10.4.6.51;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;over(10.4.0.0)
Mon Nov 26 16:42:10 2018;8354;1001;PC-NATHALIE-2017-08-25-10-49-53;10.4.6.2;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)
Mon Nov 26 16:43:36 2018;8355;1001;EFFICYWIN10-2018-03-19-15-44-32;10.4.1.233;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)
Mon Nov 26 17:09:35 2018;7623;1001;ITSM-DASHBOARD-2018-11-26-15-23-36;10.4.3.67;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)
Mon Nov 26 18:31:56 2018;8355;1001;VM10POWERBI-2018-10-25-11-35-07;10.4.1.11;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.4.0.0)
Wed Nov 28 11:41:20 2018;22630;1001;VMDESKTOP-05-2014-09-17-15-30-42;10.4.6.240;OCS-NG_WINDOWS_AGENT_v2.1.0.1;ipdiscover;over(10.4.0.0)
Wed Nov 28 11:49:37 2018;22629;1001;VMDESKTOP-11-2018-03-20-09-24-15;10.4.6.242;OCS-NG_WINDOWS_AGENT_v2.1.0.1;ipdiscover;better(10.4.0.0)
Wed Nov 28 11:51:09 2018;19764;1001;ITSM-DASHBOARD-2018-11-26-15-23-36;10.4.3.67;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)
Wed Nov 28 16:30:09 2018;19751;1001;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)
And here are the current quality and fidelity values of these agents:
AGENT | QUALITY | FIDELITY |
---|---|---|
LAPTOP-EVELYNE | 0,0346 | 10 |
ITSM-DASHBOARD | NA | NA (agent cannot be found in the 'devices' table anymore) |
VMDESKTOP-11 | 0,0827 | 2967 |
VMDESKTOP-05 | 0,0838 | 11020 |
VM10POWERBI | 0,0682 | 355 |
EFFICYWIN10 | 0,0899 | 6825 |
PC-NATHALIE | 0,0973 | 4439 |
PC-MARIANNE | 0,1694 | 17625 |
I don't really understand what happened; the logs don't show enough information.
All I can see is that the computer LAPTOP-EVELYNE (the one that triggered my investigation) probably got a very good quality because it connected many times shortly after it first appeared:
support@houdini:/var/log/ocsinventory-server$ for i in $(seq 1 3);do zgrep LAPTOP-EVELYNE activity.log.$i.gz | grep prolog;done
Wed Nov 28 16:05:30 2018;22630;103;LAPTOP-EVELYNE-2018-11-28-16-05-30;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;new_deviceid
Wed Nov 28 16:05:30 2018;22630;100;LAPTOP-EVELYNE-2018-11-28-16-05-30;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Wed Nov 28 16:06:27 2018;19751;100;LAPTOP-EVELYNE-2018-11-28-16-05-30;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Wed Nov 28 16:28:33 2018;22631;103;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;new_deviceid
Wed Nov 28 16:28:33 2018;22631;100;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Wed Nov 28 16:29:59 2018;19763;100;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Wed Nov 28 16:34:00 2018;22634;100;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Wed Nov 28 16:34:08 2018;19763;100;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Wed Nov 28 20:33:44 2018;19764;100;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Thu Nov 29 00:33:13 2018;24090;100;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
Thu Nov 29 04:32:43 2018;19764;100;LAPTOP-EVELYNE-2018-11-28-16-28-33;10.4.6.65;OCS-NG_WINDOWS_AGENT_v2.3.0.0;prolog;accepted
This is because quality is calculated as follows:
if($info->{'FIDELITY'} > 1){
$quality = ((($now-$info->{'LCOME'})/86400) + ($info->{'QUALITY'}*$info->{'FIDELITY'}))/(($info->{'FIDELITY'})+1);
}else{
# We increment the number of visits
$quality = (($now-$info->{'LCOME'})/86400);
}
This means that new machines that connect frequently at first have a good chance of getting a good (low) quality.
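The effect can be reproduced with a small Python simulation that replays the prolog timestamps of LAPTOP-EVELYNE's second device ID from the log above. It assumes the row starts with FIDELITY=1 at the new_deviceid contact and that FIDELITY is incremented by one on every later contact; both are assumptions about the server, so the result is only indicative and is not meant to match the table exactly.

```python
from datetime import datetime


def next_quality(quality, fidelity, days_since_last):
    # Same formula as the Perl snippet above.
    if fidelity > 1:
        return (days_since_last + quality * fidelity) / (fidelity + 1)
    return days_since_last


# Prolog contacts of LAPTOP-EVELYNE-2018-11-28-16-28-33 (from the log).
contacts = [
    datetime(2018, 11, 28, 16, 28, 33),  # new_deviceid
    datetime(2018, 11, 28, 16, 29, 59),
    datetime(2018, 11, 28, 16, 34, 0),
    datetime(2018, 11, 28, 16, 34, 8),
    datetime(2018, 11, 28, 20, 33, 44),
    datetime(2018, 11, 29, 0, 33, 13),
    datetime(2018, 11, 29, 4, 32, 43),
]

quality, fidelity = 0.0, 1  # assumed initial state of a new row
for last, now in zip(contacts, contacts[1:]):
    quality = next_quality(quality, fidelity, (now - last).total_seconds() / 86400)
    fidelity += 1  # assumed: one increment per contact
    print(f"{now}  fidelity={fidelity}  quality={quality:.4f}")

# The agent's steady interval is about 4 hours (~0.166 day), but the three
# contacts within six minutes drag the running mean well below that.
```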
However, the election process should protect against such "putsches", since a new agent is elected only if its quality differs from that of the already elected agent by more than OCS_OPT_IPDISCOVER_BETTER_THRESHOLD.
I'm guessing this protection is not good enough in my case (I'm using the default of "1 day").
By the way, the documentation is not correct: IPDISCOVER_BETTER_THRESHOLD should not be expressed in days; it's a difference in agent quality, not a difference in days.
Configuration options | Meaning |
---|---|
IPDISCOVER_BETTER_THRESHOLD | Specify the minimum difference in days to replace an ipdiscover agent. |
It should become:
Configuration options | Meaning |
---|---|
IPDISCOVER_BETTER_THRESHOLD | Specify the minimum difference of quality to replace an ipdiscover elected agent. |
I also suggest improving the communication server's logging:
from:
&_log(1001,'ipdiscover',($over?'over':'better')."($_->{IPSUBNET})") if $ENV{'OCS_OPT_LOGLEVEL'};
to:
&_log(1001,'ipdiscover',($over?'over':'better')."($_->{IPSUBNET})"."(OLD=($worth[0],$worth[1]),NEW=($DeviceID,$quality))") if $ENV{'OCS_OPT_LOGLEVEL'};
Hmmm, I've updated my communication server's logging as described in my previous comment, and it seems the server doesn't take the IPDISCOVER_BETTER_THRESHOLD setting into account:
Thu Nov 29 17:19:22 2018;31560;1001;LAPTOP-ARNAUDD-2018-11-29-13-39-13;10.4.6.244;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.4.0.0)(OLD=(7814,0.0536),NEW=(7822,0.0268))
Mon Dec 3 10:31:33 2018;21828;1001;WIN7-AD-TEST-2013-11-27-11-07-24;10.4.1.230;OCS-NG_WINDOWS_AGENT_v2.1.0.1;ipdiscover;better(10.4.0.0)(OLD=(7830,0.1763),NEW=(1245,0.1748))
Mon Dec 3 10:49:05 2018;21825;1001;VMDESKTOP-STUDE-2018-06-19-17-54-19;10.4.6.241;OCS-NG_WINDOWS_AGENT_v2.1.0.1;ipdiscover;better(10.4.0.0)(OLD=(1245,0.1748),NEW=(7096,0.0819))
Mon Dec 3 11:06:20 2018;21826;1001;VMDESKTOP10CONS-2018-03-16-13-40-42;10.4.3.130;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)(OLD=(7096,0.0819),NEW=(6722,0.0794))
Mon Dec 3 11:46:32 2018;21827;1001;ITSM-DASHBOARD-2018-11-26-15-23-36;10.4.3.67;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)(OLD=(6722,0.0794),NEW=(7792,0.0785))
Mon Dec 3 12:28:52 2018;23358;1001;VM10POWERBI-2018-10-25-11-35-07;10.4.1.11;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.4.0.0)(OLD=(7792,0.0785),NEW=(7620,0.0669)
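To check this systematically, the OLD/NEW values can be extracted from the customized log lines and compared with the threshold. The Python sketch below does that for two of the lines above, assuming the threshold should be 1 (the documented default); the helper name suspicious is hypothetical. A line is flagged when the replacement did not improve quality by more than the threshold, i.e. a replacement that should not have happened.

```python
import re

# Matches the custom suffix "(OLD=(id,quality),NEW=(id,quality)" added to the log.
PATTERN = re.compile(r"OLD=\((\d+),([\d.]+)\),NEW=\((\d+),([\d.]+)\)")

lines = [
    "Thu Nov 29 17:19:22 2018;31560;1001;LAPTOP-ARNAUDD-2018-11-29-13-39-13;10.4.6.244;"
    "OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.4.0.0)(OLD=(7814,0.0536),NEW=(7822,0.0268))",
    "Mon Dec 3 10:31:33 2018;21828;1001;WIN7-AD-TEST-2013-11-27-11-07-24;10.4.1.230;"
    "OCS-NG_WINDOWS_AGENT_v2.1.0.1;ipdiscover;better(10.4.0.0)(OLD=(7830,0.1763),NEW=(1245,0.1748))",
]

THRESHOLD = 1.0  # assumed: the documented default


def suspicious(line: str, threshold: float = THRESHOLD) -> bool:
    """True if a 'better' replacement did not beat the old agent by more than threshold."""
    m = PATTERN.search(line)
    if m is None:
        return False  # not one of the customized log lines
    old_q, new_q = float(m.group(2)), float(m.group(4))
    return not (new_q < old_q and old_q - new_q > threshold)


for line in lines:
    print(suspicious(line), line.split(";")[3])
```

Every replacement in the log above improves quality by far less than 1, so with a working threshold none of them should have occurred.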
I also wonder whether the initial quality of an agent would be better computed from the PROLOG_FREQ setting rather than from the interval between two connections. So, something like:
$quality = $ENV{'OCS_OPT_PROLOG_FREQ'}*2;
rather than:
$quality = (($now-$info->{'LCOME'})/86400);
That would prevent new agents from getting an overly optimistic quality when someone launches them manually several times within a short interval.
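A minimal Python sketch of this suggestion. PROLOG_FREQ is expressed in hours while QUALITY is in days, so this sketch divides by 24; that conversion is my assumption about how the proposal would be applied, and the factor of 2 is taken from the line above. The names initial_quality and next_quality are hypothetical.

```python
def initial_quality(prolog_freq_hours: float, factor: float = 2.0) -> float:
    """Pessimistic starting QUALITY (in days) derived from PROLOG_FREQ (in hours)."""
    return factor * prolog_freq_hours / 24


def next_quality(quality, fidelity, days_since_last, prolog_freq_hours):
    # Same running mean as the server, but early contacts no longer
    # set QUALITY to a possibly tiny interval between two manual runs.
    if fidelity > 1:
        return (days_since_last + quality * fidelity) / (fidelity + 1)
    return initial_quality(prolog_freq_hours)


# With PROLOG_FREQ=24 (hours), an agent launched twice within one minute
# starts at 2.0 days instead of roughly 0.0007 day.
print(next_quality(0.0, 1, 60 / 86400, 24))  # 2.0
```

Starting pessimistic means a new agent has to prove itself over several regular contacts before it can beat an established one.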
OK, so it looks like the check against OCS_OPT_IPDISCOVER_BETTER_THRESHOLD isn't working. Here are my logs:
support@houdini:/var/log/ocsinventory-server$ zcat /var/log/ocsinventory-server/activity.log.1 | grep OLD
Thu Dec 6 10:45:15 2018;12862;1001;LAPTOP-EVELYNE-2018-11-29-17-02-59;10.50.6.120;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.11.0.0)(OLD=(7897,0.7002),NEW=(7830,0.1335))()
Thu Dec 6 11:20:49 2018;13005;1001;LAPTOP-GREGORY-2018-12-06-09-37-28;10.4.2.92;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(172.30.125.208)(OLD=(7887,0.0398),NEW=(7896,0.0338))()
Thu Dec 6 11:40:10 2018;12864;1001;LAPTOP-RAPHAEL-2018-10-18-14-52-13;10.50.6.123;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.11.0.0)(OLD=(7803,0.2660),NEW=(7588,0.1825))()
Thu Dec 6 11:57:13 2018;13005;1001;LAPTOP-GERTD-2018-12-06-11-55-26;10.4.3.200;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(172.30.125.208)(OLD=(7890,0.0397),NEW=(7903,0.0005))()
Thu Dec 6 12:06:19 2018;14202;1001;LAPTOP-NICOLASK-2018-12-06-12-05-08;10.4.5.152;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(172.30.125.208)(OLD=(7896,0.0338),NEW=(7904,0.0029))()
Thu Dec 6 12:07:20 2018;12864;1001;LAPTOP-NICOLASL-2018-12-06-12-05-48;10.4.5.141;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(172.30.125.208)(OLD=(7904,0.0029),NEW=(7905,0.0005))()
Thu Dec 6 12:08:30 2018;12865;1001;LAPTOP-NICOLASK-2018-12-06-12-05-08;10.4.5.152;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;better(10.4.0.0)(OLD=(7889,0.0359),NEW=(7904,0.0025))()
Thu Dec 6 12:49:29 2018;12861;1001;LAPTOP-YLIEN-2018-10-12-09-36-02;10.50.6.124;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.5.0.0)(OLD=(7756,0.2687),NEW=(7552,0.2266))()
Thu Dec 6 13:50:38 2018;14196;1001;LAPTOP-GERT-2018-11-13-13-12-55;10.50.1.103;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.5.0.0)(OLD=(7552,0.2266),NEW=(7710,0.1232))()
Thu Dec 6 14:53:56 2018;14196;1001;LAPTOP-YLIEN-2018-10-12-09-36-02;10.50.6.124;OCS-NG_WINDOWS_AGENT_v2.4.0.0;ipdiscover;better(10.5.0.0)(OLD=(7906,0.2647),NEW=(7552,0.2260))()
The interesting part is the empty parentheses "()" at the end of each line: they should display the value of the OCS_OPT_IPDISCOVER_BETTER_THRESHOLD environment variable, but they don't.
Here's my config and the part of the code I've modified to show the OCS_OPT_IPDISCOVER_BETTER_THRESHOLD environment variable in the logs:
mysql> select * from config where NAME='IPDISCOVER_BETTER_THRESHOLD';
+-----------------------------+--------+--------+---------------------------------------------------------------+
| NAME | IVALUE | TVALUE | COMMENTS |
+-----------------------------+--------+--------+---------------------------------------------------------------+
| IPDISCOVER_BETTER_THRESHOLD | 2 | | Specify the minimal difference to replace an ipdiscover agent |
+-----------------------------+--------+--------+---------------------------------------------------------------+
1 row in set (0.00 sec)
# If not over, we compare our quality with the one of the worth on this subnet.
# If it is better more than one, we replace it
if(@worth){
if(($quality < $worth[1] and (($worth[1]-$quality)>$ENV{'OCS_OPT_IPDISCOVER_BETTER_THRESHOLD'})) or $over){
# Compare to the current and replace it if needed
if(!$dbh->do('UPDATE devices SET HARDWARE_ID=? WHERE HARDWARE_ID=? AND NAME="IPDISCOVER"', {}, $DeviceID, $worth[0])){
return 1;
}
$dbh->commit;
&_log(1001,'ipdiscover',($over?'over':'better')."($_->{IPSUBNET})"."(OLD=($worth[0],$worth[1]),NEW=($DeviceID,$quality))($ENV{'OCS_OPT_IPDISCOVER_BETTER_THRESHOLD'})") if $ENV{'OCS_OPT_LOGLEVEL'};
return 0;
}
}
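Continuing my investigation, I found that $ENV{'OCS_OPT_IPDISCOVER_BETTER_THRESHOLD'} is empty. Since Perl treats an empty string as 0 in a numeric comparison, the threshold check reduces to $worth[1]-$quality > 0, so any agent with a lower quality wins the election.
Here's a sample of my customized logging, which dumps the environment variables: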
Mon Dec 17 14:48:51 2018;26737;1001;LAPTOP-JONI-2018-12-17-14-45-36;10.4.6.162;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;OCS_OPT_TRACE_DELETED 0
Mon Dec 17 14:48:51 2018;26737;1001;LAPTOP-JONI-2018-12-17-14-45-36;10.4.6.162;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;REQUEST_URI /ocsinventory
Mon Dec 17 14:48:51 2018;26737;1001;LAPTOP-JONI-2018-12-17-14-45-36;10.4.6.162;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;OCS_OPT_IPDISCOVER_BETTER_THRESHOLD
Mon Dec 17 14:48:51 2018;26737;1001;LAPTOP-JONI-2018-12-17-14-45-36;10.4.6.162;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;GATEWAY_INTERFACE CGI/1.1
Mon Dec 17 14:48:51 2018;26737;1001;LAPTOP-JONI-2018-12-17-14-45-36;10.4.6.162;OCS-NG_WINDOWS_AGENT_v2.3.0.0;ipdiscover;OCS_OPT_DOWNLOAD_TIMEOUT 30
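The effect of the empty variable can be modelled in Python. The sketch below transcribes the replacement condition from _ipdiscover_evaluate and imitates Perl's numeric coercion explicitly; perl_num and should_replace are hypothetical helpers, and perl_num only models the empty-string and plain-number cases.

```python
def perl_num(value: str) -> float:
    """Rough imitation of Perl's numeric coercion for '' and plain numbers."""
    try:
        return float(value)
    except ValueError:
        # '' is 0 in Perl; strings like '3x' (read as 3 by Perl) are not modelled.
        return 0.0


def should_replace(quality: float, worst: float, threshold_env: str, over: bool = False) -> bool:
    # Transcription of: ($quality < $worth[1] and ($worth[1]-$quality) > THRESHOLD) or $over
    threshold = perl_num(threshold_env)
    return (quality < worst and (worst - quality) > threshold) or over


# Values from the log: OLD quality 0.1763, NEW quality 0.1748.
print(should_replace(0.1748, 0.1763, ""))   # True: any improvement wins
print(should_replace(0.1748, 0.1763, "1"))  # False with a threshold of 1
```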
OK, so the problem is that IPDISCOVER_BETTER_THRESHOLD is defined as a TVALUE in Server/System/Config.pm:
IPDISCOVER_BETTER_THRESHOLD => {
type => 'TVALUE',
default => 1,
unit => 'day',
description => 'Specify the minimal difference to replace an ipdiscover agent',
level => IMPORTANT,
filter => qr '^(\d+(?:,\d+)?)$'
},
while it's defined as an IVALUE in ocsreports/files/ocsbase.sql:
('IPDISCOVER_BETTER_THRESHOLD',1,'','Specify the minimal difference to replace an ipdiscover agent')
I don't know which one is the correct one :-/
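Here is my understanding of why the variable ends up empty, as a hypothetical Python model: the config table keeps numbers in IVALUE and strings in TVALUE, and the server presumably reads the column matching the type declared in Config.pm. The dictionary layout and the function load_option are assumptions for illustration, not the server's actual loader.

```python
# One row of the `config` table, as returned by the MySQL query above.
row = {"NAME": "IPDISCOVER_BETTER_THRESHOLD", "IVALUE": 2, "TVALUE": ""}

# Declared option types (Config.pm declares this option as TVALUE).
declared_type = {"IPDISCOVER_BETTER_THRESHOLD": "TVALUE"}


def load_option(row: dict, types: dict) -> str:
    """Read the column named by the declared type (assumed server behaviour)."""
    column = types[row["NAME"]]
    value = row[column]
    return "" if value is None else str(value)


print(repr(load_option(row, declared_type)))                              # ''
print(repr(load_option(row, {"IPDISCOVER_BETTER_THRESHOLD": "IVALUE"})))  # '2'
```

Under this model, either declaring the option as IVALUE in Config.pm or storing the value in TVALUE in ocsbase.sql would make the two sides agree.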
Hi @charleneauger, @guimard,
Could you please tell me what's the correct fix for this issue?
Is IPDISCOVER_BETTER_THRESHOLD a TVALUE or an IVALUE?
(See my previous comment)
BR,
Hi,
Thanks to @StCyr for his contribution, which resolves this problem.
Regards, Gilles Dubois.
General information
Component | Version |
---|---|
PHP | 7.2.10 |
Web server | Apache/2.4.29 (Ubu |
Database server | (Ubuntu) version 5.7.24-0ubuntu0.18.04.1-log |
OCSReports | 2.5 |
Problem description
I've often wondered why new computers seem to be preferred in the IP discovery election.
This time I took the time to investigate the OCSI server's IP discovery election mechanism, and things look weird at first sight:
Things seem to happen in sub _ipdiscover_evaluate. There, the process seems to be the following:
First, if an elected agent hasn't been seen by the communication server for more than IPDISCOVER_MAX_ALIVE, it is directly replaced by the agent currently being processed.
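A minimal Python sketch of this first step, assuming IPDISCOVER_MAX_ALIVE is a number of days and that each elected agent is known by its ID and last contact time. The names find_stale_agent and elected, and the sample IDs, are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def find_stale_agent(elected: dict, now: datetime, max_alive_days: int) -> Optional[int]:
    """Return the ID of an elected agent not seen for more than max_alive_days, if any."""
    limit = timedelta(days=max_alive_days)
    for device_id, last_seen in elected.items():
        if now - last_seen > limit:
            return device_id  # this one is replaced by the current agent
    return None


now = datetime(2018, 12, 6, 12, 0)
elected = {7814: now - timedelta(days=1), 7822: now - timedelta(days=20)}
print(find_stale_agent(elected, now, max_alive_days=14))  # 7822
```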
So far, so good. But, when the condition above isn't met, things look weird.
If the condition above isn't met, the communication server looks for the elected agent with the highest quality:
@worth = ( $row->{'ID'}, $row->{'QUALITY'} ) if $worth[1] < $row->{'QUALITY'};
This is already strange: why is it looking for the elected agent with the highest quality? Shouldn't it be looking for the worst quality?!? Isn't the goal to replace a badly behaving elected agent with a better one?
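For reference, the selection line translates to the following Python sketch: it keeps the elected agent whose QUALITY value is the largest, i.e. the one with the longest average interval between contacts. The (id, quality) tuples, the sample values (taken from the logs) and the pick_worth name are illustrative assumptions. As in Perl, where $worth[1] is undef (0 in the comparison) while @worth is empty, the first row with a positive QUALITY is always taken.

```python
def pick_worth(rows):
    """Mimic: @worth = ($row->{ID}, $row->{QUALITY}) if $worth[1] < $row->{QUALITY};"""
    worth = None
    for device_id, quality in rows:
        current = worth[1] if worth else 0  # undef $worth[1] compares as 0 in Perl
        if current < quality:
            worth = (device_id, quality)
    return worth


# Elected agents on a subnet with their QUALITY (average days between contacts).
print(pick_worth([(7814, 0.0536), (7830, 0.1763), (1245, 0.1748)]))  # (7830, 0.1763)
```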
Finally, the elected agent with the highest quality is replaced by the currently processed agent if the latter's quality is lower than the elected agent's quality by more than the OCS_OPT_IPDISCOVER_BETTER_THRESHOLD environment setting:
if(($quality < $worth[1] and (($worth[1]-$quality)>$ENV{'OCS_OPT_IPDISCOVER_BETTER_THRESHOLD'}))
That seems completely wrong to me!?!
I believe we should be trying to replace the elected agent with the lowest quality.
So, it should be:
@worth = ( $row->{'ID'}, $row->{'QUALITY'} ) if $worth[1] < $row->{'QUALITY'};
and
if(($quality > $worth[1] and (($quality-$worth[1])>$ENV{'OCS_OPT_IPDISCOVER_BETTER_THRESHOLD'}))
Am I totally wrong?