fusioninventory / fusioninventory-agent

FusionInventory Agent
http://fusioninventory.org/
GNU General Public License v2.0

Problem discovering ASM diskgroups #797

Closed raulk89 closed 3 years ago

raulk89 commented 4 years ago

Hello

Operating System: RHEL/OEL 7

I have the latest version of fusioninventory installed:

# rpm -qa | grep fusion
fusioninventory-agent-task-inventory-2.5.2-1.el7.x86_64
fusioninventory-agent-2.5.2-1.el7.x86_64

I know that the ASM.pm file uses the following command to get ASM diskgroup info:

my $diskgroups = _getDisksGroups(
        command => "su - grid -c 'asmcmd lsdg'",
        logger  => $logger
    );

I have the grid and oracle software installed as oracle:oinstall, so there is no grid user. The command "asmcmd lsdg" works both as root and via su to the oracle user:

# asmcmd lsdg
State    Type    Rebal  Sector  Logical_Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512             512   4096  1048576   2047997   562844                0          562844              0             N  DATA/
MOUNTED  EXTERN  N         512             512   4096  1048576     52223    16535                0           16535              0             Y  OCRVOTE/

# su - oracle -c 'asmcmd lsdg'
State    Type    Rebal  Sector  Logical_Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512             512   4096  1048576   2047997   562844                0          562844              0             N  DATA/
MOUNTED  EXTERN  N         512             512   4096  1048576     52223    16535                0           16535              0             Y  OCRVOTE/

So I tried modifying the ASM.pm file to use either one of these two commands. I also enabled debug mode, but no errors were reported.

# fusioninventory-inventory --verbose --debug > inventory.xml
[info] New inventory from host.domain.com-2020-04-15-17-50-59 for local0
...
[debug] Running FusionInventory::Agent::Task::Inventory::Generic::Drives
[debug] Running FusionInventory::Agent::Task::Inventory::Generic::Drives::ASM    <-- Here it actually waits 1-2 seconds, also, which is normal, as the "asmcmd lsdg" command takes the same amount of time, when I test this, but somehow this information is not pushed into inventory.xml file
[debug] Running FusionInventory::Agent::Task::Inventory::Generic::Environment
...

The inventory.xml file that is produced does not contain any of my diskgroups, and no errors are reported either.

Further note: I also tried creating a grid user and assigning it to the oracle user's default group, which is oinstall. But that does not work, since the grid software is started as the oracle user.

[grid@host~]$ /oracle/19c/grid/bin/asmcmd lsdg
Connected to an idle instance.
ASMCMD-8102: no connection to Oracle ASM; command requires Oracle ASM to run

EDIT: If I modify this "su - ..." command so that it must produce an error (for example, a typo in the command), I still do not get any errors in debug mode. Shouldn't it produce some errors then..?

Also, I would suggest making this Oracle user (at the moment hardcoded as "grid") configurable. It is very common for the clusterware user to also be oracle, the same as for database installations.

Any help would be appreciated.

Raul

raulk89 commented 4 years ago

Anyone?

Raul

g-bougard commented 4 years ago

Hi @raulk89 sorry for the long delay, it seems your output has 14 columns... and the code expects 13. As I can see, I implemented the parsing from an output without the Logical_Sector column. I'm preparing a fix thanks to your output.

g-bougard commented 4 years ago

Can you try the ASM.pm file from PR #832 ?

raulk89 commented 4 years ago

Hmm, no luck, I guess. I replaced the file from here: https://github.com/fusioninventory/fusioninventory-agent/blob/be6a676d8f23fbd0a99c5caecac73f66d8c3c31b/lib/FusionInventory/Agent/Task/Inventory/Generic/Drives/ASM.pm

I executed: # PATH=$PATH:/usr/sbin;/usr/bin/fusioninventory-agent -l inventory.xml

And I do not see my diskgroups inside inventory.xml file.

Furthermore, it seems this Logical_Sector column was added from 12c onwards, but I cannot confirm that. What I can confirm is that the column is present in 19c, and that in 10g and 11.2 it is not, so there are 13 columns in total there. What I mean is that this software should also handle 13 columns, for backward compatibility with 10g and 11g.

My bet is, they added this column since 12c.

Regards Raul

g-bougard commented 4 years ago

Thank you for testing so quickly.

Indeed I think we probably still parsed the output from su - grid .... I just added a test to check the list is not empty before trying oracle user.

My PR checks how many columns it finds and adjusts its algorithm between 13 and 14 columns, so don't worry, older Oracle versions should be supported.
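
A minimal sketch of one way to tolerate both layouts, not the code from the PR: instead of counting columns, look up Total_MB, Free_MB and Name by their header names. The function name parse_lsdg is hypothetical, and the sketch assumes the header is the first line of the output.

```sh
#!/bin/sh
# Hypothetical helper: reads `asmcmd lsdg` output on stdin and prints
# "<name> <total_mb> <free_mb>" for every disk group line.
parse_lsdg() {
    awk '
        NR == 1 {
            # Remember the position of every header name (13 or 14 columns).
            for (i = 1; i <= NF; i++) col[$i] = i
            ncols = NF
            next
        }
        # Ignore lines that do not line up with the header.
        NF != ncols { next }
        ("Name" in col) && ("Total_MB" in col) && ("Free_MB" in col) {
            name = $(col["Name"])
            sub(/\/$/, "", name)   # strip the trailing slash: DATA/ becomes DATA
            print name, $(col["Total_MB"]), $(col["Free_MB"])
        }
    '
}
```

It could be fed with, for example, su - oracle -c 'asmcmd lsdg' | parse_lsdg, which prints one "name total_mb free_mb" line per disk group.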

Can you try with the latest ASM.pm ?

raulk89 commented 4 years ago

Hmm, can you provide me with the link. Since I do not see any changes made to ASM.pm file since the first change (an hour ago).

Raul

g-bougard commented 4 years ago

Of course: ASM.pm. You should see l.32 & l.36 with a modified test.

raulk89 commented 4 years ago

Indeed, now I get these diskgroups from 19c.

    <DRIVES>
      <FREE>1108002</FREE>
      <LABEL>DATA</LABEL>
      <TOTAL>2047997</TOTAL>
      <VOLUMN>diskgroup</VOLUMN>
    </DRIVES>
    <DRIVES>
      <FREE>28260</FREE>
      <LABEL>MGMT</LABEL>
      <TOTAL>52220</TOTAL>
      <VOLUMN>diskgroup</VOLUMN>
    </DRIVES>
    <DRIVES>
      <FREE>9872</FREE>
      <LABEL>OCRVOTE</LABEL>
      <TOTAL>10236</TOTAL>
      <VOLUMN>diskgroup</VOLUMN>
    </DRIVES>

I also noticed that with a 19c database it works as the root user and also as the oracle user (even when ORACLE_HOME points to $DB_HOME). I do not know if this has something to do with RAC (Real Application Clusters) or not.

But with an 11.2 database, it seems it does not work as the root user at all. (The 11.2 databases are all single-host installations with grid.)

# asmcmd lsdg
Connected to an idle instance.
ASMCMD-8102: no connection to Oracle ASM; command requires Oracle ASM to run

# export ORACLE_HOME=/oracle/11.2/grid; export ORACLE_SID=+ASM; $ORACLE_HOME/bin/asmcmd lsdg
Connected to an idle instance.
ASMCMD-8102: no connection to Oracle ASM; command requires Oracle ASM to run

With the oracle user, I need to preset some environment variables to make this work.

# su - oracle -c 'export ORACLE_HOME=/oracle/11.2/grid; export ORACLE_SID=+ASM; asmcmd lsdg'
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576    215037    94082                0           94082              0             N  DATA/

So out of the box this command on 11.2 does not work: command => "su - oracle -c 'asmcmd lsdg'",

# su - oracle -c 'asmcmd lsdg'
Connected to an idle instance.
sh: /oracle/11.2/db/bin/clsecho: No such file or directory
Can't exec "/oracle/11.2/db/bin/clsecho": No such file or directory at /oracle/11.2/db/lib/asmcmdshare.pm line 494.
Use of uninitialized value $buf in string ne at /oracle/11.2/db/lib/asmcmdshare.pm line 498.

It is searching for these binaries in $DB_HOME (/oracle/11.2/db/).

From which fusioninventory-agent version is this ASM feature supported..?

Raul

g-bougard commented 4 years ago

Okay, thank you @raulk89. At least the problem is fixed on 19c databases. For the other case, I need a safe way to find ORACLE_HOME and set it in the request command. I'm not an Oracle admin and I don't have any such environment on hand. What is funny is the error you report: it shows Oracle uses perl libraries... maybe Oracle defines a perl environment somewhere during installation. It would be nice if it's easy to find. Do you see anything under the /etc folder that could be related to Oracle ? Or can you help by searching for a safe and generic way to set ORACLE_HOME for the oracle user ? If I remember well, maybe we even have to search for one or more users, and maybe others than grid or oracle.

raulk89 commented 4 years ago

Well, there are many possibilities. From googling, there are also some threads: https://rajat1205sharma.wordpress.com/2015/02/16/how-to-find-grid-home-location-in-oracle/

I will discuss some of these.

In all my Oracle installations, I am using the oracle user for everything (so no grid user at all). I haven't noticed any other user except oracle or grid. From all I've seen (during 4 years as an Oracle DBA), the oracle user is the one mainly used.

Regarding grid home location. Simplest way would be by getting this information from oracle user's env variables which are already preset for the user. In my cases, with oracle user, it is located here /home/oracle/.bash_profile

There is an entry: GRID_HOME=....

For example after su command, this variable is already present in env. So after su command, just do this "export ORACLE_HOME=$GRID_HOME;"

# su - oracle -c 'export ORACLE_HOME=$GRID_HOME; export ORACLE_SID=+ASM; asmcmd lsdg'
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576    215037    94082                0           94082              0             N  DATA/

Or from running processes:

# ps -ef | grep d.bin
oracle    2284     1  0 Sep17 ?        01:02:57 /oracle/11.2/grid/bin/ohasd.bin reboot
root      9198 56865  0 20:07 pts/2    00:00:00 /usr/bin/grep d.bin
oracle   10925     1  0 Sep17 ?        00:12:52 /oracle/11.2/grid/bin/cssdagent
oracle   10929     1  0 Sep17 ?        02:21:48 /oracle/11.2/grid/bin/oraagent.bin
oracle   10955     1  0 Sep17 ?        00:02:11 /oracle/11.2/grid/bin/evmd.bin
oracle   10973     1  0 Sep17 ?        00:02:55 /oracle/11.2/grid/bin/ocssd.bin
oracle   11043 10955  0 Sep17 ?        00:00:00 /oracle/11.2/grid/bin/evmlogger.bin -o /oracle/11.2/grid/evm/log/evmlogger.info -l /oracle/11.2/grid/evm/log/evmlogger.log
oracle   11056     1  0 Sep17 ?        00:01:25 /oracle/11.2/grid/bin/tnslsnr LISTENER -inherit

Of course parsing this can be challenging. There is also the fact that this is only available if grid is running (but since "asmcmd lsdg" requires grid to run anyway, so ..).

Also, you'll need the ASM instance name "ORACLE_SID". For single-instance installations, this is in most cases (if not all) just "+ASM". This can also be verified from the running processes by querying pmon.

# ps -ef | grep asm_pmon
root     10991 56865  0 20:09 pts/2    00:00:00 grep --color=auto asm_pmon
oracle   11159     1  0 Sep17 ?        00:02:57 asm_pmon_+ASM

Another place to get these locations: Before 12.1, the file "/etc/oratab" also looks promising. It was consistent until 11.2 (included). But since 12.1, especially with RAC (cluster installations), this is not that consistent any more. In all of my environments (RAC including) I have manually modified this file after major software upgrades though, so that it is consistent always, since my bash scripts rely on this oratab file.

Here we have single instance ASM instance name "+ASM" followed by GRID_HOME location (last letter N is irrelevant).

# cat /etc/oratab
+ASM:/oracle/11.2/grid:N
orcl:/oracle/11.2/db:N
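
Assuming the three-field layout shown above (instance name, home directory, flag, separated by colons), the grid home for a given instance can be pulled out with a short awk filter. This is only a sketch; oratab_home is a hypothetical helper name, and the optional second argument exists so the lookup can be tried on a copy of the file.

```sh
#!/bin/sh
# Hypothetical helper: print the home directory recorded for an instance
# in an oratab-style file.
# Usage: oratab_home <instance> [file]   (file defaults to /etc/oratab)
oratab_home() {
    awk -F: -v sid="$1" '
        /^[ \t]*#/ { next }             # skip commented entries
        $1 == sid  { print $2; exit }   # first exact match on field 1 wins
    ' "${2:-/etc/oratab}"
}
```

Because the first field is compared as a plain string, the leading + of +ASM needs no special handling here.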

In my cluster environments (two node cluster), it will look something like this:

# cat /etc/oratab
+ASM1:/oracle/19c/grid:N
cdb:/oracle/19c/db:N
...

Note the number 1 after the ASM instance name. This 1 reflects the first instance of that cluster. This can also be verified from the running processes:

# ps -ef | grep asm_pmon
oracle    5435     1  0 Sep04 ?        00:04:17 asm_pmon_+ASM1
root      7184  2535  0 20:19 pts/12   00:00:00 grep --color=auto asm_pmon

For example on my second cluster instance, the number would be 2, so "+ASM2"

# ps -ef | grep asm_pmon
oracle    4467     1  0 Sep04 ?        00:04:04 asm_pmon_+ASM2
root     30942 30890  0 20:22 pts/3    00:00:00 grep --color=auto asm_pmon

Another idea. At the moment we are running this utility from our crontab scheduler like this:

49 22 * * * PATH=$PATH:/usr/sbin;/usr/bin/fusioninventory-agent &> /dev/null

Couldn't you make it so that I can pass the ORACLE_HOME and ORACLE_SID variables to this command:

49 22 * * * PATH=$PATH:/usr/sbin;ORACLE_HOME=/oracle/11.2/grid; ORACLE_SID=+ASM;/usr/bin/fusioninventory-agent &> /dev/null

Then in the ASM.pm file you would use these 2 variables and export them inside the "su -c ''" command. For example:

command => "su - oracle -c 'export ORACLE_HOME=<get value set from within root env>; export ORACLE_SID=<get value set from within root env>; asmcmd lsdg'",

So something to think about.

Raul

raulk89 commented 4 years ago

From which version of the fusioninventory the ASM support has been added..?

Raul

g-bougard commented 4 years ago

On agent side, it was introduced in 2.4.

g-bougard commented 4 years ago

After reading your last comment, I just pushed ff07b4a to look up a user having GRID_HOME in his environment. Can you tell me if this does the job for your cases ?

g-bougard commented 4 years ago

Hi @raulk89 can you test updated #832 PR ?

raulk89 commented 4 years ago

Hi

Which files should I copy..?

From here I have taken:

There are these other files as well, but I have neither the resources nor the t directories under "/usr/share/fusioninventory/".

So I did copy this one file (ASM.pm) only at the moment, and I am getting:

[debug] Running FusionInventory::Agent::Task::Inventory::Generic::Drives::ASM
Odd number of elements in hash assignment at /usr/share/fusioninventory/lib/FusionInventory/Agent/Tools.pm line 351.
[debug] unexpected error in FusionInventory::Agent::Task::Inventory::Generic::Drives::ASM: neither command, file or string parameter given at /usr/share/fusioninventory/lib/FusionInventory/Agent/Tools.pm line 344.

(with the previous ASM.pm, that I tried last month, it works)

Also, while checking the code inside ASM.pm, I noticed this: the instance name "ORACLE_SID=+ASM" only works for a single instance. In a clustered environment, there is always a number at the end. For example, if I have a 3-node cluster (meaning there are 3 hosts in that cluster), then the ASM instance names are:

+ASM1 - for the first host
+ASM2 - for the second host
+ASM3 - for the third host

my $cmd = ($grid_home ? "ORACLE_HOME='$grid_home' ORACLE_SID=+ASM " : "")."asmcmd lsdg";

Regards Raul

g-bougard commented 4 years ago

Hi @raulk89 have you an idea on how we can know which number to use for ORACLE_SID or where to get the right ORACLE_SID ? Or I can check for it in current env and uses +ASM as default if not found. I'm checking the error but can you confirm your agent version ? It seems to me you're not using latest agent.

g-bougard commented 4 years ago

@raulk89 I figured out the issue with your error, my fault. I'm also trying to read the ASM instance name by looking for ORACLE_SID in the grid, oracle and then current (root) user. I hope this is the right way to do it. Tell me if something is wrong. From the PR, you only need to use ASM.pm, the other files are for unit tests. Can you give my update a try ?

raulk89 commented 4 years ago

Version looks like 2.5.2

# rpm -qa | grep fusion
fusioninventory-agent-task-inventory-2.5.2-1.el7.x86_64
fusioninventory-agent-2.5.2-1.el7.x86_64

As for the ASM instance name: in my environments, I have that information inside the oratab file. It contains both the ASM instance name and the grid home, separated by a colon (:).

# cat /etc/oratab
+ASM1:/oracle/19c/grid:N
cdb:/oracle/19c/db:N
...

But as I said previously, in newer versions this is not consistent:

Before 12.1, the file "/etc/oratab" also looks promising. It was consistent until 11.2 (included). But since 12.1, especially with RAC (cluster installations), this is not that consistent any more. In all of my environments (RAC including) I have manually modified this file after major software upgrades though, so that it is consistent always, since my bash scripts rely on this oratab file.

Another thing would be to just grep the "asm_pmon" process itself. This will output the exact instance name:

# ps -ef | grep asm_pmon
oracle   24769 22848  0 09:38 pts/0    00:00:00 grep --color=auto asm_pmon
oracle   27050     1  0 Sep14 ?        00:02:39 asm_pmon_+ASM1

Here you can see "+ASM1" (number 1 meaning clustered environment node1)

Also, while grepping the ASM pmon process, you can clearly see the user also (first column in grep output). In my case this is "oracle". So perhaps it could come in handy, while deciding from which user (su - $user -c "") the command be executed. So no need to try with multiple users (grid, oracle, root, .. etc). Another good thing with grepping the running ASM pmon process - if ASM is not running, no need to even bother checking "su - $user -c "asmcmd lsdg"", since it will always output error. This command (asmcmd lsdg) only works when ASM is running - in other words, if asm_pmon process is present.
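
As a rough illustration of that idea, the user and the instance name can both be read from a single ps -ef line. The helper name find_asm_pmon is hypothetical, and the sketch assumes the process name has the form asm_pmon_<instance>.

```sh
#!/bin/sh
# Hypothetical helper: reads `ps -ef` output on stdin and prints
# "<user> <instance>" for the first asm_pmon process it finds.
find_asm_pmon() {
    awk '
        # The process name is the last field, e.g. asm_pmon_+ASM1.
        # A "grep asm_pmon" line ends in plain asm_pmon and is skipped.
        $NF ~ /^asm_pmon_/ {
            sid = $NF
            sub(/^asm_pmon_/, "", sid)
            print $1, sid
            exit
        }
    '
}
```

Used as ps -ef | find_asm_pmon, empty output would mean that no ASM instance is running, so there would be nothing to query.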

Couldn't you do this, with the following order:

  1. Check for running asm_pmon process (you can grep this with root user also), if not present then there's no point even go further.
  2. If ASM running, then there you can get the user that runs this ASM process
  3. Parse the asm instance_name from running process
  4. Then use that user and its env $GRID_HOME variable if $GRID_HOME is initialized under $user's env:

    su - $user -c "ORACLE_HOME=$GRID_HOME; ORACLE_SID=asm_instance_name_from_parsing; asmcmd lsdg"

  5. If $GRID_HOME was not initialized under $user's env, then check for oratab file for grid home (asm instance_name is also there) and if entry present, then use:

    su - oracle -c 'ORACLE_HOME=value_from_oratab_file; ORACLE_SID=value_from_oratab_file; asmcmd lsdg'

  6. If /etc/oratab file has no entry for ASM, then I don't know, just try:

    su - $user -c "asmcmd lsdg" or from root asmcmd lsdg .. etc..

You can move step 5 further ahead also (meaning 1, 2, 5, 3, 4, 6)

Regards Raul

raulk89 commented 4 years ago

Now I get this:

[debug] Running FusionInventory::Agent::Task::Inventory::Generic::Drives::ASM
Use of uninitialized value $user in string eq at /usr/share/fusioninventory/lib/FusionInventory/Agent/Task/Inventory/Generic/Drives/ASM.pm line 40.
Use of uninitialized value $user in concatenation (.) or string at /usr/share/fusioninventory/lib/FusionInventory/Agent/Task/Inventory/Generic/Drives/ASM.pm line 40.

g-bougard commented 4 years ago

Okay, thank you for the detailed analysis. I tried to follow your advice in last #832 commit. Can you review and try the related ASM.pm file ? Thank you

raulk89 commented 4 years ago

Ok, thanks.

I will test it and let you know.

By the way:

[root@dc2-muisoradb ~]# PATH=$PATH:/usr/sbin;/usr/bin/fusioninventory-agent
[info] target server0: server https://glpi.domain.com/plugins/fusioninventory/
[info] sending prolog request to server0
[info] running task Inventory
[info] New inventory from test-muisoradb.domain.com-2019-01-23-22-49-02 for server0

By the way, where does this information come from..?

test-muisoradb.domain.com-2019-01-23-22-49-02

I must have cloned this host from this other host, but I have changed the old hostname from these files:

But looks like it does get this old hostname from somewhere else..

Raul

g-bougard commented 4 years ago

If you cloned the host, you just have to remove the FusionInventory-Agent.dump file where the deviceid is saved at the first agent run. The file is probably in /var/lib/fusioninventory-agent if you have a standard installation. After you have deleted that file, the agent will generate a new deviceid based on the current hostname and time.

raulk89 commented 4 years ago

Thanks, that indeed was the culprit. This solved the issue. # rm -f /var/lib/fusioninventory-agent/FusionInventory-Agent.dump

Raul

raulk89 commented 4 years ago

Hi, planning on testing your latest updates. My question is, is it possible for me to output certain variables inside ASM.pm file to console..? For example variable $asm, I have tried the following, but this does not work: echo "$asm"

It would help me a lot to test it properly.

Regards Raul

g-bougard commented 4 years ago

Hi @raulk89 yes, it is, just use the following syntax:

print STDERR "$asm\n";

raulk89 commented 4 years ago

Ok, thanks. Long story short, I have tested it for all my environments (11g to 19c and single instance or cluster). And it indeed worked fine for all of these.

Things I noticed:

Just noting something I forgot to mention earlier. While checking the /etc/oratab file, there may also be commented lines for +ASMn:.., for example #+ASM1:/oracle/19c/.....:N

# grep -w "+ASM1" /etc/oratab
+ASM1:/oracle/19c/grid:N
+ASM1:/oracle/19c/grid:N

So the "^" symbol should be used (to match from the beginning of the line):

# grep -w "^+ASM1" /etc/oratab
+ASM1:/oracle/19c/grid:N

But, actually, when testing this, it seems like your code does consider this already. You can confirm this perhaps. At least when having commented line for +ASM inside oratab file, it indeed worked as it should.

At the moment, I have figured out that it works as follows:

  1. Check for running asm_pmon process, if present, we continue, otherwise we will not proceed
  2. Get user and asm instance_name from running processes
  3. Check for $GRID_HOME in the environment of the asm process user
  4. If grid_home is not initialized, it is looked up in /etc/oratab
  5. Depending on whether grid_home is initialized, one of the following is executed:
    • su - $user -c "ORACLE_HOME='$grid_home' ORACLE_SID=$asm asmcmd lsdg"
    • or su - $user -c "asmcmd lsdg"
    • "# asmcmd lsdg" (from root user) should never happen
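
The command assembly in step 5 could be sketched in shell as follows. This is an illustration of the flow described in the list, not the agent's Perl code; build_lsdg_cmd is a hypothetical name, and it only prints the command line rather than running it.

```sh
#!/bin/sh
# Hypothetical helper: print the command line that would be run.
# Usage: build_lsdg_cmd <user> <instance> [grid_home]
build_lsdg_cmd() {
    user=$1
    sid=$2
    grid_home=$3
    if [ -n "$grid_home" ]; then
        # Grid home known: pass it and the instance name to asmcmd only.
        inner="ORACLE_HOME='$grid_home' ORACLE_SID=$sid asmcmd lsdg"
    else
        # Grid home unknown: rely on the login environment of the user.
        inner="asmcmd lsdg"
    fi
    printf 'su - %s -c "%s"\n' "$user" "$inner"
}
```

Printing instead of executing keeps the sketch harmless on a machine without Oracle and makes the two branches easy to compare.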

Raul

g-bougard commented 4 years ago

Hi @raulk89

thank you for your detailed feedback.

Regarding the user not being root, since asm_pmon should not be run as root: in the end this is not a big deal. As you said, we can leave it, and I think we should, as who knows ? Maybe Oracle will decide to run asm_pmon as root in some context in the future. If you can tell us how to know we can run the command as root, maybe I can adjust the code and not reduce the unless on l.45 to $user eq "root". But don't take time on that if the code is working as is.

About the semicolon question, indeed you're making a mistake. I'm using a shell syntax: when an environment variable is set before a command, it is only available to that command. In a shell script, the variable won't be available to the following commands. Here, as we are always running one command, this is good enough. Using a semicolon would require the "export" shell directive, and I'm not absolutely sure that syntax is portable. So in the end, I can tell you my code is fine and avoiding the semicolon is a choice.
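
The difference between the two forms can be checked with a few lines of shell. In this sketch DEMO_SID is a made-up variable name; a child sh process reports whether it can see the variable.

```sh
#!/bin/sh
unset DEMO_SID

# 1. Prefix form: the variable is placed in the environment of that one command.
prefix_result=$(DEMO_SID=+ASM sh -c 'echo "${DEMO_SID:-unset}"')

# 2. Semicolon without export: only a shell variable is set, the child
#    process does not inherit it.
semi_result=$(DEMO_SID=+ASM; sh -c 'echo "${DEMO_SID:-unset}"')

# 3. Semicolon with a separate export: the child process inherits it.
export_result=$(DEMO_SID=+ASM; export DEMO_SID; sh -c 'echo "${DEMO_SID:-unset}"')

echo "prefix: $prefix_result"
echo "semicolon: $semi_result"
echo "export: $export_result"
```

Run as is, it prints "prefix: +ASM", "semicolon: unset" and "export: +ASM".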

About the ^ symbol, it is indeed used. Check lines 37 & 38, it is included there. I had to create $asm_for_re to prepare the regexp as the + symbol has a meaning for a regexp. On line 37, I'm extracting all matching lines but I only expect one and I'm using the first one. On line 38, I extract grid_home if I found a line.
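
The same two ideas, escaping the + and anchoring the match at the start of the line, can be illustrated in shell. This is a sketch and not the Perl from ASM.pm; oratab_line is a hypothetical helper, and only the + character is escaped here.

```sh
#!/bin/sh
# Hypothetical helper: print the first oratab line that starts with the
# given instance name followed by a colon.
# Usage: oratab_line <instance> [file]   (file defaults to /etc/oratab)
oratab_line() {
    # "+" is a quantifier in extended regular expressions, so escape it
    # to make it match literally.
    sid_re=$(printf '%s' "$1" | sed 's/+/\\+/g')
    # "^" anchors the match, so "#+ASM1:..." (commented out) is not returned.
    grep -E "^${sid_re}:" "${2:-/etc/oratab}" | head -n 1
}
```

The trailing colon in the pattern also keeps +ASM from matching a +ASM1 entry.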

raulk89 commented 4 years ago

Ok.

I have noticed another problem though. Previously it worked because I had GRID_HOME initialized in my root user's .bash_profile file and had also added it to PATH:

# cat /root/.bash_profile
export GRID_HOME=`grep "GRID_HOME=" /home/oracle/.bash_profile | cut -d "=" -f2`
export PATH=$PATH:$HOME/bin:$GRID_HOME/bin

So in root user, I can do:

# which asmcmd
/oracle/11.2/grid/bin/asmcmd

But it is rather uncommon to set this variable and PATH for the root user. For my environments, I have done it. But this may not be the case for other people.

So when I remove this variable and the PATH entry from the root user's .bash_profile file, I get:

# which asmcmd
/usr/bin/which: no asmcmd in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/bin)

And code from ASM.pm file does not proceed any further from this:

sub isEnabled {
    return canRun('asmcmd');
}

When isEnabled is changed from:
return canRun('asmcmd');
to:
return canRun('/oracle/11.2/grid/bin/asmcmd');

then it proceeds from there. But at the moment $grid_home is only determined a few lines below that:
my $grid_home = getFirstLine(command => "su - $user -c 'echo \$GRID_HOME'");

Another thing regarding this: when executing asmcmd as $user (in my case oracle), you must also use $grid_home/bin/asmcmd.

So what I suggest is changing:
my $cmd = ($grid_home ? "ORACLE_HOME='$grid_home' ORACLE_SID=$asm " : "")."asmcmd lsdg";
to:
my $cmd = ($grid_home ? "ORACLE_SID=$asm " : "")."$grid_home/bin/asmcmd lsdg";

Actually there is no point in setting ORACLE_HOME=$grid_home. It is vital to give the exact path to the asmcmd binary, though, and the ASM instance name as well.

But problem with this is:

Perhaps need to rewrite this condition somewhat differently. I do not know, what do you think about this..? my $cmd = ($grid_home ? "ORACLE_SID=$asm $grid_home/bin/" : "")."asmcmd lsdg";

At least when I tested this last condition, it did work as follows:

Regards Raul

raulk89 commented 4 years ago

Hi

Have you managed to look into the last comment I left?

Regards Raul

g-bougard commented 4 years ago

Hi @raulk89 I saw your comment and indeed I think the module needs some more work. Actually, even enabling the module should be reworked, as we should rather look for the asm_pmon process. I don't have time right now to do that as I'm on vacation. Will do early in November.

raulk89 commented 4 years ago

Ok, understood. Have a nice vacation..:)

Raul

g-bougard commented 4 years ago

Hi @raulk89 I just pushed 3 updates on #832 , can you check if they help ?

  1. 1fdc361 enables the module if we detect the process is running
  2. e083686 follows your suggestion to use $grid_home/bin/asmcmd as the full path to the command.
  3. 9a8bd61 just avoids launching su for the root user when looking for GRID_HOME, so we take it directly from the current environment. So you can also preset it when running the agent from a crontab.

raulk89 commented 4 years ago

Hi

Thanks. I have diskgroup named: OCRVOTE, so I just grep this name from the xml file. I added some print commands also.

[root@etmasterdb1 ~]# PATH=$PATH:/usr/sbin;/usr/bin/fusioninventory-agent -l inventory.xml; grep "OCRVOTE" inventory.xml
...........
user: oracle
grid_home1: /oracle/19c/grid
grid_home2: /oracle/19c/grid
cmd1: ORACLE_SID=+ASM1 /oracle/19c/grid/bin/asmcmd lsdg
cmd2: su - oracle -c "ORACLE_SID=+ASM1 /oracle/19c/grid/bin/asmcmd lsdg"
      <LABEL>OCRVOTE</LABEL>

Notice the last row (LABEL): it means that this diskgroup is present there. All good for the oracle user.

One more thing I noticed, though, on one of my 11.2 single-instance databases. It is much safer to also set ORACLE_HOME, so please make the following change:
my $cmd = ($grid_home ? "ORACLE_SID=$asm $grid_home/bin/" : "")."asmcmd lsdg";
-->
my $cmd = ($grid_home ? "ORACLE_SID=$asm ORACLE_HOME=$grid_home $grid_home/bin/" : "")."asmcmd lsdg";

When ORACLE_HOME is set to the DB_HOME value in the oracle user's environment, it does not work. So it is much safer to set it to the grid_home value as well.

I tried to test this with root user as well. So I did this.

    #my $user  = $asm_pmon->{USER};
    my $user  = 'root';

And I do not see this LABEL there:

[root@etmasterdb1 ~]# PATH=$PATH:/usr/sbin;/usr/bin/fusioninventory-agent -l inventory.xml; grep "OCRVOTE" inventory.xml
.......
user: root
grid_home1: /oracle/19c/grid
grid_home2: /oracle/19c/grid
cmd1: ORACLE_SID=+ASM1 /oracle/19c/grid/bin/asmcmd lsdg
cmd2: ORACLE_SID=+ASM1 /oracle/19c/grid/bin/asmcmd lsdg

So the command itself is fine, since it does work when run as the root user:

[root@etmasterdb1 ~]# ORACLE_SID=+ASM1 /oracle/19c/grid/bin/asmcmd lsdg
State    Type    Rebal  Sector  Logical_Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512             512   4096  1048576   2047997  1066651                0         1066651              0             N  DATA/
MOUNTED  EXTERN  N         512             512   4096  4194304     52220    28064                0           28064              0             N  MGMT/
MOUNTED  EXTERN  N         512             512   4096  4194304     10236     9864                0            9864              0             Y  OCRVOTE/

This is indeed strange.

Problem seems to come after this print message. Any idea, how should I debug this part..? At the moment I do not get any errors also. Just this diskgroup not present in xml file.

    $cmd = "su - $user -c \"$cmd\"" unless $user eq "root";
print STDERR "cmd2: $cmd\n";
    my $diskgroups = _getDisksGroups(
        command => $cmd,
        logger  => $logger
    );

    return unless $diskgroups;

Another thing when user is root: my $grid_home = $user eq 'root' ? $ENV{GRID_HOME} : getFirstLine(command => "su - $user -c 'echo \$GRID_HOME'");

And I do (unset GRID_HOME):

[root@etmasterdb1 ~]# unset GRID_HOME
[root@etmasterdb1 ~]# PATH=$PATH:/usr/sbin;/usr/bin/fusioninventory-agent -l inventory.xml; grep "OCRVOTE" inventory.xml
.....
user: root
Use of uninitialized value $grid_home in concatenation (.) or string at /usr/share/fusioninventory/lib/FusionInventory/Agent/Task/Inventory/Generic/Drives/ASM.pm line 35.
.....

Notice this: Use of uninitialized value $grid_home in concatenation (.) ....

Is it possible to not get this error whenever $GRID_HOME is not set in the root user's environment..?

Thanks.

Regards Raul

g-bougard commented 4 years ago

Hi @raulk89

thank you for this detailed comment. I agree with your first change request, as you think it's safer to set ORACLE_HOME.

For your other problems, I'm not sure I understand everything. The only thing I really understand is why you get that error message. I think it may have led you to believe there is a problem with the code, but in fact I don't see one. The problem is in fact your added lines... I guess you added something like:

print STDERR "grid_home1: $grid_home\n";

When you unset GRID_HOME in your environment, $grid_home becomes undef and it should not be used in any string. This is why I test it before using it. You should replace your code with that:

print STDERR "grid_home1: ".($grid_home//"undef")."\n";

In the root case and when GRID_HOME is not set, do you think we should at least set ORACLE_SID=$asm ?

g-bougard commented 4 years ago

If you think yes, for my last question, you may try the following line for the first $cmd assignment:

my $cmd = "ORACLE_SID=$asm ".($grid_home ? "ORACLE_HOME=$grid_home $grid_home/bin/" : "")."asmcmd lsdg";

This is correct as $asm should always be set there.

raulk89 commented 4 years ago

Hi

Regarding "Use of uninitialized value $grid_home in concatenation (.)": you are correct here. The problem is my print message.

Regarding "If you think yes, for my last question, ...": I'll answer no here. So this change is not necessary, as it would produce the following command: ORACLE_SID=+ASM1 asmcmd lsdg

It is good enough to just leave this as it is: asmcmd lsdg

So yeah, please add this ORACLE_HOME=$grid_home

But do you have any idea how to debug this problem that I have with the root user..? Same command works for me when I execute this under root user.

Raul

g-bougard commented 4 years ago

Hi @raulk89 okay to only add ORACLE_HOME setting in the case we found another user than root.

About your last question, indeed this is what I didn't understand in your explanation. Can you clarify what is the problem with root user ? I tried to read again your previous comment but I'm lost. One point anyway is you seem to obtain a value for $grid_home as we should find it in /etc/oratab (feature handled by lines 33-39). Probably the point in your comment is around your words "Just this diskgroup not present in xml file", but here I didn't understand what is the context of that case.

g-bougard commented 3 years ago

Hi @raulk89 do you have a last comment on this issue or a change request on the PR before I merge it ?

raulk89 commented 3 years ago

Hi

The problem that I last described was this. You can see that with root user, this command indeed works:

[root@etmasterdb1 ~]# ORACLE_SID=+ASM1 ORACLE_HOME=/oracle/19c/grid /oracle/19c/grid/bin/asmcmd lsdg
State    Type    Rebal  Sector  Logical_Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512             512   4096  1048576   2047997  1076344                0         1076344              0             N  DATA/
MOUNTED  EXTERN  N         512             512   4096  4194304     52220    28064                0           28064              0             N  MGMT/
MOUNTED  EXTERN  N         512             512   4096  4194304     10236     9864                0            9864              0             Y  OCRVOTE/
[root@etmasterdb1 ~]#

But when it is run by the agent, it does not work for some reason.

[root@etmasterdb1 ~]# fusioninventory-agent
[info] target server0: server https://..............
[info] sending prolog request to server0
[info] running task Inventory
[info] New inventory from hostname.domain.com-2020-10-17-23-05-01 for server0
user: root
grid_home1: /oracle/19c/grid
grid_home2: /oracle/19c/grid
cmd1: ORACLE_SID=+ASM1 ORACLE_HOME=/oracle/19c/grid /oracle/19c/grid/bin/asmcmd lsdg
cmd2: ORACLE_SID=+ASM1 ORACLE_HOME=/oracle/19c/grid /oracle/19c/grid/bin/asmcmd lsdg
handle: GLOB(0x34c4ad0)
diskgroups: ARRAY(0x28a0dc8)

I did add several print messages to see what happens, and it does not make it here:


    while (my $line = <$handle>) {
        print STDERR "line: $line\n";

This is how it looks at the moment: [image]

Regards Raul

g-bougard commented 3 years ago

Thank you Raul, now the problem is clear to me: if $cmd starts with the environment variable setting, the command doesn't work.

I updated the PR with a fix, setting %ENV as local in the root case. Can you test the resulting ASM.pm ? See f8331e0f2

raulk89 commented 3 years ago

Ok, thanks.

From where did you figure out that this was the problem for the root user..? :D

cmd2: ORACLE_SID=+ASM1 ORACLE_HOME=/oracle/19c/grid /oracle/19c/grid/bin/asmcmd lsdg
vs
cmd2: /oracle/19c/grid/bin/asmcmd lsdg

I wouldn't have thought of that, since the command worked fine when run separately: # ORACLE_SID=+ASM1 ORACLE_HOME=/oracle/19c/grid /oracle/19c/grid/bin/asmcmd lsdg

But you were 100% correct here. But I don't understand why it was a problem here, can you explain perhaps..?

By the way, it seems like these lines don't matter here. I've tested multiple times, and when I removed them, it still worked. But if you think they won't pose any problem, then you may leave them.

    local $ENV{ORACLE_SID}  = $asm       if $root;
    local $ENV{ORACLE_HOME} = $grid_home if $root && $grid_home;

Raul

g-bougard commented 3 years ago

Indeed, our getFileHandle() doesn't work when the command parameter is something like ORACLE_SID=+ASM1 asmcmd lsdg, because for the perl open, the first parameter will be used as the command. This works with a user other than root because there we are inserting the command into a "su" command. So for root, we just have to set the environment in the local %ENV.

If you're saying this also works without the 2 $ENV{} lines, this means /oracle/19c/grid/bin/asmcmd lsdg should just run without environment set on the computer you tested or ORACLE_SID and/or ORACLE_HOME are still set in your environment.
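A minimal sketch of the "local %ENV" idea discussed above, under stated assumptions: run_with_oracle_env is a hypothetical helper, not the agent's getFileHandle(), and the Perl interpreter itself ($^X) stands in for asmcmd so the example can run anywhere. The command is passed as a list, so no shell is involved, and the variables are set through localized %ENV entries that any child process inherits:

```perl
use strict;
use warnings;

# Hypothetical helper: runs a command (given as a list) with ORACLE_SID and,
# when known, ORACLE_HOME set in the environment; returns the output lines.
sub run_with_oracle_env {
    my ($asm, $grid_home, @command) = @_;

    # "local" restores the previous values when this sub returns;
    # processes started in the meantime see the temporary ones.
    local $ENV{ORACLE_SID}  = $asm;
    local $ENV{ORACLE_HOME} = $grid_home if $grid_home;

    open(my $handle, '-|', @command)
        or die "Can't run $command[0]: $!";
    my @lines = <$handle>;
    close($handle);
    chomp @lines;
    return @lines;
}

# Demonstration: the child process only prints the variable it received.
my @out = run_with_oracle_env(
    '+ASM1', '/oracle/19c/grid',
    $^X, '-e', 'print "sid=$ENV{ORACLE_SID}\n"'
);
print "$_\n" for @out;
```

Because the values are localized, the caller's environment is unchanged once the helper returns, which matches the intent of only touching the environment in the root case.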

Anyway, now I know I can merge the PR. I'll do next week before the release.

raulk89 commented 3 years ago

this means /oracle/19c/grid/bin/asmcmd lsdg should just run without environment set on the computer you tested

Indeed.

Since I did unset these variables before. What I did:

vi /usr/share/fusioninventory/lib/FusionInventory/Agent/Task/Inventory/Generic/Drives/ASM.pm

    #local $ENV{ORACLE_SID}  = $asm       if $root;
    #local $ENV{ORACLE_HOME} = $grid_home if $root && $grid_home;

And then executing:

[root@etmasterdb1 ~]# unset GRID_HOME
[root@etmasterdb1 ~]# unset ORACLE_HOME 
[root@etmasterdb1 ~]# unset ORACLE_SID
[root@etmasterdb1 ~]# echo "$GRID_HOME $ORACLE_HOME $ORACLE_SID"

[root@etmasterdb1 ~]# fusioninventory-agent
[info] target server0: server https://...............
[info] sending prolog request to server0
[info] running task Inventory
[info] New inventory from ............-2020-10-17-23-05-01 for server0
user: root
grid_home1: /oracle/19c/grid
grid_home2: /oracle/19c/grid
cmd1: /oracle/19c/grid/bin/asmcmd lsdg
cmd2: /oracle/19c/grid/bin/asmcmd lsdg
line: State    Type    Rebal  Sector  Logical_Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
line: MOUNTED  EXTERN  N         512             512   4096  1048576   2047997  1076009                0         1076009              0             N  DATA/
line: MOUNTED  EXTERN  N         512             512   4096  4194304     52220    28064                0           28064              0             N  MGMT/
line: MOUNTED  EXTERN  N         512             512   4096  4194304     10236     9864                0            9864              0             Y  OCRVOTE/
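For illustration only, here is one way output like the above could be turned into records. This is a sketch, not the agent's _getDisksGroups(); parse_lsdg is a hypothetical name, and it assumes every row has exactly as many whitespace-separated fields as the header, so rows with empty columns would be skipped:

```perl
use strict;
use warnings;

# Hypothetical parser for "asmcmd lsdg" style output: the first non-empty
# line is taken as the header, each following line becomes one hash.
sub parse_lsdg {
    my (@lines) = @_;
    my (@header, @groups);
    foreach my $line (@lines) {
        my @fields = split(' ', $line);
        next unless @fields;
        if (!@header) {
            @header = @fields;
            next;
        }
        next unless @fields == @header;    # skip rows that do not match
        my %group;
        @group{@header} = @fields;
        $group{Name} =~ s{/$}{};           # "DATA/" becomes "DATA"
        push @groups, \%group;
    }
    return @groups;
}

my @groups = parse_lsdg(
    "State    Type    Rebal  Sector  Logical_Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name\n",
    "MOUNTED  EXTERN  N         512             512   4096  1048576   2047997  1076009                0         1076009              0             N  DATA/\n",
    "MOUNTED  EXTERN  N         512             512   4096  4194304     10236     9864                0            9864              0             Y  OCRVOTE/\n",
);
print "$_->{Name}: $_->{Free_MB} MB free of $_->{Total_MB} MB\n" for @groups;
```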

Anyway, now I know I can merge the PR. I'll do next week before the release.

In which version can we expect this..? 2.5.2 is current, correct..?

Raul

g-bougard commented 3 years ago

Next release will be 2.6. I'm trying to release it next week.