OpenNebula / addon-xen

Xen hypervisor add-on

Error on monitoring host with XEN - addon-xen #8

Open Jeparre opened 8 years ago

Jeparre commented 8 years ago

I'm trying to configure a new host node with Xen and am having some trouble. The following error appears in oned.log. Has anyone else had this problem?

Error parsing host information: syntax error, unexpected VARIABLE, expecting EQUAL or EQUAL_EMPTY at line 1, columns 7:16. Monitoring information:

AngelaMoss commented 8 years ago

Same issue here.

oned log:

[Z0][InM][D]: Monitoring host 192.168.1.20 (10)
[Z0][InM][I]: Command execution fail: scp -r /var/lib/one/remotes/. 192.168.1.20:/var/tmp/one
[Z0][InM][I]: /var/lib/one/remotes/./im/xen.d/collectd-client.rb: No such file or directory
[Z0][InM][I]: /var/lib/one/remotes/./im/xen.d/collectd-client_control.sh: No such file or directory
[Z0][InM][I]: /var/lib/one/remotes/./xen/prereconfigure: No such file or directory
[Z0][InM][I]: /var/lib/one/remotes/./xen/reconfigure: No such file or directory
[Z0][InM][I]: ExitCode: 1
[Z0][ONE][E]: Error monitoring Host 192.168.1.20 (10):

It looks like files are missing. The files do exist, but they are symbolic links (collectd-client.rb, collectd-client_control.sh, prereconfigure, reconfigure). As for the scp failure, the same command works when I run it manually. I don't understand why.
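
One thing worth ruling out here is that the symbolic links point to targets that no longer exist, since a copy that follows links reports "No such file or directory" in that case. The thread does not confirm that this is the cause. A minimal Ruby sketch, assuming only the source directory from the scp command above; the helper name dangling_symlinks is made up for illustration:

```ruby
require 'find'

# Return every symbolic link under +root+ whose target does not exist.
# (dangling_symlinks is a hypothetical helper, not part of the addon.)
def dangling_symlinks(root)
  broken = []
  Find.find(root) do |path|
    # File.symlink? inspects the link itself, File.exist? follows it,
    # so a link with a missing target is a symlink that "does not exist".
    broken << path if File.symlink?(path) && !File.exist?(path)
  end
  broken.sort
end
```

Calling dangling_symlinks('/var/lib/one/remotes') on the front-end lists the links that need attention; an empty result means the links resolve and the cause lies elsewhere.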

lgrawet commented 7 years ago

This pull request should solve all your problems: https://github.com/OpenNebula/addon-xen/pull/9 along with this OpenNebula fix: https://github.com/OpenNebula/one/pull/139

JOJ0 commented 7 years ago

Hi Laurent, I just installed everything from addon-xen master and made sure that no leftovers of the old, broken and partly self-fixed addon remain. I am still getting this error:

Wed Nov  9 10:18:23 2016 [Z0][ONE][E]: Error parsing host information: syntax error, unexpected VARIABLE, expecting EQUAL or EQUAL_EMPTY at line 1, columns 7:16. Monitoring information: 
Error executing sudo /usr/sbin/xentop -fbi2
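
The two lines above appear to be connected: the text after "Monitoring information:" is the poll script's own error message, and in "Error executing ..." the word "executing" starts at column 7, which is where the parser expected an equals sign. The sketch below is a deliberately simplified check, assuming that every top-level line of monitoring output should look like NAME=VALUE; multi-line vector attributes would be flagged too. The helper name and the sample output are illustrative only:

```ruby
# Assumption: a top-level monitoring line starts with NAME= (the oned
# parser message asks for EQUAL after a name).
ASSIGNMENT = /\A\s*[A-Za-z_][A-Za-z0-9_]*\s*=/

# Return the lines that do not look like NAME=VALUE assignments.
# (suspicious_lines is a hypothetical helper.)
def suspicious_lines(output)
  output.each_line.map(&:chomp).reject do |line|
    line.strip.empty? || line.match?(ASSIGNMENT)
  end
end

# Illustrative sample: one valid-looking line plus the message from the log.
sample = <<~OUT
  HYPERVISOR=xen
  Error executing sudo /usr/sbin/xentop -fbi2
OUT

suspicious_lines(sample).each { |line| warn "not an assignment: #{line}" }
```

If this reading is right, the parse error in oned.log is a symptom, and the real failure is whatever makes the poll script print its error message.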

I also applied the mentioned OpenNebula fix (one #139); I just manually changed the line in vnm_mad/remotes/lib/vnmmad.rb. I am using OpenNebula 5.0.2 on Ubuntu 16.04.

Which commit(s) of your PR actually deal with this error? That would help me debug this issue further.

Thank you very much! All the best from Vienna, Jojo

JOJ0 commented 7 years ago

I just realized that not every monitoring request fails, only about every second one!?

lgrawet commented 7 years ago

Hi JOJ0,

I think I had this problem when I first tried the addon. Then I started to write the fixes and the problem went away. I find no difference between my xen drivers and the current state of the repository.

Try to run these as oneadmin on your problematic host and look at the output:
sudo /usr/sbin/xentop -fbi2
/var/tmp/one/vmm/xen/poll -t
/var/tmp/one/vmm/xen/poll

Also make sure you have this line in /etc/sudoers:
oneadmin ALL=(ALL) NOPASSWD: /usr/sbin/xentop *
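
To compare results between hosts it can help to capture the exit code and stderr of each command rather than only looking at the terminal. A rough Ruby sketch using the standard Open3 library; run_and_report and report_all are hypothetical helper names, and the -n flag is added here (it is not in the original command) so that sudo fails instead of prompting when the sudoers entry is missing:

```ruby
require 'open3'

# Run +cmd+ (an argv-style array) and return stdout, stderr and exit code.
def run_and_report(cmd)
  stdout, stderr, status = Open3.capture3(*cmd)
  { stdout: stdout, stderr: stderr, exit_code: status.exitstatus }
rescue Errno::ENOENT => e
  # The binary itself was not found.
  { stdout: '', stderr: e.message, exit_code: nil }
end

# Commands from the comment above; meant to be run as oneadmin on the host.
COMMANDS = [
  %w[sudo -n /usr/sbin/xentop -fbi2], # -n is an addition, see the note above
  %w[/var/tmp/one/vmm/xen/poll -t],
  %w[/var/tmp/one/vmm/xen/poll]
].freeze

# Print a short report for each command.
def report_all(commands)
  commands.each do |cmd|
    result = run_and_report(cmd)
    puts "$ #{cmd.join(' ')}"
    puts "exit code: #{result[:exit_code].inspect}"
    puts "stderr: #{result[:stderr]}" unless result[:stderr].empty?
  end
end
```

Nothing runs until report_all(COMMANDS) is called, so the file can be loaded safely on a machine without Xen.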

Best regards,

Laurent

JOJ0 commented 7 years ago

Hi Laurent, thanks a lot for the hint. I had already partly checked this. xentop does work; the error is somewhere in the poll script.

oneadmin@dell2:~$ sudo /usr/sbin/xentop -fbi2
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS   VBD_OO   VBD_RD   VBD_WR  VBD_RSECT  VBD_WSECT SSID
  Domain-0 -----r         40    0.0   10229200   97.6   no limit       n/a     4    0        0        0    0        0        0        0          0          0    0
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS   VBD_OO   VBD_RD   VBD_WR  VBD_RSECT  VBD_WSECT SSID
  Domain-0 -----r         40    2.1   10229200   97.6   no limit       n/a     4    0        0        0    0        0        0        0          0          0    0
oneadmin@dell2:~$ 

oneadmin@dell2:~$ /var/tmp/one/vmm/xen/poll -t
Error executing sudo /usr/sbin/xentop -fbi2

oneadmin@dell2:~$ /var/tmp/one/vmm/xen/poll
Error executing sudo /usr/sbin/xentop -fbi2

What OS and Xen version are you using on your hypervisor? Maybe the problem is a tiny difference in the xentop output, and that's why the parsing goes wrong?
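
One detail in the xentop output above that a whitespace-based parser could trip over: the MAXMEM(k) column reads "no limit", which contains a space, so a plain split yields one more field than the header has. Whether the poll script already handles this is not established in this thread. A small Ruby sketch of the effect, with the header and data row copied from the output above (whitespace condensed); parse_xentop_row is a hypothetical helper:

```ruby
HEADER = 'NAME STATE CPU(sec) CPU(%) MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) ' \
         'VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR ' \
         'VBD_RSECT VBD_WSECT SSID'
ROW = 'Domain-0 -----r 40 2.1 10229200 97.6 no limit n/a 4 0 0 0 0 0 0 0 0 0 0'

# Split one xentop data row into a column => value hash.
# "no limit" is joined into one token first so the fields line up.
def parse_xentop_row(header, row)
  names  = header.split
  values = row.sub('no limit', 'no-limit').split
  unless names.size == values.size
    raise ArgumentError, "expected #{names.size} fields, got #{values.size}"
  end
  names.zip(values).to_h
end

puts "header fields: #{HEADER.split.size}, naive row fields: #{ROW.split.size}"
puts "VCPUS: #{parse_xentop_row(HEADER, ROW)['VCPUS']}"
```

With a naive split the values from MAXMEM(k) onwards shift by one column, which is the kind of mismatch that could make later field lookups fail.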

lgrawet commented 7 years ago

This is Ubuntu 14.04/Xen 4.4.

$ sudo /usr/sbin/xentop -fbi2
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS   VBD_OO   VBD_RD   VBD_WR  VBD_RSECT  VBD_WSECT SSID
  Domain-0 -----r        238    0.0    8388608    3.1    8388608       3.1    32    0        0        0    0        0        0        0          0          0    0
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS   VBD_OO   VBD_RD   VBD_WR  VBD_RSECT  VBD_WSECT SSID
  Domain-0 -----r        238    1.3    8388608    3.1    8388608       3.1    32    0        0        0    0        0        0        0          0          0    0

Jeparre commented 7 years ago

Hey guys, I think I found a workaround for this error. I haven't found the bug in the code, but when I comment out the load_vars call in the file /var/tmp/one/vmm/xen/poll, so that the line reads #load_vars(hypervisor, file, vars), my host can be monitored. I'm still trying to find out why, but maybe this helps somebody.

JOJ0 commented 7 years ago

@Jeparre, I think you are more or less deactivating monitoring completely with this, but I am not sure.

The variable vars holds XM_POLL, which holds "sudo /usr/sbin/xentop -bi2", the actual monitoring command that is executed on the hypervisor.

Update: on the other hand, the exact same xentop line is set in the xenrc file, which overrides XM_POLL anyway.

IMHO the error must be somewhere in "def self.get_all_vm_info", a complex piece of text parsing that extracts info from the xentop output (variable text in line 78).

If get_all_vm_info fails, it ends up in the rescue clause that prints the error we are seeing in the log (line 126):

          rescue
            STDERR.puts "Error executing #{CONF['XM_POLL']}"
            nil
          end
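
A bare rescue like this catches any StandardError raised in the block, so a parsing bug and a failing xentop call produce the same message. A sketch of how the handler could expose the real cause while debugging; parse_rows is a hypothetical stand-in for the parsing step, and poll_sketch is not the addon's actual method:

```ruby
# Hypothetical stand-in for the parsing step: it raises KeyError when a
# row lacks the expected field.
def parse_rows(rows)
  rows.map { |row| Integer(row.fetch('VCPUS')) }
end

def poll_sketch(rows, command)
  parse_rows(rows)
rescue StandardError => e
  # Same message as the original handler, plus the underlying exception.
  STDERR.puts "Error executing #{command}"
  STDERR.puts "  cause: #{e.class}: #{e.message}"
  STDERR.puts "  at: #{e.backtrace.first}"
  nil
end

# A row without 'VCPUS' triggers the handler and shows the KeyError.
poll_sketch([{}], 'sudo /usr/sbin/xentop -fbi2')
```

Adding the class, message and first backtrace line to the real rescue clause in a local copy of the script would show directly which line of the parser fails.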

Well, these are just my uneducated guesses and analysis and could be all wrong, but maybe they help you take the thing apart further.

Jojo

JOJ0 commented 7 years ago

@lgrawet, I never got back to you. I am also using Ubuntu 14.04/Xen 4.4, so my theory that a difference in the xentop output makes the parsing fail is very, very unlikely ;-)

Jeparre commented 7 years ago

Hmm, so I will try to find that error in the code. An off-topic question: are you using LVM with OpenNebula?

Regards, Jeparre

On Nov 30, 2016 7:10 AM, "J0J0 T" notifications@github.com wrote:

lgrawet, I never got back to you. I am also using Ubuntu 14.04/Xen 4.4, so my theory that a difference in the xentop output makes the parsing fail is very, very unlikely.


JOJ0 commented 7 years ago

@Jeparre no, I don't, but I use DRBD with OpenNebula, which also just accesses a block device, so there is some similarity. But let's discuss this on the forum; I suggest you post your LVM question there.

abealasd commented 7 years ago

@JOJ0 I think the problem is caused by missing attributes in dom['config']. In my case I added "if !dom['config'].has_key?('disks') then next end" at line 196 to temporarily avoid this error message.
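
The guard described above can be sketched in isolation as follows. Only the 'config' and 'disks' keys come from the comment; the shape of the domain records and the helper name disk_info_by_domain are assumptions made for illustration:

```ruby
# Collect disk information per domain, skipping domains that have none.
# +domains+ is assumed to be a hash of name => record (hypothetical shape).
def disk_info_by_domain(domains)
  result = {}
  domains.each do |name, dom|
    config = dom['config'] || {}
    # Skip domains without disk information instead of raising on nil.
    next unless config.key?('disks')
    result[name] = config['disks']
  end
  result
end

domains = {
  'one-12'   => { 'config' => { 'disks' => ['xvda'] } },
  'Domain-0' => { 'config' => {} }
}
p disk_info_by_domain(domains)
```

A guard like this avoids the exception but also means such domains are reported without disk data, so it is a workaround rather than a fix for why the attribute is missing.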