I've just realized that the net-99-report-ready script does not set a timeout on its wget and/or curl commands, so the script hangs for quite some time when there is a network connectivity issue with the OneGate endpoint.
Currently the script falls back on the tools' default timeouts: 900 seconds (15 minutes) for wget, or about 2 minutes for curl (depending on the operating system).
We should probably add a timeout to the wget and curl commands in net-99-report-ready.
It also looks like the Ruby scripts that report the ready state get stuck:
root@server:/home/thomas# service one-context status
● one-context.service - OpenNebula contextualization script
Loaded: loaded (/lib/systemd/system/one-context.service; enabled; vendor preset: enabled)
Active: activating (start) since Wed 2021-07-14 10:18:47 UTC; 45s ago
Main PID: 1398 (bash)
Tasks: 6 (limit: 9448)
Memory: 43.9M
CGroup: /system.slice/one-context.service
├─1398 bash /usr/sbin/one-contextd network
├─1515 bash /usr/sbin/one-contextd network
├─1516 bash /etc/one-context.d/net-99-report-ready # waiting for the child process below
├─1594 bash /usr/bin/onegate vm update --data READY=YES # waiting for the child process below
└─1597 ruby /usr/bin/onegate.rb vm update --data READY=YES # <------- stuck
I have updated the net-99-report-ready script to set timeouts on the curl and wget commands. That prevents the script from hanging on the wget or curl step, but it still gets stuck on the Ruby step.
Parameter for curl: --max-time 5
Parameter for wget: --timeout=5
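To make the change concrete, here is a minimal sketch of how those two flags could be applied. It is not the actual content of net-99-report-ready: the function names, the REPORT_TIMEOUT variable and the ONEGATE_URL placeholder are assumptions for illustration, and the authentication headers are left out.

```shell
#!/bin/sh
# Sketch only. ONEGATE_URL is a placeholder based on the address seen in the
# log in this report; authentication headers are omitted for brevity.
ONEGATE_URL="${ONEGATE_URL:-http://172.16.100.1:5030/vm}"
REPORT_TIMEOUT="${REPORT_TIMEOUT:-5}"   # seconds; hypothetical tunable

# Report READY=YES with curl, giving up after REPORT_TIMEOUT seconds overall.
report_with_curl() {
    curl --max-time "$REPORT_TIMEOUT" -X PUT -d "READY=YES" "$ONEGATE_URL"
}

# Same request with wget; --timeout bounds the DNS, connect and read waits.
report_with_wget() {
    wget --timeout="$REPORT_TIMEOUT" --method=PUT --body-data="READY=YES" \
        -O - "$ONEGATE_URL"
}
```

Keeping the value in one variable would make it easy to tune the timeout for both tools at once.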
Here is the log output with the timeout parameters set for curl and/or wget (as you can see, they give up quite quickly):
Jul 14 09:52:33 server one-contextd[1513]: Script net-99-report-ready: Starting ...
Jul 14 09:53:42 server one-contextd[1643]: Script net-99-report-ready output: % Total % Received % Xferd Average Speed Time Time Time Current
Jul 14 09:53:42 server one-contextd[1643]: Dload Upload Total Spent Left Speed
Jul 14 09:53:42 server one-contextd[1643]: #015 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0#015 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0#015 0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0#015 0 0 0 0 0 0 0 0 --:--:-- 0:00:03 --:--:-- 0#015 0 0 0 0 0 0 0 0 --:--:-- 0:00:04 --:--:-- 0#015 0 0 0 0 0 0 0 0 --:--:-- 0:00:05 --:--:-- 0
Jul 14 09:53:42 server one-contextd[1643]: curl: (28) Connection timed out after 5002 milliseconds
Jul 14 09:53:42 server one-contextd[1643]: --2021-07-14 09:52:38-- http://172.16.100.1:5030/vm
Jul 14 09:53:42 server one-contextd[1643]: Connecting to 172.16.100.1:5030... failed: Connection timed out.
Jul 14 09:53:42 server one-contextd[1643]: Giving up.
Jul 14 09:53:42 server one-contextd[1643]: ERROR:
Jul 14 09:53:42 server one-contextd[1643]: Error timeout while connected to server (execution expired).
Jul 14 09:53:42 server one-contextd[1643]: Server: 172.16.100.1:5030
Jul 14 09:53:42 server one-contextd[1644]: Script net-99-report-ready: Finished with exit code 0
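Even with the curl and wget timeouts, the run above still takes over a minute in total (09:52:33 to 09:53:42). One possible way to bound the Ruby step as well would be to wrap the onegate call in timeout(1). This is only a sketch under the assumption that GNU coreutils timeout is available in the guest; run_bounded and ONEGATE_TIMEOUT are hypothetical names, not something the script currently defines.

```shell
#!/bin/sh
# Sketch only: assumes timeout(1) from GNU coreutils is installed.
ONEGATE_TIMEOUT="${ONEGATE_TIMEOUT:-10}"   # seconds; hypothetical tunable

# Run any command with an upper bound on its runtime.
# timeout(1) sends SIGTERM once the limit is reached and exits with 124.
run_bounded() {
    timeout "$ONEGATE_TIMEOUT" "$@"
}
```

It could then be called as `run_bounded onegate vm update --data READY=YES`, with an exit status of 124 meaning the limit was hit.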
Also, I believe the current script always returns an exit code of 0 (as in the last log line above, even though every attempt failed), so maybe we should update the logic to return a non-zero exit code in case of errors?
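As a rough idea of what that could look like, here is a sketch that tries each reporting method in turn and only succeeds if one of them does. The report_ready function and the method names passed to it are hypothetical; the real script may be structured differently.

```shell
#!/bin/sh
# Sketch only: each argument is the name of a command or shell function
# that attempts to report READY=YES and returns non-zero on failure.
report_ready() {
    for method in "$@"; do
        if "$method"; then
            return 0            # first successful method wins
        fi
        echo "report-ready: $method failed" >&2
    done
    return 1                    # every method failed (or none was given)
}
```

The script could then end with something like `report_ready report_with_onegate report_with_curl report_with_wget || exit 1` (again, hypothetical function names), so that one-contextd logs a non-zero exit code when nothing got through.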
Thomas