eclipse-fog05 / fog05

End-to-End Compute, Storage and Networking Virtualisation.
http://fog05.io/

[BUG] Node not listed/connected #159

Closed tinchouuu closed 4 years ago

tinchouuu commented 4 years ago

A node with a successfully running agent is not listed/connected even though communication via Yaks is successful.

Steps to reproduce the behavior:

  1. install fog05 via ./fos_install.sh (manual installation causes runtime errors in libzenohc.so)
  2. run yaksd -vv
  3. run sudo -u fos fagent -c /etc/fos/agent.json -v
  4. run sudo -u fos fos_linux /etc/fos/plugins/linux/linux_plugin.json
  5. run sudo -u fos /etc/fos/plugins/linuxbridge/linuxbridge_plugin /etc/fos/plugins/linuxbridge/linuxbridge_plugin.json
  6. run sudo -u fos /etc/fos/plugins/LXD/LXD_plugin /etc/fos/plugins/LXD/LXD_plugin.json (with LXD_DIR=/var/snap/lxd/common/lxd/ exported due to the snap installation)

All plugins are up and running.

The expected behavior is for the node with the running agent to be listed when running a Python 3 script that lists the nodes via the FIMAPI. Instead, the returned list was empty.

Python script:

from fog05 import FIMAPI

api = FIMAPI(locator='127.0.0.1:7447')
ls = api.node.list()
print(*ls)
api.close()
exit(0)

yaksd's dumps:

[1572807974.282815][DEBUG] Reading WriteData
[1572807974.283580][DEBUG] Handling WriteData Message. nid[UNKNOWN] sid[3] res[/alfos/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/status] [6000027b2275756964223a202262353362393534302d316135662d343736372d396466642d366266396237396538343335222c202272616d223a207b2266726565223a20363931322e35383938343337352c2022746f74616c223a20373937382e373839303632357d2c20226469736b223a205b7b2266726565223a2032332e32313633363139393935313137322c2022746f74616c223a2033312e3932383339303530323932393638382c20226d6f756e745f706f696e74223a20222f227d2c207b2266726565223a20302e302c2022746f74616c223a20302e303837303336313332383132352c20226d6f756e745f706f696e74223a20222f736e61702f636f72652f37393137227d2c207b2266726565223a20302e302c2022746f74616c223a20302e3035333436363739363837352c20226d6f756e745f706f696e74223a20222f736e61702f6c78642f3132323234227d5d2c20226e65696768626f7273223a205b5d7d]
[1572807974.283720][DEBUG] Forwarding data to session -1
[1572807974.283774][DEBUG] Forwarding data to session 0
[1572807974.283871][DEBUG] Writing StreamData
[1572807974.286693][DEBUG] Reading WriteData
[1572807974.287201][DEBUG] Handling WriteData Message. nid[UNKNOWN] sid[0] res[/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/status] [40027b2275756964223a2262353362393534302d316135662d343736372d396466642d366266396237396538343335222c2272616d223a7b22746f74616c223a373937382e373839303632352c2266726565223a363931322e35383938343337357d2c226469736b223a5b7b226d6f756e745f706f696e74223a222f222c22746f74616c223a33312e3932383339303530323932393638382c2266726565223a32332e32313633363139393935313137327d2c7b226d6f756e745f706f696e74223a222f736e61702f636f72652f37393137222c22746f74616c223a302e303837303336313332383132352c2266726565223a302e307d2c7b226d6f756e745f706f696e74223a222f736e61702f6c78642f3132323234222c22746f74616c223a302e3035333436363739363837352c2266726565223a302e307d5d2c226e65696768626f7273223a5b5d7d]
[1572807974.287922][DEBUG] Forwarding data to session -1
[1572807979.287137][DEBUG] Reading WriteData
[1572807979.287835][DEBUG] Handling WriteData Message. nid[UNKNOWN] sid[3] res[/alfos/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/status] [6000027b2275756964223a202262353362393534302d316135662d343736372d396466642d366266396237396538343335222c202272616d223a207b2266726565223a20363931322e33343337352c2022746f74616c223a20373937382e373839303632357d2c20226469736b223a205b7b2266726565223a2032332e32313633363139393935313137322c2022746f74616c223a2033312e3932383339303530323932393638382c20226d6f756e745f706f696e74223a20222f227d2c207b2266726565223a20302e302c2022746f74616c223a20302e303837303336313332383132352c20226d6f756e745f706f696e74223a20222f736e61702f636f72652f37393137227d2c207b2266726565223a20302e302c2022746f74616c223a20302e3035333436363739363837352c20226d6f756e745f706f696e74223a20222f736e61702f6c78642f3132323234227d5d2c20226e65696768626f7273223a205b5d7d]
[1572807979.288051][DEBUG] Forwarding data to session -1
[1572807979.288104][DEBUG] Forwarding data to session 0
[1572807979.288132][DEBUG] Writing StreamData
[1572807979.291088][DEBUG] Reading WriteData
[1572807979.291583][DEBUG] Handling WriteData Message. nid[UNKNOWN] sid[0] res[/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/status] [40027b2275756964223a2262353362393534302d316135662d343736372d396466642d366266396237396538343335222c2272616d223a7b22746f74616c223a373937382e373839303632352c2266726565223a363931322e33343337357d2c226469736b223a5b7b226d6f756e745f706f696e74223a222f222c22746f74616c223a33312e3932383339303530323932393638382c2266726565223a32332e32313633363139393935313137327d2c7b226d6f756e745f706f696e74223a222f736e61702f636f72652f37393137222c22746f74616c223a302e303837303336313332383132352c2266726565223a302e307d2c7b226d6f756e745f706f696e74223a222f736e61702f6c78642f3132323234222c22746f74616c223a302e3035333436363739363837352c2266726565223a302e307d5d2c226e65696768626f7273223a5b5d7d]
[1572807979.292256][DEBUG] Forwarding data to session -1
...............
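
As an aside for anyone reading these dumps: the bracketed payloads are hex-encoded JSON carried behind a short binary header. A minimal decoding sketch, assuming only that the JSON body starts at the first { byte (decode_payload is a hypothetical helper, not part of fog05):

import binascii
import json

def decode_payload(hex_str):
    # strip the short binary prefix, then parse the JSON body
    raw = binascii.unhexlify(hex_str)
    start = raw.index(b'{')
    return json.loads(raw[start:].decode('utf-8'))

# e.g. the first /alfos/.../status payload above decodes to the node status dict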

And at the moment the Linux plugin is activated, the agent dumps the following:

fagent: [DEBUG] Reading StreamData
fagent: [INFO] Subscriber listener raised exception Atdgen_runtime.Oj_run.Error("Line 1:\nMissing record field intf_configuration") : 
Raised at file "atdgen-runtime/src/oj_run.ml", line 22, characters 2-18
Called from file "fos-im/fos_types.ml", line 8565, characters 31-196
Called from file "lib/read.mll", line 587, characters 24-42
Called from file "lib/read.ml" (inlined), line 2041, characters 3-55
Called from file "lib/read.mll" (inlined), line 1071, characters 13-47
Called from file "atdgen-runtime/src/oj_run.ml", line 161, characters 2-38
Called from file "fos-im/fos_types.ml" (inlined), line 9617, characters 15-85
Called from file "fos-im/fos_types.ml", line 10207, characters 31-116
Called from file "fos-core/yaks_connector.ml", line 79, characters 23-59
Called from file "src/core/lwt.ml", line 2025, characters 16-20

fagent: [DEBUG] >>> Received message of 428 bytes
fagent: [DEBUG] tx-received: (r_pos: 0, w_pos: 428 content: 29:01:5d:2f:61:6c:66:6f:73:2f:62:35:33:62:39:35:34:30:2d:31:61:35:66:2d:34:37:36:37:2d:39:64:66:64:2d:36:62:66:39:62:37:39:65:38:34:33:35:2f:70:6c:75:67:69:6e:73:2f:34:39:63:36:35:32:64:31:2d:66:38:62:63:2d:34:32:32:30:2d:61:65:32:30:2d:35:33:35:32:36:65:34:66:65:66:30:39:2f:69:6e:66:6f:ca:02:70:80:d0:f0:f3:86:c5:c8:df:5d:b3:b9:68:15:26:01:49:0b:89:7e:6d:38:30:4d:f3:02:00:02:7b:22:6e:61:6d:65:22:3a:20:22:6c:69:6e:75:78:22:2c:20:22:76:65:72:73:69:6f:6e:22:3a:20:31:2c:20:22:75:75:69:64:22:3a:20:22:34:39:63:36:35:32:64:31:2d:66:38:62:63:2d:34:32:32:30:2d:61:65:32:30:2d:35:33:35:32:36:65:34:66:65:66:30:39:22:2c:20:22:74:79:70:65:22:3a:20:22:6f:73:22:2c:20:22:64:65:73:63:72:69:70:74:69:6f:6e:22:3a:20:22:6c:69:6e:75:78:20:6f:73:20:70:6c:75:67:69:6e:22:2c:20:22:63:6f:6e:66:69:67:75:72:61:74:69:6f:6e:22:3a:20:7b:22:79:6c:6f:63:61:74:6f:72:22:3a:20:22:74:63:70:2f:31:32:37:2e:30:2e:30:2e:31:3a:37:34:34:37:22:2c:20:22:6e:6f:64:65:69:64:22:3a:20:22:62:35:33:62:39:35:34:30:2d:31:61:35:66:2d:34:37:36:37:2d:39:64:66:64:2d:36:62:66:39:62:37:39:65:38:34:33:35:22:2c:20:22:65:78:70:6f:73:65:22:3a:20:74:72:75:65:2c:20:22:75:70:64:61:74:65:5f:69:6e:74:65:72:76:61:6c:22:3a:20:35:7d:2c:20:22:70:69:64:22:3a:20:32:36:37:31:2c:20:22:73:74:61:74:75:73:22:3a:20:22:72:75:6e:6e:69:6e:67:22:7d:) 
fagent: [DEBUG] Reading WriteData
fagent: [DEBUG] [FOS-AGENT] - CB-LA-PLUGIN - ##############
fagent: [DEBUG] [FOS-AGENT] - CB-LA-PLUGIN - Received plugin
fagent: [DEBUG] [FOS-AGENT] - CB-LA-PLUGIN - Name: linux
fagent: [DEBUG] [FOS-AGENT] - CB-LA-PLUGIN -  Plugin loaded advertising on GA
fagent: [DEBUG] [Yapi]: PUT on /agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/plugins/49c652d1-f8bc-4220-ae20-53526e4fef09/info : {"uuid":"49c652d1-f8bc-4220-ae20-53526e4fef09","name":"linux","version":1,"type":"os","status":"running","description":"linux os plugin","configuration":{"ylocator":"tcp/127.0.0.1:7447","nodeid":"b53b9540-1a5f-4767-9dfd-6bf9b79e8435","expose":true,"update_interval":5}}
fagent: [WARNING] [Yapi]: PUT on /agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/plugins/49c652d1-f8bc-4220-ae20-53526e4fef09/info : 7b2275756964223a2234396336353264312d663862632d343232302d616532302d353335323665346665663039222c226e616d65223a226c696e7578222c2276657273696f6e223a312c2274797065223a226f73222c22737461747573223a2272756e6e696e67222c226465736372697074696f6e223a226c696e7578206f7320706c7567696e222c22636f6e66696775726174696f6e223a7b22796c6f6361746f72223a227463702f3132372e302e302e313a37343437222c226e6f64656964223a2262353362393534302d316135662d343736372d396466642d366266396237396538343335222c226578706f7365223a747275652c227570646174655f696e74657276616c223a357d7d
fagent: [DEBUG] Writing WriteData
fagent: [DEBUG] >>> Received message of 389 bytes
.......................................................

I see the error "Missing record field intf_configuration", but it seems not to break the plugin's resource registration, as those resources are subsequently used by the linuxbridge and LXD plugins. Is that correct, or could it be causing the issue?
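
For context, the shape the agent expects for each network-interface record can be seen in the /alfos/&lt;node&gt;/info output later in this thread; intf_configuration is a required field there. A rough Python analogue of the failure mode, with parse_iface as a hypothetical stand-in for the generated OCaml reader:

import json

def parse_iface(rec):
    # hypothetical stand-in: every interface record must carry intf_configuration
    if 'intf_configuration' not in rec:
        raise ValueError('Missing record field intf_configuration')
    return rec

try:
    parse_iface(json.loads('{"intf_name": "enp0s3"}'))
except ValueError as e:
    print(e)  # -> Missing record field intf_configuration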

Maybe I have misconfigured something? (The node id is b53b9540-1a5f-4767-9dfd-6bf9b79e8435.)

/etc/fos/agent.json

{
    "agent": {
        "system": "0",
        "pid_file": "/var/fos/agent.pid",
        "expose": true,
        "yaks": "tcp/127.0.0.1:7447",
        "path": "/var/fos",
        "enable_lldp": false,
        "enable_spawner": false,
        "mgmt_interface": "enp0s3",
        "lldp_conf": "/etc/fos/lldpd/lldpd.conf"
    },
    "plugins": {
        "plugin_path": "/etc/fos/plugins",
        "autoload": false
    }
}

/etc/fos/plugins/linux/linux_plugin.json

{
  "name": "linux",
  "version": 1,
  "uuid": "49c652d1-f8bc-4220-ae20-53526e4fef09",
  "type": "os",
  "description": "linux os plugin",
  "configuration": {
    "ylocator": "tcp/127.0.0.1:7447",
    "nodeid": "b53b9540-1a5f-4767-9dfd-6bf9b79e8435",
    "expose": true,
    "update_interval": 5
  }
}

/etc/fos/plugins/linuxbridge/linuxbridge_plugin.json

{
  "name": "linuxbridge",
  "version": 1,
  "uuid": "d42b4163-af35-423a-acb4-a228290cf0be",
  "type": "network",
  "requirements": [
    "jinja2"
  ],
  "description": "linux Bridge network plugin",
  "configuration": {
    "ylocator": "tcp/127.0.0.1:7447",
    "nodeid": "b53b9540-1a5f-4767-9dfd-6bf9b79e8435",
    "dataplane_interface": "enp0s3",
    "use_vlan": false,
    "vlan_interface": "ens2",
    "vlan_range": [
      50,
      100
    ]
  }
}

/etc/fos/plugins/LXD/LXD_plugin.json

{
  "name": "LXD",
  "version": 1,
  "uuid": "892ae9fe-9d1b-4dc6-87a3-66f5695a9971",
  "type": "runtime",
  "requirements": [
    "pylxd",
    "jinja2",
    "packaging"
  ],
  "description": "lxd fdu plugin",
  "configuration": {
    "storage_pool": "default",
    "ylocator": "tcp/127.0.0.1:7447",
    "nodeid": "b53b9540-1a5f-4767-9dfd-6bf9b79e8435",
    "update_interval": 5
  }
}

The nodeid is already updated accordingly in each configuration, and both the sudoers group (to which fos is added) and the fos user itself are granted NOPASSWD:ALL.

As far as I could inspect the communication by looking into the yaks and agent dumps, the calls initializing the linuxbridge and LXD plugins are executed successfully through the agent's linux os plugin, so the communication routing seems to be fine, as does the agent. So even though I can see the node is functioning, I'm not able to deploy any kind of FDU on this node, as its id is missing FIMAPI-wise.

It would be great if you could give me some guidance on where to search and investigate deeper for the causes of the problem. If any further information/logs/dumps would be helpful, just name it.

gabrik commented 4 years ago

Hi @tinchouuu

I guess that error comes from the Linux plugin when it tries to push the node information to YAKS through the agent.

Can you check the output of

$ python3
>>> from yaks import Yaks
>>> y = Yaks.login('')
>>> ws = y.workspace('/')
>>> ws.get('/agfos/0/tenants/0/nodes/**')
.....
>>> ws.get('/alfos/**')

Using the YAKS API you can check what is going through YAKS, which helps a lot with debugging. In particular, the first get retrieves the global information on the nodes (this is the same get used by the APIs), while the second inspects the local information, so in this case we should also get the node information.

tinchouuu commented 4 years ago

Hi @gabrik, thank you for the quick response! The script you suggested turned out to be a great help in locating the issue!

I've executed the script and the result is:

Python 3.7.3 (default, Apr  3 2019, 05:39:12) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from yaks import Yaks
>>> y = Yaks.login('')
>>> ws = y.workspace('/')
>>> ws.get('/agfos/0/tenants/0/nodes/**')
[('/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/status', {"uuid":"b53b9540-1a5f-4767-9dfd-6bf9b79e8435","ram":{"total":7978.7890625,"free":6580.203125},"disk":[{"mount_point":"/","total":31.928390502929688,"free":23.2158203125},{"mount_point":"/snap/lxd/12224","total":0.053466796875,"free":0.0},{"mount_point":"/snap/core/7917","total":0.0870361328125,"free":0.0}],"neighbors":[]}), ('/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/plugins/d42b4163-af35-423a-acb4-a228290cf0be/info', {"uuid":"d42b4163-af35-423a-acb4-a228290cf0be","name":"linuxbridge","version":1,"type":"network","status":"running","requirements":["jinja2"],"description":"linux Bridge network plugin","configuration":{"ylocator":"tcp/127.0.0.1:7447","nodeid":"b53b9540-1a5f-4767-9dfd-6bf9b79e8435","dataplane_interface":"enp0s3","use_vlan":false,"vlan_interface":"ens2","vlan_range":[50,100]}}), ('/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/plugins/892ae9fe-9d1b-4dc6-87a3-66f5695a9971/info', {"uuid":"892ae9fe-9d1b-4dc6-87a3-66f5695a9971","name":"LXD","version":1,"type":"runtime","status":"running","requirements":["pylxd","jinja2","packaging"],"description":"lxd fdu plugin","configuration":{"storage_pool":"default","ylocator":"tcp/127.0.0.1:7447","nodeid":"b53b9540-1a5f-4767-9dfd-6bf9b79e8435","update_interval":5}}), ('/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/plugins/49c652d1-f8bc-4220-ae20-53526e4fef09/info', {"uuid":"49c652d1-f8bc-4220-ae20-53526e4fef09","name":"linux","version":1,"type":"os","status":"running","description":"linux os plugin","configuration":{"ylocator":"tcp/127.0.0.1:7447","nodeid":"b53b9540-1a5f-4767-9dfd-6bf9b79e8435","expose":true,"update_interval":5}}), ('/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/networks/floating-ips/6e45ba33-986f-44e8-b8f6-9c2e348ea85a/info', {"uuid":"6e45ba33-986f-44e8-b8f6-9c2e348ea85a","ip_version":"IPV4","address":"192.168.1.107/24"}), ('/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/configuration', {"agent":{"system":"0","uuid":"b53b9540-1a5f-4767-9dfd-6bf9b79e8435","expose":true,"yaks":"tcp/127.0.0.1:7447","path":"/var/fos","enable_lldp":false,"enable_spawner":false,"pid_file":"/var/fos/agent.pid","mgmt_interface":"enp0s3","lldp_conf":"/etc/fos/lldpd/lldpd.conf"},"plugins":{"plugin_path":"/etc/fos/plugins","autoload":false}}), ('/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/agent/exec/create_node_network', {"error":11,"error_msg":"(Failure \"Can't get value out of None\")"}), ('/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/agent/exec/remove_node_netwotk', {"error":11,"error_msg":"(Failure \"Can't get value out of None\")"}), ('/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/agent/exec/create_floating_ip', {"result":{"uuid":"324e6f36-b459-47b7-869b-ff09d0554981","ip_version":"IPV4","address":"192.168.1.108/24"}}), ('/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/agent/exec/delete_floating_ip', {"error":22,"error_msg":"(Failure \"Can't get value out of None\")"}), ('/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/agent/exec/assign_floating_ip', {"error":33,"error_msg":"(Failure \"Can't get value out of None\")"}), ('/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/agent/exec/remove_floating_ip', {"error":33,"error_msg":"(Failure \"Can't get value out of None\")"}), ('/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/agent/exec/add_router_port', {"error":22,"error_msg":"(Failure 
\"Can't get value out of None\")"}), ('/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/agent/exec/remove_router_port', {"error":22,"error_msg":"(Failure \"Can't get value out of None\")"})]
>>> 

The second part of the script (ws.get('/alfos/**')) got really stuck, and I gave up waiting for an answer after about 5 minutes. I guess this is due to the node's networking issues seen in the results of ws.get('/agfos/0/tenants/0/nodes/**').

Overall, I tried the first ws.get twice and the second one once.

I checked the dumps from the Linux plugin:

[2019-11-04 05:26:27,169] - [INFO] > < execute_command() > OS Plugin executing command sudo ip link add fl-6e45ba33 link enp0s3 type macvlan mode vepa
[2019-11-04 05:26:27,185] - [INFO] > < execute_command() > OS Plugin executing command sudo dhclient fl-6e45ba33
[2019-11-04 05:35:03,399] - [INFO] > < execute_command() > OS Plugin executing command sudo ip link add fl-324e6f36 link enp0s3 type macvlan mode vepa
[2019-11-04 05:35:03,414] - [INFO] > < execute_command() > OS Plugin executing command sudo dhclient fl-324e6f36
[2019-11-04 05:38:33,978] - [INFO] > < local_mgmt_address() > Entering function...
[2019-11-04 05:38:33,979] - [INFO] > < local_mgmt_address() > MGMT Interface is enp0s3
[2019-11-04 05:38:33,994] - [INFO] > < execute_command() > OS Plugin executing command sudo ip link add fl-65ed81a1 link enp0s3 type macvlan mode vepa
[2019-11-04 05:38:34,058] - [INFO] > < execute_command() > OS Plugin executing command sudo dhclient fl-65ed81a1

and the ones from the Linux bridge plugin:

[2019-11-04 05:26:27,167] - [INFO] > < create_floating_ip() > Creating new floating IP
[2019-11-04 05:26:28,541] - [INFO] > < create_floating_ip() >  [ DONE ] New floating IP created {'uuid': '6e45ba33-986f-44e8-b8f6-9c2e348ea85a', 'ip_version': 'IPV4', 'address': '192.168.1.107/24', 'face': 'enp0s3', 'vface': 'fl-6e45ba33', 'cp_id': '', 'router_id': ''}
[2019-11-04 05:35:03,398] - [INFO] > < create_floating_ip() > Creating new floating IP
[2019-11-04 05:35:05,721] - [INFO] > < create_floating_ip() >  [ DONE ] New floating IP created {'uuid': '324e6f36-b459-47b7-869b-ff09d0554981', 'ip_version': 'IPV4', 'address': '192.168.1.108/24', 'face': 'enp0s3', 'vface': 'fl-324e6f36', 'cp_id': '', 'router_id': ''}
[2019-11-04 05:38:33,987] - [INFO] > < create_floating_ip() > Creating new floating IP
[2019-11-04 05:38:34,809] - [INFO] > < create_floating_ip() >  [ DONE ] New floating IP created {'uuid': '65ed81a1-629a-4dc4-a319-941fe1fddafd', 'ip_version': 'IPV4', 'address': '192.168.1.109/24', 'face': 'enp0s3', 'vface': 'fl-65ed81a1', 'cp_id': '', 'router_id': ''}

And here are the actual interfaces created on the host:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 08:00:27:c4:d0:c0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.104/24 brd 192.168.1.255 scope global dynamic noprefixroute enp0s3
       valid_lft 6054sec preferred_lft 6054sec
    inet6 fe80::a00:27ff:fec4:d0c0/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: lxdbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 02:55:c0:99:b4:f6 brd ff:ff:ff:ff:ff:ff
    inet 10.40.169.1/24 scope global lxdbr0
       valid_lft forever preferred_lft forever
    inet6 fd42:b40d:b45b:72b::1/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::55:c0ff:fe99:b4f6/64 scope link 
       valid_lft forever preferred_lft forever
4: fl-6e45ba33@enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 8a:98:90:55:20:14 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.107/24 brd 192.168.1.255 scope global dynamic fl-6e45ba33
       valid_lft 6382sec preferred_lft 6382sec
    inet6 fe80::8898:90ff:fe55:2014/64 scope link 
       valid_lft forever preferred_lft forever
5: fl-324e6f36@enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 02:75:a5:3b:06:13 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.108/24 brd 192.168.1.255 scope global dynamic fl-324e6f36
       valid_lft 6899sec preferred_lft 6899sec
    inet6 fe80::75:a5ff:fe3b:613/64 scope link 
       valid_lft forever preferred_lft forever
6: fl-65ed81a1@enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 9a:ec:9a:54:8a:c1 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.109/24 brd 192.168.1.255 scope global dynamic fl-65ed81a1
       valid_lft 7108sec preferred_lft 7108sec
    inet6 fe80::98ec:9aff:fe54:8ac1/64 scope link 
       valid_lft forever preferred_lft forever

I must admit networking is not something I'm familiar with in depth, but it seems the problem actually lies somewhere there. I hope the dumps will be helpful for finding the true cause.

Any further help would be appreciated! Thanks in advance!

gabrik commented 4 years ago

Hi @tinchouuu

From the first output I can see that the plugin was pushing the node status but not the info, which is why you do not see it using the APIs.
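
This is easy to confirm from the same YAKS session as before: the */info selector (the one the node list resolves, as the manual call further below shows) comes back empty, while */status has the entry seen earlier. A minimal sketch:

from yaks import Yaks

y = Yaks.login('')
ws = y.workspace('/')
# the global store has a status entry for the node but no info entry,
# so anything resolving nodes via */info sees an empty list
print(ws.get('/agfos/0/tenants/0/nodes/*/info'))    # -> []
print(ws.get('/agfos/0/tenants/0/nodes/*/status'))  # -> the status entry is present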

For the second one I actually made a mistake, sorry for that; the correct command should be:

>>> ws.get('/alfos/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/info')

That one was blocking because it was triggering tons of RPCs, so I suggest that you restart everything and try this one.

tinchouuu commented 4 years ago

OK, all done and restarted; the result is as follows:

>>> from yaks import Yaks
>>> y = Yaks.login('')
>>> ws = y.workspace('/')
>>> ws.get('/alfos/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/info')
[('/alfos/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/info', {"uuid": "b53b9540-1a5f-4767-9dfd-6bf9b79e8435", "name": "fog05-debian", "os": "linux", "cpu": [{"model": "Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz", "frequency": 0.0, "arch": "x86_64"}, {"model": "Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz", "frequency": 0.0, "arch": "x86_64"}], "ram": {"size": 7978.7890625}, "disks": [{"local_address": "/dev/sda1", "dimension": 31.928390502929688, "mount_point": "/", "filesystem": "ext4"}, {"local_address": "/dev/loop1", "dimension": 0.053466796875, "mount_point": "/snap/lxd/12224", "filesystem": "squashfs"}, {"local_address": "/dev/loop0", "dimension": 0.0870361328125, "mount_point": "/snap/core/7917", "filesystem": "squashfs"}], "network": [{"intf_name": "enp0s3", "intf_configuration": {"ipv4_address": "192.168.1.104", "ipv4_netmask": "255.255.255.0", "ipv4_gateway": "192.168.1.1", "ipv6_address": "fe80::a00:27ff:fec4:d0c0%enp0s3", "ipv6_netmask": "ffff:ffff:ffff:ffff::"}, "intf_mac_address": "08:00:27:c4:d0:c0", "intf_speed": 1000, "type": "ethernet", "available": false, "default_gw": true}, {"intf_name": "lxdbr0", "intf_configuration": {"ipv4_address": "10.40.169.1", "ipv4_netmask": "255.255.255.0", "ipv4_gateway": "", "ipv6_address": "fd42:b40d:b45b:72b::1", "ipv6_netmask": "ffff:ffff:ffff:ffff::"}, "intf_mac_address": "1e:0d:73:a4:52:03", "intf_speed": 0, "type": "container bridge", "available": true, "default_gw": false}, {"intf_name": "lo", "intf_configuration": {"ipv4_address": "127.0.0.1", "ipv4_netmask": "255.0.0.0", "ipv4_gateway": "", "ipv6_address": "::1", "ipv6_netmask": "ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff"}, "intf_mac_address": "00:00:00:00:00:00", "intf_speed": 0, "type": "loopback", "available": true, "default_gw": false}], "io": [], "accelerator": []})]
>>> 

I tried listing the nodes again but no luck:

>>> from fog05 import FIMAPI
>>> api=FIMAPI(locator='127.0.0.1')
>>> api.node.list()
[]
>>> api.close()
>>> exit(0)

Looking at YAKS after performing the same API call manually (ws.get('/agfos/0/tenants/0/nodes/*/info')):

[1572869616.690101][DEBUG] Reading Query
[1572869616.690276][DEBUG] Handling Query Message. nid[UNKNOWN] sid[10] pid[4014b1e68aef5c13] qid[0] res[/agfos/0/tenants/0/nodes/*/info]
[1572869616.690357][DEBUG] Forwarding 0 replies to session 10

gabrik commented 4 years ago

Thanks @tinchouuu

Indeed, as I suspected, there is an error in reporting the node information.

I'm going to reproduce this behavior and work on a fix.

Thanks for reporting this!

I'll keep this issue updated and link it to the PR that will solve this.

tinchouuu commented 4 years ago

Thanks @gabrik! Looking forward to trying the fixed version! If any further details are needed to reproduce the environment/behavior, I would be glad to help.

gabrik commented 4 years ago

Hi @tinchouuu, I have done some testing, and I guess the binary downloaded by the installation is not the correct one. Can you check using this one?

https://www.dropbox.com/s/2n8rlf3xl6dcrwi/agent.tar.gz

You can just replace /etc/fos/agent

tinchouuu commented 4 years ago

This seems to fix the issue! Thank you very much! I've checked both the public and private stores, and the information seems to be populated correctly now:

>>> from yaks import Yaks
>>> y = Yaks.login('')
>>> ws = y.workspace('/')
>>> ws.get('/agfos/0/tenants/0/nodes/*/info')
[('/agfos/0/tenants/0/nodes/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/info', {"uuid":"b53b9540-1a5f-4767-9dfd-6bf9b79e8435","name":"fog05-debian","os":"linux","cpu":[{"model":"Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz","frequency":0.0,"arch":"x86_64"},{"model":"Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz","frequency":0.0,"arch":"x86_64"}],"ram":{"size":7978.7890625},"disks":[{"local_address":"/dev/sda1","dimension":31.928390502929688,"mount_point":"/","filesystem":"ext4"},{"local_address":"/dev/loop1","dimension":0.053466796875,"mount_point":"/snap/lxd/12224","filesystem":"squashfs"},{"local_address":"/dev/loop0","dimension":0.0870361328125,"mount_point":"/snap/core/7917","filesystem":"squashfs"}],"io":[],"accelerator":[],"network":[{"intf_name":"enp0s3","intf_mac_address":"08:00:27:c4:d0:c0","intf_speed":1000,"type":"ethernet","available":false,"default_gw":true,"intf_configuration":{"ipv4_address":"192.168.1.104","ipv4_netmask":"255.255.255.0","ipv4_gateway":"192.168.1.1","ipv6_address":"fe80::a00:27ff:fec4:d0c0%enp0s3","ipv6_netmask":"ffff:ffff:ffff:ffff::"}},{"intf_name":"lxdbr0","intf_mac_address":"1e:0d:73:a4:52:03","intf_speed":0,"type":"container bridge","available":true,"default_gw":false,"intf_configuration":{"ipv4_address":"10.40.169.1","ipv4_netmask":"255.255.255.0","ipv4_gateway":"","ipv6_address":"fd42:b40d:b45b:72b::1","ipv6_netmask":"ffff:ffff:ffff:ffff::"}},{"intf_name":"lo","intf_mac_address":"00:00:00:00:00:00","intf_speed":0,"type":"loopback","available":true,"default_gw":false,"intf_configuration":{"ipv4_address":"127.0.0.1","ipv4_netmask":"255.0.0.0","ipv4_gateway":"","ipv6_address":"::1","ipv6_netmask":"ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff"}}]})]
>>> ws.get('/alfos/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/info')
[('/alfos/b53b9540-1a5f-4767-9dfd-6bf9b79e8435/info', {"uuid": "b53b9540-1a5f-4767-9dfd-6bf9b79e8435", "name": "fog05-debian", "os": "linux", "cpu": [{"model": "Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz", "frequency": 0.0, "arch": "x86_64"}, {"model": "Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz", "frequency": 0.0, "arch": "x86_64"}], "ram": {"size": 7978.7890625}, "disks": [{"local_address": "/dev/sda1", "dimension": 31.928390502929688, "mount_point": "/", "filesystem": "ext4"}, {"local_address": "/dev/loop1", "dimension": 0.053466796875, "mount_point": "/snap/lxd/12224", "filesystem": "squashfs"}, {"local_address": "/dev/loop0", "dimension": 0.0870361328125, "mount_point": "/snap/core/7917", "filesystem": "squashfs"}], "network": [{"intf_name": "enp0s3", "intf_configuration": {"ipv4_address": "192.168.1.104", "ipv4_netmask": "255.255.255.0", "ipv4_gateway": "192.168.1.1", "ipv6_address": "fe80::a00:27ff:fec4:d0c0%enp0s3", "ipv6_netmask": "ffff:ffff:ffff:ffff::"}, "intf_mac_address": "08:00:27:c4:d0:c0", "intf_speed": 1000, "type": "ethernet", "available": false, "default_gw": true}, {"intf_name": "lxdbr0", "intf_configuration": {"ipv4_address": "10.40.169.1", "ipv4_netmask": "255.255.255.0", "ipv4_gateway": "", "ipv6_address": "fd42:b40d:b45b:72b::1", "ipv6_netmask": "ffff:ffff:ffff:ffff::"}, "intf_mac_address": "1e:0d:73:a4:52:03", "intf_speed": 0, "type": "container bridge", "available": true, "default_gw": false}, {"intf_name": "lo", "intf_configuration": {"ipv4_address": "127.0.0.1", "ipv4_netmask": "255.0.0.0", "ipv4_gateway": "", "ipv6_address": "::1", "ipv6_netmask": "ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff"}, "intf_mac_address": "00:00:00:00:00:00", "intf_speed": 0, "type": "loopback", "available": true, "default_gw": false}], "io": [], "accelerator": []})]
>>> 

Thus, the node id is returned by the API call. Thank you very much for your help and cooperation along the way.

I just have one question: is there a repo where I can track and update my setup accordingly with the latest binaries of fog05's artifacts and dependencies for the different architectures (x86_64, arm, etc.)?

I guess the best way to make sure that everything is running with the proper binaries is to build them all locally on the deployment host, but sometimes that can be a bugger. I've tried the same setup on an RPi 3 (Raspbian Stretch) and went through some fun stuff to get fog05 to start - e.g. I had to clone the zenoh-c lib and cmake it locally. I got it running, but it is still not really functional, as the plugins somehow do not communicate properly. So maybe the binaries I use there are also out of date? It would be great if I could just check, update, retest, and only after that post an issue if the problem is still present - offloading you from unnecessary issues here :)

Thank you again for helping me solve this issue!

gabrik commented 4 years ago

Glad that worked @tinchouuu!

Indeed, the binaries are out of date; I just forgot to update them after a PR.

You are absolutely right, we are missing a build guide that would prevent this kind of issue.

I'll keep this issue open to track that, and I'll add a build guide and checksums for the binaries to the install guide, so that it is easy to verify the binaries or build them.

Thanks a lot

If you experience any other issues related to the API/functionality, just open a new issue :)

PS: Just for information, we are going to reorganize the repo and split the components into the different repos of the https://github.com/eclipse-fog05/ organization.

tinchouuu commented 4 years ago

@gabrik The action plan with the build guide and checksums sounds great! It will give a cleaner, safer, and more straightforward approach to building and deploying fog05.

Thank you for the heads-up about the repo reorganization!

I will play around with the RPi deployment for a bit more and if the issue is still present, I will open a new one for it.

Thanks again!

gabrik commented 4 years ago

Hi @tinchouuu, you can follow the branch in PR #160; in INSTALL.md you can find the md5sums for the agent binaries.

I have also updated the binaries referenced by the install script, so now everything should be up to date. Still, I'll keep this open until I can also merge the PR with the build guide.
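
For anyone verifying a downloaded binary against the digests in INSTALL.md, a small sketch (the expected digest below is a placeholder, not the real checksum):

import hashlib

EXPECTED = '<md5 from INSTALL.md>'  # placeholder, substitute the published digest

with open('/etc/fos/agent', 'rb') as f:
    digest = hashlib.md5(f.read()).hexdigest()

print(digest, 'OK' if digest == EXPECTED else 'MISMATCH')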

tinchouuu commented 4 years ago

Hi @gabrik! Sounds great - I'll give the updated install script a run today! Looking forward to trying the build guide too.

Thank you very much for all the updates!

gabrik commented 4 years ago

Hi @tinchouuu, just merged #160 with the build guide.

I'm closing this issue.