emqx / neuron

Open source industrial IoT connectivity server
https://neugates.io
GNU Lesser General Public License v3.0
1.01k stars 218 forks source link

Neuron crashes when I create number of nodes (160 ~ 200). #1566

Open itwasme-ulrich opened 1 year ago

itwasme-ulrich commented 1 year ago

Firstly, I installed Neuron by downloading the neuron-2.5.2-linux-amd64-en.deb package from the official website. Then, I added Modbus-TCP Node Plugins using a Template via both the Post API and manually on the dashboard. When creating about 165~168 Nodes, Neuron started crashing, and I couldn't access Neuron again through any means, even after completely pkill and restarting it. If I want to use Neuron, I'm forced to delete all data and reinstall it. The last lines of the neuron.log don't seem to provide any useful information.:

_2023-09-22 12:34:10 [ERROR] storage.c:128 load node164 setting fail 2023-09-22 12:34:10 [NOTICE] event_linux.c:256 add io, fd: 1007, epoll: 1006, ret: 0 2023-09-22 12:34:10 [NOTICE] event_linux.c:213 add timer, second: 30, millisecond: 0, timer: 1010 in epoll 1006, ret: 0 2023-09-22 12:34:10 [NOTICE] adapter.c:198 Success to create adapter: node164 2023-09-22 12:34:10 [NOTICE] storage.c:371 load adapter success type:1, name:node164 plugin:Modbus TCP state:1 2023-09-22 12:34:10 [NOTICE] manager.c:283 bind node node164 to pipe(681851086) 2023-09-22 12:34:10 [WARN] persist.c:1257 SQL SELECT setting FROM settings WHERE node_name=? with node165 fail: no more rows available 2023-09-22 12:34:10 [ERROR] storage.c:128 load node165 setting fail 2023-09-22 12:34:10 [NOTICE] event_linux.c:256 add io, fd: 1013, epoll: 1012, ret: 0 2023-09-22 12:34:10 [NOTICE] event_linux.c:213 add timer, second: 30, millisecond: 0, timer: 1016 in epoll 1012, ret: 0 2023-09-22 12:34:10 [NOTICE] adapter.c:198 Success to create adapter: node165 2023-09-22 12:34:10 [NOTICE] storage.c:371 load adapter success type:1, name:node165 plugin:Modbus TCP state:1 2023-09-22 12:34:10 [NOTICE] manager.c:283 bind node node165 to pipe(681851089) 2023-09-22 12:34:10 [WARN] persist.c:1257 SQL SELECT setting FROM settings WHERE node_name=? with node166 fail: no more rows available 2023-09-22 12:34:10 [ERROR] storage.c:128 load node166 setting fail 2023-09-22 12:34:10 [NOTICE] event_linux.c:256 add io, fd: 1018, epoll: 1019, ret: 0 2023-09-22 12:34:10 [NOTICE] eventlinux.c:213 add timer, second: 30, millisecond: 0, timer: 1022 in epoll 1019, ret: 0 2023-09-22 12:34:10 [NOTICE] adapter.c:198 Success to create adapter: node166 2023-09-22 12:34:10 [NOTICE] storage.c:371 load adapter success type:1, name:node166 plugin:Modbus TCP state:1

I'm assuming the issue lies with the database, so I used 'Build From Source' from the GitHub repository. With Neuron version 2.6.0, I created nodes using the same method mentioned earlier, and the number of nodes stopped at 200, then it crashed completely. Here are the last lines of the neuron.log. :

_2023-09-26 12:43:40:759 [WARN] persist.c:1294 SQL SELECT setting FROM settings WHERE node_name=? with node198 fail: no more rows available 2023-09-26 12:43:40:759 [WARN] storage.c:128 load node198 setting fail 2023-09-26 12:43:40:759 [NOTICE] event_linux.c:293 add io, fd: 1011, epoll: 1010, ret: 0 2023-09-26 12:43:40:759 [NOTICE] manager_internal.c:590 instantiate tmpl:tcp-template node:node198 success 2023-09-26 12:43:40:760 [NOTICE] manager.c:257 bind node node198 to pipe(106500235) 2023-09-26 12:43:40:760 [NOTICE] http.c:54 <0x6150000a0800> POST /api/v2/template/inst [200] 2023-09-26 12:43:40:828 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/node?type=1&node=&plugin= [200] 2023-09-26 12:43:40:929 [NOTICE] http.c:54 <0x615000010700> GET /api/v2/plugin [200] 2023-09-26 12:43:40:931 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/node?type=1 [200] 2023-09-26 12:43:40:946 [NOTICE] http.c:54 <0x615000010700> GET /api/v2/node/setting?node=node198 [200] 2023-09-26 12:43:40:946 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/schema?schema_name=modbus-tcp [200] 2023-09-26 12:43:43:016 [NOTICE] http.c:54 <0x6150000b0700> GET /api/v2/plugin [200] 2023-09-26 12:43:43:017 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/node?type=1&node=&plugin= [200] 2023-09-26 12:43:43:032 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/node/state [200] 2023-09-26 12:43:43:733 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/plugin [200] 2023-09-26 12:43:48:260 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/template [200] 2023-09-26 12:43:51:840 [WARN] persist.c:1294 SQL SELECT setting FROM settings WHERE node_name=? with node199 fail: no more rows available 2023-09-26 12:43:51:840 [WARN] storage.c:128 load node199 setting fail 2023-09-26 12:43:51:840 [NOTICE] event_linux.c:293 add io, fd: 1016, epoll: 1015, ret: 0 2023-09-26 12:43:51:840 [NOTICE] manager_internal.c:590 instantiate tmpl:tcp-template node:node199 success 2023-09-26 12:43:51:842 [NOTICE] manager.c:257 bind node node199 to pipe(106500237) 2023-09-26 12:43:51:842 [NOTICE] http.c:54 <0x6150000a0800> POST /api/v2/template/inst [200] 2023-09-26 12:43:51:901 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/node?type=1&node=&plugin= [200] 2023-09-26 12:43:51:969 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/node?type=1 [200] 2023-09-26 12:43:51:972 [NOTICE] http.c:54 <0x6150000b0700> GET /api/v2/plugin [200] 2023-09-26 12:43:51:990 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/schema?schemaname=modbus-tcp [200] 2023-09-26 12:43:51:991 [NOTICE] http.c:54 <0x6150000b0700> GET /api/v2/node/setting?node=node199 [200] 2023-09-26 12:43:54:058 [NOTICE] http.c:54 <0x6150000b0700> GET /api/v2/node?type=1&node=&plugin= [200] 2023-09-26 12:43:54:058 [NOTICE] http.c:54 <0x615000010700> GET /api/v2/plugin [200] 2023-09-26 12:43:54:072 [NOTICE] http.c:54 <0x6150000b0700> GET /api/v2/node/state [200] 2023-09-26 12:43:55:053 [NOTICE] http.c:54 <0x6150000b0700> GET /api/v2/plugin [200] 2023-09-26 12:43:57:818 [NOTICE] http.c:54 <0x6150000b0700> GET /api/v2/template [200]

I also encounter a similar issue when using it on Raspberry Pi 3B+.

On the Docs page, I can see performance tests with up to 1000 nodes, so I'm wondering if you have any solutions to this problem.

fengzeroz commented 1 year ago

This issue may have been fixed in the 2.6 branch, and can be tested with the latest code from the 2.6 branch. There may be a problem with nng pair1, which drops the message.

baobuibk commented 1 year ago

Hi,

In version 2.6, is there a limit of 30 tag for modbus TCP?

fengzeroz commented 1 year ago

Yes, the release package Modbus RTU/TCP will also be limited by license, you can compile it yourself using the source code.

itwasme-ulrich commented 1 year ago

I 've just git clone -b v2.6, it seems like it hasn't been fixed yet. First, I noticed that the test results are quite similar to the compiled version 2.5.2*** that I downloaded from neugates.io/downloads?os=Linux. This means that I can only create up to Node 167, and Node 168 will crash." From what I've observed, every time I run ./neuron, it reloads everything in the database - `/neuron/build/persistence/sqlite.db` . So, I created a prepopulated database similar to the ones Neuron generates during operation. This time, I stored around 170 Nodes, and here are the log results when running it.

_2023-10-09 18:23:14:835 [NOTICE] adapter.c:252 Success to create adapter: node165, fd: 1002 2023-10-09 18:23:14:836 [NOTICE] storage.c:383 load adapter success type:1, name:node165 plugin:Modbus TCP state:1 2023-10-09 18:23:14:836 [NOTICE] manager.c:278 bind node node165 to src addr(49823) 2023-10-09 18:23:14:836 [WARN] persist.c:1294 SQL SELECT setting FROM settings WHERE node_name=? with node166 fail: no more rows available 2023-10-09 18:23:14:836 [WARN] storage.c:128 load node166 setting fail 2023-10-09 18:23:14:836 [NOTICE] event_linux.c:293 add io, fd: 1008, epoll: 1010, ret: 0 2023-10-09 18:23:14:836 [NOTICE] event_linux.c:245 add timer, second: 30, millisecond: 0, timer: 1012 in epoll 1010, ret: 0 2023-10-09 18:23:14:836 [NOTICE] adapter.c:252 Success to create adapter: node166, fd: 1008 2023-10-09 18:23:14:836 [NOTICE] storage.c:383 load adapter success type:1, name:node166 plugin:Modbus TCP state:1 2023-10-09 18:23:14:836 [NOTICE] manager.c:278 bind node node166 to src addr(51087) 2023-10-09 18:23:14:836 [WARN] persist.c:1294 SQL SELECT setting FROM settings WHERE node_name=? with node167 fail: no more rows available 2023-10-09 18:23:14:836 [WARN] storage.c:128 load node167 setting fail 2023-10-09 18:23:14:836 [NOTICE] event_linux.c:293 add io, fd: 1014, epoll: 1016, ret: 0 2023-10-09 18:23:14:836 [NOTICE] event_linux.c:245 add timer, second: 30, millisecond: 0, timer: 1018 in epoll 1016, ret: 0 2023-10-09 18:23:14:836 [NOTICE] adapter.c:252 Success to create adapter: node167, fd: 1014 2023-10-09 18:23:14:836 [NOTICE] storage.c:383 load adapter success type:1, name:node167 plugin:Modbus TCP state:1 2023-10-09 18:23:14:836 [NOTICE] manager.c:278 bind node node167 to src addr(57048) 2023-10-09 18:23:31:856 [NOTICE] rest.c:150 Uninitialize plugin: neuron-default-dashboard 2023-10-09 18:23:31:856 [NOTICE] event_linux.c:305 del io: 10 from epoll: 12 2023-10-09 18:23:31:856 [NOTICE] event_linux.c:257 del timer: 16 from epoll: 12 2023-10-09 18:23:31:856 [NOTICE] adapter.c:1228 Stop the adapter(default-dashboard) 2023-10-09 18:23:31:856 [NOTICE] adapter.c:1087 default-dashboard send uninit msg to manager failed 2023-10-09 18:23:31:856 [NOTICE] adapter.c:1179 adapter default-dashboard destroy 2023-10-09 18:23:31:856 [NOTICE] rest.c:127 Success to free plugin: neuron-default-dashboard 2023-10-09 18:23:31:857 [WARN] persist.c:1294 SQL SELECT setting FROM settings WHERE node_name=? with default-dashboard fail: no more rows available 2023-10-09 18:23:31:857 [WARN] storage.c:128 load default-dashboard setting fail 2023-10-09 18:23:31:857 [WARN] log.c:59 directory logs contains no log files of default-dashboard 2023-10-09 18:23:32:609 [WARN] main.c:54 recv sig: 2 2023-10-09 18:23:32:609 [NOTICE] event_linux.c:257 del timer: -1 from epoll: 8 2023-10-09 18:23:32:611 [WARN] main.c:54 recv sig: 6 2023-10-09 18:23:32:647 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1 2023-10-09 18:23:32:648 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1 2023-10-09 18:23:32:649 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1 2023-10-09 18:23:32:649 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1 2023-10-09 18:23:32:649 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1 2023-10-09 18:23:32:649 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1 2023-10-09 18:23:32:649 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1 2023-10-09 18:23:32:649 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1 2023-10-09 18:23:32:649 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1 2023-10-09 18:23:32:652 [NOTICE] event_linux.c:305 del io: 264 from epoll: 266 2023-10-09 18:23:32:652 [NOTICE] event_linux.c:305 del io: 96 from epoll: 98 2023-10-09 18:23:32:652 [NOTICE] eventlinux.c:305 del io: 204 from epoll: 206

As you can see, Neuron can only load up to Node 167, after which it cannot load anymore, resulting in a crash. In a manual approach, meaning adding Node 166 , Node 167 then Node 168 still leads to similar results. However, in these recent attempts, although Neuron still crashes, the issue of 200% CPU usage no longer occurs.

I also tried git clone -b main , and with this compilation, I could still only create up to 200 nodes. However, the difference is when I tried it with a pre-created database containing approximately 205 nodes. The log looks like this:

_2023-10-09 18:48:12:778 [WARN] [neuron] persist.c:1296 SQL SELECT setting FROM settings WHERE node_name=? with Node200 fail: no more rows available 2023-10-09 18:48:12:778 [WARN] [neuron] storage.c:128 load Node200 setting fail 2023-10-09 18:48:12:778 [NOTICE] [neuron] event_linux.c:293 add io, fd: 1012, epoll: 1014, ret: 0 2023-10-09 18:48:12:778 [NOTICE] [neuron] storage.c:383 load adapter success type:1, name:Node200 plugin:Modbus TCP state:1 2023-10-09 18:48:12:778 [NOTICE] [neuron] manager.c:253 bind node Node200 to src addr(46732) 2023-10-09 18:48:12:778 [NOTICE] [Node201] modbus_tcp.c:110 Node201 init success 2023-10-09 18:48:12:778 [WARN] [neuron] persist.c:1296 SQL SELECT setting FROM settings WHERE node_name=? with Node201 fail: no more rows available 2023-10-09 18:48:12:778 [WARN] [neuron] storage.c:128 load Node201 setting fail 2023-10-09 18:48:12:778 [NOTICE] [neuron] event_linux.c:293 add io, fd: 1017, epoll: 1019, ret: 0 2023-10-09 18:48:12:779 [NOTICE] [neuron] storage.c:383 load adapter success type:1, name:Node201 plugin:Modbus TCP state:1 2023-10-09 18:48:12:779 [NOTICE] [neuron] manager.c:253 bind node Node201 to src addr(52978) 2023-10-09 18:48:12:779 [WARN] [neuron] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 0 2023-10-09 18:48:12:779 [NOTICE] [node202] modbus_tcp.c:110 node202 init success 2023-10-09 18:48:12:779 [WARN] [neuron] persist.c:1296 SQL SELECT setting FROM settings WHERE node_name=? with node202 fail: no more rows available 2023-10-09 18:48:12:779 [WARN] [neuron] storage.c:128 load node202 setting fail 2023-10-09 18:48:12:779 [NOTICE] [neuron] event_linux.c:293 add io, fd: 1022, epoll: -1, ret: -1 2023-10-09 18:48:12:779 [WARN] [neuron] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 0 2023-10-09 18:48:12:779 [NOTICE] [neuron] storage.c:383 load adapter success type:1, name:node202 plugin:Modbus TCP state:1 2023-10-09 18:48:12:779 [NOTICE] [neuron] manager.c:253 bind node node202 to src addr(58539) 2023-10-09 18:48:12:779 [NOTICE] [neuron] storage.c:383 load adapter success type:1, name:node203 plugin:Modbus TCP state:1 2023-10-09 18:48:12:779 [WARN] [neuron] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 0 2023-10-09 18:48:12:779 [NOTICE] [neuron] eventlinux.c:245 add timer, second: 0, millisecond: 10, timer: -1 in epoll 8, ret: -1 2023-10-09 18:48:12:779 [NOTICE] [neuron] manager.c:155 manager start

I thinks epoll = -1 means that Neuron has stopped functioning logically.

In additional, all of the above means that if I delete the database down to Node 166 (v2.6) or Node 199 (main), Neuron can still start up / work normally."

fengzeroz commented 1 year ago

It is possible that there are not enough file descriptors, try to change the file descriptor limits. Use ulimit -n to view the current fd limit, /proc/$pid/fd directory to see the file descriptors used by processes.

eeff commented 1 year ago

It is possible that there are not enough file descriptors, try to change the file descriptor limits. Use ulimit -n to view the current fd limit, /proc/$pid/fd directory to see the file descriptors used by processes.

@itwasme-ulrich your problem is highly probable due to the file descriptor resource limit.