fledge-iot / fledge

An open source platform for the Industrial Internet of Things, it acts as an edge gateway between sensor devices and cloud storage systems.
https://www.lfedge.org/projects/fledge/
Apache License 2.0

Add config category failed connection refused #1353

Open stephenrichardson opened 6 months ago

stephenrichardson commented 6 months ago

I've been using Fledge for the past few weeks and have written my own south Python plugin to get sensor data. This has been working and I can see readings. Today I set up a north plugin for Azure IoT Hub. This also worked and I could see messages arriving in Azure.

At some point (and I'm not sure what caused the issues) Fledge errored and stopped.

I deleted the north and south services and performed a fledge reset.

I am now at a point where I can start Fledge and the services show as running. I was able to recreate the north plugin to Azure (and I see messages arriving in Azure each time Fledge starts). I was also able to recreate the south service for my custom plugin, but it would become unresponsive and eventually show as failed.

I've deleted my custom plugin services and the plugin files from the fledge directory. If I add the http_south plugin to the fledge directory, as soon as I go to browse the south plugins to create a service, fledge will error and stop immediately.

This is the error that is appearing most consistently:

May 7 20:44:33 firefly Fledge Storage[28515]: ERROR: Failed to register configuration category: {"error": {"message": "[AttributeError] 'NoneType' object has no attribute 'register'"}}.

These are some errors that are also occurring (I suspect due to the one above):

May 7 21:55:22 firefly Fledge Storage[21315]: FATAL: (1) 1 0x7f9c0566c0 __kernel_rt_sigreturn + 0---------
May 7 21:55:22 firefly Fledge Storage[21315]: FATAL: (0) 0 0x558b3b158c /usr/local/fledge/services/fledge.services.storage(+0x5958c) [0x558b3b158c]---------
May 7 21:55:22 firefly Fledge Storage[21315]: FATAL: Signal 6 (Aborted) trapped:
May 7 21:55:22 firefly Fledge Storage[21315]: ERROR: Add child categories failed Connection refused.
May 7 21:55:22 firefly Fledge Storage[21315]: ERROR: Add config category failed Connection refused.
May 7 21:55:04 firefly Fledge Storage[21315]: ERROR: Add config category failed Connection refused.

stephenrichardson commented 6 months ago

Since posting this, I reinstalled Fledge. I can see these errors in the log:

May 8 10:02:05 firefly Fledge Storage[15774]: ERROR: Failed to register configuration category: {"error": {"message": "[AttributeError] 'NoneType' object has no attribute 'register'"}}.

May 8 10:02:05 firefly Fledge Storage[15774]: ERROR: HTTP error while fetching configuration category for Storage: 404: No such Category found for Storage

May 8 10:02:05 firefly Fledge[15671] INFO: service_registry: fledge.services.core.service_registry.service_registry: Registered service instance id=bb4af805-7da5-475c-a6f7-8fee9e9626e8:

I reinstalled my custom plugin and created a south service. The service shows unresponsive and goes into a cycle of trying to start. These are the log messages each time:

May 8 10:19:23 firefly Fledge[15671] INFO: service_registry: fledge.services.core.service_registry.service_registry: Mark as failed service instance id=8e72e7b2-55a4-41d3-a089-1366031bbb17:
May 8 10:18:06 firefly Fledge BluVib203610[18154]: FATAL: (4) 4 0x7f8a339a34 /lib/ld-linux-aarch64.so.1(+0xda34) [0x7f8a339a34]---------
May 8 10:18:06 firefly Fledge BluVib203610[18154]: FATAL: (3) 3 0x7f69cdb72c gotoblas_init + 52---------
May 8 10:18:06 firefly Fledge BluVib203610[18154]: FATAL: (2) 2 0x7f69e58f54 gotoblas_dynamic_init + 500---------
May 8 10:18:06 firefly Fledge BluVib203610[18154]: FATAL: (1) 1 0x7f8a3586c0 __kernel_rt_sigreturn + 0---------
May 8 10:18:06 firefly Fledge BluVib203610[18154]: FATAL: (0) 0 0x5565a16e34 handler(int) + 76---------
May 8 10:18:06 firefly Fledge BluVib203610[18154]: FATAL: Signal 4 (Illegal instruction) trapped:
May 8 10:18:04 firefly Fledge[15671] INFO: service_registry: fledge.services.core.service_registry.service_registry: Registered service instance id=8e72e7b2-55a4-41d3-a089-1366031bbb17:

ashish-jabble commented 6 months ago

I deleted the north and south services and performed a fledge reset.

You don't need to delete anything manually when fledge reset is used. It takes care of that automatically and resets the instance to its default configuration. Note, however, that it does not reset the plugins directory!

This is the error that is appearing most consistently: May 7 20:44:33 firefly Fledge Storage[28515]: ERROR: Failed to register configuration category: {"error": {"message": "[AttributeError] 'NoneType' object has no attribute 'register'"}}.

This is a known issue and the error can be ignored. It comes from a race condition that needs to be handled more gracefully, but it does not affect any functionality.

I reinstalled Fledge.

Did you install Fledge with a make-based or a package-based installation, and on which platform and architecture?

The service shows unresponsive and goes into a cycle of trying to start

According to the logs, this looks like an issue with your plugin. Is it a Python-based or a C-based plugin?

Is it possible to share the support bundle of your instance? See how to get this here

stephenrichardson commented 6 months ago

I deleted the north and south services and performed a fledge reset.

I did this because the Fledge service wouldn't start up again, and doing this meant it could start.

Did you install Fledge with a make-based or a package-based installation, and on which platform and architecture?

Make based: aarch64 Ubuntu 18.04

According to the logs, this looks like an issue with your plugin. Is it a Python-based or a C-based plugin?

The plugin I wrote was a python one. But if I remove this one completely and just install the http_south one, as soon as I go to the "south" page on the Fledge GUI, Fledge will crash/stop.

Some system logs:

May 9 15:26:27 firefly Fledge Storage[23911]: FATAL: (2) 2 0x7f95c254f8 raise + 176---------

May 9 15:26:27 firefly Fledge Storage[23911]: FATAL: (1) 1 0x7f9638b6c0 __kernel_rt_sigreturn + 0---------

May 9 15:26:27 firefly Fledge Storage[23911]: FATAL: (0) 0 0x55856e30a4 /usr/local/fledge/services/fledge.services.storage(+0x590a4) [0x55856e30a4]---------

May 9 15:26:27 firefly Fledge Storage[23911]: FATAL: Signal 6 (Aborted) trapped:

May 9 15:26:27 firefly Fledge Storage[23911]: ERROR: Add child categories failed Connection refused.

May 9 15:26:27 firefly Fledge Storage[23911]: ERROR: Add config category failed Connection refused.

May 9 15:26:09 firefly Fledge Storage[23911]: ERROR: Add config category failed Connection refused.

May 9 15:25:53 firefly Fledge Storage[23911]: ERROR: Add config category failed Connection refused.

May 9 15:25:39 firefly Fledge Storage[23911]: ERROR: Add config category failed Connection refused.

May 9 15:25:27 firefly Fledge Storage[23911]: ERROR: Add config category failed Connection refused.

May 9 15:25:26 firefly Fledge [24035] INFO: script.fledge: Fledge started.

May 9 15:25:25 firefly Fledge [24389] INFO: scripts.services.storage: Fledge storage microservice found in FLEDGE_ROOT location: /usr/local/fledge

May 9 15:25:25 firefly Fledge [24381] INFO: scripts.services.storage: Fledge storage microservice found in FLEDGE_ROOT location: /usr/local/fledge

May 9 15:25:25 firefly Fledge [24369] INFO: scripts.services.storage: Fledge storage microservice found in FLEDGE_ROOT location: /usr/local/fledge

May 9 15:25:25 firefly Fledge[24130] INFO: service_registry: fledge.services.core.service_registry.service_registry: Registered service instance id=b747e41d-af66-4593-b7de-0cb304cf959a:

May 9 15:25:23 firefly Fledge[24130] INFO: server: fledge.services.core.server: REST API Server started on http://0.0.0.0:8081

May 9 15:25:23 firefly Fledge[24130] INFO: server: fledge.services.core.server: PID [24130] written in [/usr/local/fledge/data/var/run/fledge.core.pid]

May 9 15:25:23 firefly Fledge[24130] WARNING: server: fledge.services.core.server: A Fledge PID file has been found: [/usr/local/fledge/data/var/run/fledge.core.pid] found, ignoring it.

May 9 15:25:22 firefly Fledge[24130] INFO: server: fledge.services.core.server: Announce management API service

May 9 15:25:22 firefly Fledge[24130] INFO: server: fledge.services.core.server: Services monitoring started ...

May 9 15:25:22 firefly Fledge[24130] INFO: server: fledge.services.core.server: Starting scheduler ...

May 9 15:25:20 firefly Fledge Storage[24228]: ERROR: Failed to register configuration category: {"error": {"message": "[AttributeError] 'NoneType' object has no attribute 'register'"}}.

May 9 15:25:18 firefly Fledge Storage[24228]: ERROR: Failed to register configuration category: {"error": {"message": "[AttributeError] 'NoneType' object has no attribute 'register'"}}.

May 9 15:25:18 firefly Fledge[24130] INFO: service_registry: fledge.services.core.service_registry.service_registry: Registered service instance id=c2a44a61-9156-4ad2-81c5-225a0fa213fc:

May 9 15:25:17 firefly Fledge Storage[23911]: ERROR: Add config category failed Connection refused.

May 9 15:25:17 firefly Fledge [24206] INFO: scripts.services.storage: Fledge storage microservice found in FLEDGE_ROOT location: /usr/local/fledge

May 9 15:25:17 firefly Fledge [24206] INFO: scripts.services.storage: Fledge storage microservice found in FLEDGE_ROOT location: /usr/local/fledge

May 9 15:25:17 firefly Fledge[24130] INFO: server: fledge.services.core.server: Start storage, from directory /usr/local/fledge/scripts

May 9 15:25:17 firefly Fledge[24130] INFO: server: fledge.services.core.server: Management API started on http://0.0.0.0:40597

May 9 15:25:17 firefly Fledge[24130] INFO: server: fledge.services.core.server: Starting ...

May 9 15:25:14 firefly Fledge Storage[24035] INFO: script.plugin.storage.sqlite: Fledge DB schema is up to date to version [70]

May 9 15:25:14 firefly Fledge Storage[24035] INFO: script.plugin.storage.sqlite: SQLite3 readings database is ready.

May 9 15:25:14 firefly Fledge Storage[24109] INFO: script.plugin.storage.sqlite: SQLite 3 database '/usr/local/fledge/data/readings_1.db' ready.

May 9 15:25:14 firefly Fledge Storage[24035] INFO: script.plugin.storage.sqlite: SQLite3 database is ready.

May 9 15:25:14 firefly Fledge Storage[24106] INFO: script.plugin.storage.sqlite: SQLite 3 database '/usr/local/fledge/data/fledge.db' ready.

May 9 15:25:14 firefly Fledge [24081] INFO: scripts.services.storage: Fledge storage microservice found in FLEDGE_ROOT location: /usr/local/fledge

May 9 15:25:14 firefly Fledge [24068] INFO: scripts.services.storage: Fledge storage microservice found in FLEDGE_ROOT location: /usr/local/fledge

May 9 15:25:09 firefly Fledge message repeated 2 times: [ Storage[23911]: ERROR: Add config category failed Connection refused.]

May 9 15:24:59 firefly Fledge Storage[23911]: ERROR: Add config category failed Connection refused.

May 9 15:24:58 firefly Fledge [23718] ERROR: script.fledge: Fledge cannot start.

May 9 15:24:57 firefly Fledge Storage[23911]: ERROR: Add config category failed Connection refused.

May 9 15:24:56 firefly Fledge[23813] INFO: service_registry: fledge.services.core.service_registry.service_registry: Registered service instance id=f259c247-d0f4-43ad-9df8-eb94bf11242b:

May 9 15:24:54 firefly Fledge[23813] INFO: server: fledge.services.core.server: REST API Server started on http://0.0.0.0:8081

May 9 15:24:54 firefly Fledge[23813] INFO: server: fledge.services.core.server: PID [23813] written in [/usr/local/fledge/data/var/run/fledge.core.pid]

May 9 15:24:54 firefly Fledge[23813] WARNING: server: fledge.services.core.server: A Fledge PID file has been found: [/usr/local/fledge/data/var/run/fledge.core.pid] found, ignoring it.

May 9 15:24:54 firefly Fledge[23813] INFO: server: fledge.services.core.server: Announce management API service

May 9 15:24:53 firefly Fledge[23813] INFO: server: fledge.services.core.server: Services monitoring started ...

May 9 15:24:53 firefly Fledge[23813] INFO: server: fledge.services.core.server: Starting scheduler ...

May 9 15:24:51 firefly Fledge Storage[23911]: ERROR: Failed to register configuration category: {"error": {"message": "[AttributeError] 'NoneType' object has no attribute 'register'"}}.

May 9 15:24:49 firefly Fledge Storage[23911]: ERROR: Failed to register configuration category: {"error": {"message": "[AttributeError] 'NoneType' object has no attribute 'register'"}}.

ashish-jabble commented 6 months ago

Your instance seems to be in a bad state, since "Fledge cannot start" appears in the log. Would you mind running these commands in order?

a) Kill all the Fledge services: $FLEDGE_ROOT/bin/fledge kill
b) Reset Fledge: echo "YES" | $FLEDGE_ROOT/bin/fledge reset
c) Start Fledge: $FLEDGE_ROOT/bin/fledge start
d) Add a service with your plugin and see whether it works. If it doesn't, we will need the support bundle of your instance along with your Python plugin code.
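
For anyone who wants to script the kill, reset, start sequence from the comment above, here is a minimal Python sketch. It only wraps the three commands quoted there. The function names `recovery_steps` and `run_recovery` are hypothetical helpers for illustration and are not part of Fledge, and the sketch does not assume anything about what exit codes the `fledge` script returns.

```python
import os
import subprocess


def recovery_steps(fledge_root):
    """Return (argv, stdin_text) pairs for the kill, reset, start sequence."""
    fledge = os.path.join(fledge_root, "bin", "fledge")
    return [
        ([fledge, "kill"], None),
        # Equivalent to: echo "YES" | $FLEDGE_ROOT/bin/fledge reset
        ([fledge, "reset"], "YES\n"),
        ([fledge, "start"], None),
    ]


def run_recovery(fledge_root, runner=subprocess.run):
    """Run each step in order and return the list of exit codes.

    `runner` is injectable so the sequence can be dry-run without
    touching a real Fledge installation.
    """
    codes = []
    for argv, stdin_text in recovery_steps(fledge_root):
        result = runner(argv, input=stdin_text, text=True)
        codes.append(result.returncode)
    return codes
```

On a make-based install like the one in this thread, it could be called as `run_recovery(os.environ.get("FLEDGE_ROOT", "/usr/local/fledge"))`. Step d) (adding the service again) is still done by hand in the GUI.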

stephenrichardson commented 6 months ago

The problem happens when I install the http_south plugin. When I go to south and then click Add, Fledge crashes.

These are the logs:

May 13 20:47:02 firefly Fledge Storage[3497]: ERROR: Failed to register configuration category: {"error": {"message": "[AttributeError] 'NoneType' object has no attribute 'register'"}}.

May 13 20:47:00 firefly Fledge Storage[3497]: ERROR: Failed to register configuration category: {"error": {"message": "[AttributeError] 'NoneType' object has no attribute 'register'"}}.

May 13 20:47:00 firefly Fledge[3399] INFO: service_registry: fledge.services.core.service_registry.service_registry: Registered service instance id=601bd37e-f806-47ad-a779-3fa815d23177:

May 13 20:46:59 firefly Fledge [3475] INFO: scripts.services.storage: Fledge storage microservice found in FLEDGE_ROOT location: /usr/local/fledge

May 13 20:46:59 firefly Fledge [3475] INFO: scripts.services.storage: Fledge storage microservice found in FLEDGE_ROOT location: /usr/local/fledge

May 13 20:46:59 firefly Fledge[3399] INFO: server: fledge.services.core.server: Start storage, from directory /usr/local/fledge/scripts

May 13 20:46:59 firefly Fledge[3399] INFO: server: fledge.services.core.server: Management API started on http://0.0.0.0:38992

May 13 20:46:59 firefly Fledge[3399] INFO: server: fledge.services.core.server: Starting ...

May 13 20:46:57 firefly Fledge Storage[3304] INFO: script.plugin.storage.sqlite: Fledge DB schema is up to date to version [70]

May 13 20:46:57 firefly Fledge Storage[3304] INFO: script.plugin.storage.sqlite: SQLite3 readings database is ready.

May 13 20:46:57 firefly Fledge Storage[3378] INFO: script.plugin.storage.sqlite: SQLite 3 database '/usr/local/fledge/data/readings_1.db' ready.

May 13 20:46:57 firefly Fledge Storage[3304] INFO: script.plugin.storage.sqlite: SQLite3 database is ready.

May 13 20:46:56 firefly Fledge Storage[3375] INFO: script.plugin.storage.sqlite: SQLite 3 database '/usr/local/fledge/data/fledge.db' ready.

May 13 20:46:56 firefly Fledge [3350] INFO: scripts.services.storage: Fledge storage microservice found in FLEDGE_ROOT location: /usr/local/fledge

May 13 20:46:56 firefly Fledge [3337] INFO: scripts.services.storage: Fledge storage microservice found in FLEDGE_ROOT location: /usr/local/fledge

May 13 20:44:43 firefly Fledge [2384] INFO: script.fledge: Fledge started.

I started fledge again and went to support bundle and clicked Request New but fledge crashed again. So I deleted the http_south plugin and started fledge again. This time it successfully generated the support bundle (attached). support-240513-20-50-14.tar.gz

ashish-jabble commented 6 months ago

@stephenrichardson

The problem happens when I install the http_south plugin. When I go to south and then click Add, Fledge crashes.

It has nothing to do with the http_south plugin.

This time it successfully generated the support bundle (attached).

I can see that your instance has multiple storage services running, which is why you see the crash along with related log entries such as:

May 9 15:24:57 firefly Fledge Storage[23911]: ERROR: Add config category failed Connection refused.
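
As a rough way to spot this pattern in your own syslog, the sketch below counts which `Fledge Storage[PID]` processes are logging a given message. The helper name `pids_reporting` is hypothetical, and the idea that a storage PID which keeps logging "Connection refused" is a leftover process is only the reading suggested in this comment, not something the code can prove.

```python
import re
from collections import Counter

# Matches the syslog tag used by the storage service in the logs above,
# e.g. "Fledge Storage[23911]: ERROR: ..."
STORAGE_TAG = re.compile(r"Fledge Storage\[(\d+)\]")


def pids_reporting(log_lines, needle="Connection refused"):
    """Count, per storage PID, the log lines that contain `needle`."""
    counts = Counter()
    for line in log_lines:
        if needle not in line:
            continue
        match = STORAGE_TAG.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Feeding it the lines of a syslog file, for example `pids_reporting(open(path))` with `path` pointing at wherever your system writes Fledge logs, gives a per-PID tally that can be compared against the PID of the storage service started by the most recent run.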

I suspect that fledge stop did not complete in your earlier attempts. The output below shows multiple storage services running:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
firefly   2580  0.3  0.3 332752 13472 ?        Ssl  20:44   0:01 /usr/local/fledge/services/fledge.services.storage --address=0.0.0.0 --port=34034
firefly   3497  0.3  0.3 332708 13596 ?        Ssl  20:46   0:00 /usr/local/fledge/services/fledge.services.storage --address=0.0.0.0 --port=38992
firefly   4717 16.4  0.9 2700372 38648 pts/2   Sl   20:49   0:03 python3 -m fledge.services.core
firefly   4815  1.9  0.3 332708 13668 ?        Ssl  20:49   0:00 /usr/local/fledge/services/fledge.services.storage --address=0.0.0.0 --port=35017

According to the support bundle, only two processes should be running: the core and one storage service. So after your last start of Fledge, only PIDs 4717 and 4815 should exist.

To clean up your instance, either kill PIDs 2580 and 3497 manually or use the fledge kill command, which automatically kills every Fledge process that exists in the environment.

This was already suggested in a previous comment: https://github.com/fledge-iot/fledge/issues/1353#issuecomment-2103937493
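
The check described above (more than one storage service in the process list) can be sketched in Python as follows. It parses text in the `ps aux` layout shown in this comment. The function name `storage_processes` is a hypothetical helper, and the parsing assumes the standard ten fixed columns before COMMAND.

```python
import re

PORT_FLAG = re.compile(r"--port=(\d+)")


def storage_processes(ps_output):
    """Return [(pid, port)] for fledge.services.storage rows of `ps aux` text."""
    found = []
    for line in ps_output.splitlines():
        # ps aux prints 10 fixed columns, then the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header row or malformed line
        command = fields[10]
        executable = command.split()[0]
        if not executable.endswith("fledge.services.storage"):
            continue
        port = PORT_FLAG.search(command)
        found.append((int(fields[1]), int(port.group(1)) if port else None))
    return found
```

The input can come from `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`. If the returned list has more than one entry, older storage services are probably still running and a `fledge kill` is worth trying before the next start. The sketch deliberately does not decide which PID is the current one, since that depends on which core process is running.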