aenix-io / cozystack

Free and Open Source PaaS-platform for seamless management of virtual machines, managed Kubernetes, and Databases-as-a-Service
https://cozystack.io
Apache License 2.0

linstor SATELLITE BUG #456

Open klinch0 opened 4 weeks ago

klinch0 commented 4 weeks ago

klin@asus:~$ linstor n l

+--------------------------------------------------------+
| Node | NodeType  | Addresses                 | State   |
|========================================================|
| srv1 | SATELLITE | 192.168.100.11:3367 (SSL) | OFFLINE |
| srv2 | SATELLITE | 192.168.100.12:3367 (SSL) | Online  |
| srv3 | SATELLITE | 192.168.100.13:3367 (SSL) | Online  |
+--------------------------------------------------------+

Logs on the first satellite:

8:28:59.315 [SSLNetComService] ERROR LINSTOR/Satellite - SYSTEM - Peer 167.94.138.49:50694: SSL error: no cipher suites in common
00:43:54.056 [SSLNetComService] ERROR LINSTOR/Satellite - SYSTEM - Peer 162.142.125.214:56050: SSL error: Empty client certificate chain
00:43:54.616 [SSLNetComService] ERROR LINSTOR/Satellite - SYSTEM - Peer 162.142.125.214:56064: SSL error: Empty client certificate chain
00:43:59.337 [SSLNetComService] ERROR LINSTOR/Satellite - SYSTEM - Peer 162.142.125.214:56094: SSL error: Unrecognized SSL message, plaintext connection?
00:44:01.115 [SSLNetComService] ERROR LINSTOR/Satellite - SYSTEM - Peer 162.142.125.214:43896: SSL error: Empty server certificate chain
00:44:03.240 [SSLNetComService] ERROR LINSTOR/Satellite - SYSTEM - Peer 162.142.125.214:43906: SSL error: The client supported protocol versions [TLSv1.1, TLSv1] are not accepted by server preferences [TLS13, TLS12]
00:44:05.025 [SSLNetComService] ERROR LINSTOR/Satellite - SYSTEM - Peer 162.142.125.214:43908: SSL error: The client supported protocol versions [TLSv1] are not accepted by server preferences [TLS13, TLS12]
00:44:06.308 [SSLNetComService] ERROR LINSTOR/Satellite - SYSTEM - Peer 162.142.125.214:43954: SSL error: Client requested protocol SSLv3 is not enabled or supported in server context
01:15:00.476 [MainWorkerPool-17] INFO  LINSTOR/Satellite - SYSTEM - SpaceInfo: data -> 1443386244/1862270976
01:15:00.477 [MainWorkerPool-17] INFO  LINSTOR/Satellite - SYSTEM - SpaceInfo: DfltDisklessStorPool -> 9223372036854775807/9223372036854775807
01:15:00.477 [MainWorkerPool-17] INFO  LINSTOR/Satellite - SYSTEM - Storage pool data reports capacity 1862270976 kiB, allocated space 1443386244 kiB
01:15:00.477 [MainWorkerPool-17] INFO  LINSTOR/Satellite - SYSTEM - SpaceTracking: Satellite aggregate capacity is 1862270976 kiB, allocated capacity is 418884732 kiB, no errors
08:20:55.539 [SSLNetComService] ERROR LINSTOR/Satellite - SYSTEM - Peer 137.184.85.175:40138: SSL error: Unrecognized SSL message, plaintext connection?
08:47:18.691 [SSLNetComService] ERROR LINSTOR/Satellite - SYSTEM - Peer 45.227.254.49:65156: SSL error: Unrecognized SSL message, plaintext connection?
09:16:57.162 [MainWorkerPool-18] INFO  LINSTOR/Satellite - SYSTEM - LogArchive: Running log archive on directory: /var/log/linstor-satellite
09:16:57.166 [MainWorkerPool-18] INFO  LINSTOR/Satellite - SYSTEM - LogArchive: No logs to archive.
10:03:52.072 [SSLNetComService] ERROR LINSTOR/Satellite - SYSTEM - Peer 137.184.85.175:47146: SSL error: Empty client certificate chain
10:54:46.224 [SSLNetComService] ERROR LINSTOR/Satellite - SYSTEM - Peer 162.142.125.214:44188: SSL error: Empty client certificate chain

After restarting the first satellite:

klin@asus:~$ linstor n l
+-------------------------------------------------------+
| Node | NodeType  | Addresses                 | State  |
|=======================================================|
| srv1 | SATELLITE | 192.168.100.11:3367 (SSL) | Online |
| srv2 | SATELLITE | 192.168.100.12:3367 (SSL) | Online |
| srv3 | SATELLITE | 192.168.100.13:3367 (SSL) | Online |
+-------------------------------------------------------+

We need a health check or something similar for the satellites.

dosubot[bot] commented 4 weeks ago

Hey @klinch0, I'm here to help you with that linstor satellite bug. I'm a bot designed to assist you with bug fixes, answer your questions, and help you become a contributor. Let's tackle this together!

It seems like the issue with the LINSTOR Satellite going offline was related to SSL errors, specifically "no cipher suites in common" and "Empty client certificate chain." Restarting the satellite resolved the issue temporarily, but implementing a health check mechanism could help detect and address such issues automatically in the future.

Unfortunately, I couldn't find specific information about a built-in health check feature in the LINSTOR Satellite codebase. However, you might consider setting up an external monitoring tool to periodically check the status of your nodes and alert you if any go offline. Additionally, reviewing your SSL configuration to ensure compatibility between nodes could help prevent these errors from occurring.
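As a rough illustration of such an external check, the sketch below parses the text table printed by `linstor n l` (the format shown in the issue above) and returns every node whose State column is not `Online`. The function names `parse_node_states`, `offline_nodes`, and `check_cluster` are hypothetical and not part of LINSTOR or Cozystack. The sketch also assumes the `linstor` client is on the PATH and that the column layout matches the output above.

```python
import subprocess


def parse_node_states(table_text):
    """Map node name -> state from the text table printed by `linstor n l`.

    Assumes the layout shown in the issue: rows delimited by '|', with the
    node name in the first column and the state in the last column.
    """
    states = {}
    for line in table_text.splitlines():
        line = line.strip()
        # Skip borders ('+---+'), the '|====|' separator, and non-table lines.
        if not line.startswith("|") or line.startswith("|="):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header row and anything that is not a full data row.
        if len(cells) < 4 or cells[0] == "Node":
            continue
        states[cells[0]] = cells[-1]
    return states


def offline_nodes(table_text):
    """Return a sorted list of nodes whose state is anything but Online."""
    states = parse_node_states(table_text)
    return sorted(name for name, state in states.items()
                  if state.lower() != "online")


def check_cluster():
    """Run the same command as in the issue and report non-Online nodes."""
    result = subprocess.run(["linstor", "n", "l"],
                            capture_output=True, text=True, check=True)
    return offline_nodes(result.stdout)
```

A scheduler such as cron or a Kubernetes CronJob could call `check_cluster()` periodically and raise an alert (or trigger a satellite restart, as was done manually above) whenever the returned list is not empty.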
