Open FR-ADDIX opened 1 year ago
Probably just an old trace line I overlooked before PR. Is anything failing?
Actually, everything is running quite smoothly. But every now and then we get a crash with exit code 139. While analyzing how this occurs, we noticed the warning.
Last State: Terminated
  Reason: Error
  Exit Code: 139
  Started: Tue, 20 Jun 2023 04:54:29 +0200
  Finished: Tue, 20 Jun 2023 07:27:46 +0200
exit code 139 ... Any idea where that error code comes from? I don't think the broker itself ever exits with a 139. Perhaps a Docker thing (it might be that the broker allocates too much RAM; I've heard something about that)?
No, it is actually very frugal with RAM and CPU at the moment. We run in a K8s environment and allow 2 CPUs and 16GB RAM. The pod can get this in blocks of 50m CPU and 128MB. But of course it may be that, at that moment, the cluster does not have sufficient resources available because a completely independent process has claimed them. We will keep an eye on this; the tip may have been helpful.
OK, I know from performance tests that Kubernetes kills the broker if it allocates too much memory. Normally the OS would take care of this problem by swapping, but unfortunately Kubernetes doesn't support swapping (someone from RedHat told me that this was going to be solved, i.e. Kubernetes supporting swapping).
Quick search on "kubernetes error 139" gave me this:
Exit Code 139 means that the container received a SIGSEGV signal from the operating system. This indicates a segmentation error – a memory violation, caused by a container trying to access a memory location to which it does not have access.
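The quoted explanation follows the common convention that a process killed by signal N is reported with exit code 128 + N. A minimal sketch of that arithmetic, assuming a Linux host; `describe_exit_code` is a hypothetical helper for illustration, not part of the broker or of Kubernetes:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a container exit code to a signal name using the 128 + N convention."""
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals is the standard-library enum of signal numbers.
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"exited with status {code}"


# 139 - 128 = 11, which is SIGSEGV on Linux.
print(describe_exit_code(139))
```

For comparison, a container killed for exceeding its memory limit is usually reported as 137 (128 + 9, SIGKILL) with reason `OOMKilled`, so a 139 with reason `Error` points more towards a segmentation fault than towards a memory-limit kill.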
Might be the broker crashes for you. If that is the case, I'd be really interested in getting more info on that. For example, could you start the broker inside valgrind? [ Valgrind would tell us exactly where (well, more or less) the problem lies - broker inside gdb would also work ]
I have now lowered the RAM block requirement from 128Mi to 64Mi and have had no failures for several hours now. We will continue to monitor this over the weekend and report back on Monday.
What is Orion-LD v1.2.0 trying to tell me with the following message?
time=Wednesday 21 Jun 10:06:50 2023.306Z | lvl=WARN | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=subCacheAlterationMatch.cpp[148]:matchLookup | msg=Different entity (urn:ngsi-ld:AirQualityObserved:Umweltbundesamt:DESH056 vs urn:ngsi-ld:AirQualityObserved:Umweltbundesamt:DESH027) - need to add it to the notification for sub urn:ngsi-ld:subscription:c4ac8dc8-0ed5-11ee-9265-be65a175e8ce
There are currently 4 entities:
[
  {
    "id": "urn:ngsi-ld:AirQualityObserved:Umweltbundesamt:DESH027",
    "type": "AirQualityObserved",
    "name": {
      "type": "Property",
      "value": "Kiel-Bahnhofstr. Verk. Umweltbundesamt DESH027"
    }
  },
  {
    "id": "urn:ngsi-ld:AirQualityObserved:Umweltbundesamt:DESH052",
    "type": "AirQualityObserved",
    "name": {
      "type": "Property",
      "value": "Kiel Theodor-Heuss-Ring Verk. Umweltbundesamt DESH052"
    }
  },
  {
    "id": "urn:ngsi-ld:AirQualityObserved:Umweltbundesamt:DESH056",
    "type": "AirQualityObserved",
    "name": {
      "type": "Property",
      "value": "Eggebek Umweltbundesamt DESH056"
    }
  },
  {
    "id": "urn:ngsi-ld:AirQualityObserved:Umweltbundesamt:DESH057",
    "type": "AirQualityObserved",
    "name": {
      "type": "Property",
      "value": "Kiel Bremerskamp Verk. Umweltbundesamt DESH057"
    }
  }
]
This is the subscription that is addressed in the warning:
{
  "id": "urn:ngsi-ld:subscription:c4ac8dc8-0ed5-11ee-9265-be65a175e8ce",
  "type": "Subscription",
  "subscriptionName": "QL:AirQualityObserved",
  "description": "Historisierung der AirQualityObserved Daten",
  "entities": [
    {
      "idPattern": "AirQualityObserved",
      "type": "AirQualityObserved"
    }
  ],
  "watchedAttributes": [
    "dateObserved"
  ],
  "status": "active",
  "isActive": true,
  "notification": {
    "format": "normalized",
    "endpoint": {
      "uri": "http://quantumleap8.fiware-staging.svc:8668/v2/notify",
      "accept": "application/json",
      "receiverInfo": [
        {
          "key": "Fiware-Service",
          "value": "infoportal"
        }
      ]
    },
    "status": "ok",
    "timesSent": 19363,
    "lastNotification": "2023-06-21T10:26:57.486Z",
    "lastFailure": "2023-06-21T09:56:50.284Z",
    "lastSuccess": "2023-06-21T10:26:57.486Z"
  },
  "origin": "cache"
}
Is there something wrong?
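One thing that can be checked independently of the broker is which of the four entities the subscription's `idPattern` covers. The sketch below assumes the pattern is applied as an unanchored regular expression; that is an assumption, and Python's `re.search` is only a stand-in for whatever regex engine the broker actually uses. `matching_ids` is a hypothetical helper.

```python
import re

# idPattern copied from the subscription in the warning.
id_pattern = "AirQualityObserved"

# The four entity ids currently in the broker.
entity_ids = [
    "urn:ngsi-ld:AirQualityObserved:Umweltbundesamt:DESH027",
    "urn:ngsi-ld:AirQualityObserved:Umweltbundesamt:DESH052",
    "urn:ngsi-ld:AirQualityObserved:Umweltbundesamt:DESH056",
    "urn:ngsi-ld:AirQualityObserved:Umweltbundesamt:DESH057",
]


def matching_ids(pattern: str, ids: list[str]) -> list[str]:
    """Return the ids in which the pattern matches anywhere (unanchored search)."""
    regex = re.compile(pattern)
    return [entity_id for entity_id in ids if regex.search(entity_id)]


for entity_id in matching_ids(id_pattern, entity_ids):
    print(entity_id)
```

Under that assumption the pattern covers all four entities, including the two named in the warning (DESH056 and DESH027), so a single notification for this subscription may have to carry more than one entity.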