FIWARE / context.Orion-LD

Context Broker and CEF building block for context data management which supports both the NGSI-LD and the NGSI-v2 APIs
https://www.etsi.org/deliver/etsi_gs/CIM/001_099/009/01.06.01_60/gs_CIM009v010601p.pdf
GNU Affero General Public License v3.0

Fiware - data loss with orion-ld and a stress test #1650

Open truijllo opened 1 month ago

truijllo commented 1 month ago

During a series of tests we realised that we have a problem with notifications generated by subscriptions. In particular, not all changes correctly received by orion and made to the entities monitored by the subscriptions generate notifications to the downstream systems. To identify the problem, the test environment was reduced to

During the test, 100 entities are created, a single subscription for all of them, and 5 updates with an interval of 1s. The test is run for both LD and V2. In both cases, a significant number of notifications are not sent to the fake server.

Analysing the logs with debug and trace enabled, we found that notifications are not sent when this condition occurs:

[1318]:addTriggeredSubscriptions_withCache | msg=Subscription:         66964843ed9b9fa7aa4d3bc8
[1319]:addTriggeredSubscriptions_withCache | msg=NOW:                  1721124935.260880
[1320]:addTriggeredSubscriptions_withCache | msg=lastNotificationTime: 1721124935.265951
[1321]:addTriggeredSubscriptions_withCache | msg=DIFF:                 -0.005070
[1322]:addTriggeredSubscriptions_withCache | msg=throttling:           0.000000
[1323]:addTriggeredSubscriptions_withCache | msg=lastSuccess:          1721124935.265951
[1324]:addTriggeredSubscriptions_withCache | msg=lastFailure:          0.000000
[1329]:addTriggeredSubscriptions_withCache | msg=No notification due to throttling (last: 1721124935.260880 vs now: 1721124935.265951)

Throttling is set to 0 in the subscription, yet the notification is still skipped because of throttling.
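To make the suspected condition concrete, here is a minimal sketch in Python. It is not the broker's code: both function names are hypothetical, and the comparison in `naive_should_notify` is only an assumption about what kind of check could produce the log above. It shows how a negative DIFF (NOW earlier than lastNotificationTime, which can happen when requests are handled concurrently) would suppress a notification even with throttling at 0, and how a guard that skips the check for throttling of zero or less avoids that.

```python
# Values copied from the log excerpt above.
NOW = 1721124935.260880
LAST_NOTIFICATION_TIME = 1721124935.265951
THROTTLING = 0.0


def naive_should_notify(now, last_notification_time, throttling):
    """Hypothetical check: notify only if at least `throttling` seconds have elapsed.

    With a negative elapsed time, this returns False even when throttling is 0.
    """
    return (now - last_notification_time) >= throttling


def guarded_should_notify(now, last_notification_time, throttling):
    """Hypothetical variant: treat throttling <= 0 as 'no throttling at all'."""
    if throttling <= 0:
        return True
    return (now - last_notification_time) >= throttling


if __name__ == "__main__":
    print("naive:  ", naive_should_notify(NOW, LAST_NOTIFICATION_TIME, THROTTLING))
    print("guarded:", guarded_should_notify(NOW, LAST_NOTIFICATION_TIME, THROTTLING))
```

If the real check resembles the naive version, any two updates processed out of order within a few milliseconds would be enough to drop a notification, which would match what we see under concurrent load.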

By enabling the -experimental flag on the same configuration, no notifications are lost on the LD side, but obviously the problem persists on the V2 side, which does not benefit from the flag changes. Another test was done using fiware/orion instead of orion-ld and the V2 side showed no problems.

The issue appears even with relatively small numbers, such as 20 records submitted and only 14 notified to the fake server (6 lost), which makes us think there is something wrong on our side that we cannot identify.

We need both APIs, V2 and LD. This issue is pushing us to split the workload across two separate broker services, which feels like overkill.


The fake HTTP server should receive all the changes.
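For anyone who wants to reproduce the count, a fake notification receiver along these lines is enough. This is a minimal sketch using only the Python standard library, not the server used in our tests; the class and function names are hypothetical. It assumes the notification body is a JSON object with a `data` array of entities, which is the shape used by both NGSIv2 (normalized format) and NGSI-LD notifications.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class NotificationCounter(BaseHTTPRequestHandler):
    """Counts POSTed notifications and the entities in their "data" array."""

    lock = threading.Lock()
    notifications = 0
    entities = 0

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            payload = json.loads(raw or b"{}")
        except json.JSONDecodeError:
            payload = {}
        data = payload.get("data", []) if isinstance(payload, dict) else []
        if not isinstance(data, list):
            data = []
        # Update the counters before replying, under a lock, because the
        # threading server handles concurrent requests in separate threads.
        with NotificationCounter.lock:
            NotificationCounter.notifications += 1
            NotificationCounter.entities += len(data)
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, fmt, *args):
        # Silence the default per-request logging on stderr.
        pass


def start_counter(host="127.0.0.1", port=0):
    """Start the receiver in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), NotificationCounter)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = start_counter(port=8099)  # hypothetical port for the subscription URL
    input("Receiver running, press Enter to stop and print the counters\n")
    srv.shutdown()
    print(NotificationCounter.notifications, NotificationCounter.entities)
```

Comparing `entities` with the number of updates pushed to the broker gives the number of lost notifications directly.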

When pushing a JSON file with the Apache benchmarking tool ("ab") sequentially, everything works fine. With concurrent ingestion (e.g. 20 inputs, 10 requests at a time) I observe only about 14 records notified to the fake server. Other tests were done with other tools (including https://github.com/FIWARE/load-tests).

Does anyone know what the problem could be or how to fix it?

thinkingmik commented 1 month ago

I have the same problem. I'm using orion-ld:1.6.0 and mongodb:4.4.

In my case there is an IoT Agent UL that creates/updates entities on the orion-ld context broker. It sends about 70 calls at once to orion. The context is updated successfully and I see all 70 entities created/updated, but the related entity-type subscription does not send all the notifications to the external service. For example, sometimes I get 57 notifications sent, other times 37, 48, etc. (it's random).

kzangeli commented 1 week ago

Try setting throttling to -1. In NGSIv2 that means it is ignored. In NGSI-LD, throttling is ignored if it is zero or less. Sorry about the inconsistency here between the two APIs.
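As an illustration of that suggestion, here is a sketch of subscription payloads with throttling set to -1, built in Python. The helper names, the entity type `Sensor`, and the notification URL are hypothetical placeholders. The field layout follows the usual NGSIv2 (`POST /v2/subscriptions`) and NGSI-LD (`POST /ngsi-ld/v1/subscriptions`) subscription shapes, and an NGSI-LD request would also need a suitable `@context` or a fully expanded type name.

```python
import json


def v2_subscription(entity_type, notify_url, throttling=-1):
    """Build an NGSIv2 subscription body with throttling disabled by default."""
    return {
        "description": "All entities of one type, no throttling",
        "subject": {"entities": [{"idPattern": ".*", "type": entity_type}]},
        "notification": {"http": {"url": notify_url}},
        "throttling": throttling,
    }


def ld_subscription(entity_type, notify_uri, throttling=-1):
    """Build an NGSI-LD subscription body with throttling disabled by default."""
    return {
        "type": "Subscription",
        "entities": [{"type": entity_type}],
        "notification": {
            "endpoint": {"uri": notify_uri, "accept": "application/json"}
        },
        "throttling": throttling,
    }


if __name__ == "__main__":
    url = "http://fake-server:8099/notify"  # hypothetical receiver address
    print(json.dumps(v2_subscription("Sensor", url), indent=2))
    print(json.dumps(ld_subscription("Sensor", url), indent=2))
```

Using -1 in both payloads sidesteps the difference between the two APIs in how a throttling value of 0 is treated.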

truijllo commented 6 days ago

It seems to work. I set throttling to -1 in both APIs and, in a testing environment, I get as many notifications as the entries I push. Thanks a lot!