FIWARE / context.Orion-LD

Context Broker and CEF building block for context data management which supports both the NGSI-LD and the NGSI-v2 APIs
https://www.etsi.org/deliver/etsi_gs/CIM/001_099/009/01.06.01_60/gs_CIM009v010601p.pdf
GNU Affero General Public License v3.0

What is the maximum throughput of orion-ld notifications? #921

Open issue, opened by damianhorna 3 years ago

damianhorna commented 3 years ago

Hi,

I run orion-ld with docker-compose like this:

version: "3.9"
services:
  mongo:
    image: mongo:3.4
    command: --nojournal
  orion:
    image: fiware/orion-ld
    links:
      - mongo
    ports:
      - "1026:1026"
    command: -dbhost mongo

I then create entities and subscriptions and start generating data. The load is fairly heavy (several notifications every 0.02 s, spread across multiple processes), and after a non-deterministic amount of time orion-ld crashes with exit code 139:

orion_1                 | time=Thursday 19 Aug 12:48:03 2021.017Z | lvl=TMP | corr=N/A | trans=1629376481-309-00000015260 | from=pending | srv=pending | subsrv=pending | comp=Orion | op=httpRequestSend.cpp[547]:httpRequestSendWithCurl | msg=Sending message 15204 to HTTP server: sending message of 691 bytes to HTTP server
orion_1                 | time=Thursday 19 Aug 12:48:03 2021.121Z | lvl=TMP | corr=N/A | trans=1629376481-309-00000015261 | from=pending | srv=pending | subsrv=pending | comp=Orion | op=httpRequestSend.cpp[547]:httpRequestSendWithCurl | msg=Sending message 15205 to HTTP server: sending message of 691 bytes to HTTP server
mongo_1                 | 2021-08-19T12:48:03.330+0000 I -        [conn11] end connection 172.18.0.8:38520 (11 connections now open)
mongo_1                 | 2021-08-19T12:48:03.330+0000 I -        [conn7] end connection 172.18.0.8:38512 (11 connections now open)
mongo_1                 | 2021-08-19T12:48:03.330+0000 I -        [conn9] end connection 172.18.0.8:38516 (11 connections now open)
mongo_1                 | 2021-08-19T12:48:03.330+0000 I -        [conn3] end connection 172.18.0.8:38504 (11 connections now open)
mongo_1                 | 2021-08-19T12:48:03.330+0000 I -        [conn5] end connection 172.18.0.8:38508 (8 connections now open)
mongo_1                 | 2021-08-19T12:48:03.330+0000 I -        [conn6] end connection 172.18.0.8:38510 (11 connections now open)
mongo_1                 | 2021-08-19T12:48:03.330+0000 I -        [conn10] end connection 172.18.0.8:38518 (11 connections now open)
mongo_1                 | 2021-08-19T12:48:03.330+0000 I -        [conn8] end connection 172.18.0.8:38514 (6 connections now open)
mongo_1                 | 2021-08-19T12:48:03.330+0000 I -        [conn1] end connection 172.18.0.8:38500 (3 connections now open)
mongo_1                 | 2021-08-19T12:48:03.330+0000 I -        [conn2] end connection 172.18.0.8:38502 (3 connections now open)
mongo_1                 | 2021-08-19T12:48:03.330+0000 I -        [conn4] end connection 172.18.0.8:38506 (1 connection now open)
pmadai_orion_1 exited with code 139
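The issue does not include the script that generates the load. The following is a minimal, hypothetical Python sketch of that kind of workload, assuming Orion-LD is reachable on localhost:1026 as in the compose file above. The entity id and type, the attribute name, the core-context URL and the notification endpoint (http://subscriber:8080/notify) are placeholders chosen for illustration, not details of the reporter's setup. It only uses the standard NGSI-LD operations for creating an entity, creating a subscription and patching attributes.

```python
import json
import time
import urllib.request

# Hypothetical values, not taken from the issue: adjust to your own setup.
BROKER = "http://localhost:1026"
CORE_CONTEXT = "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"
NOTIFY_URI = "http://subscriber:8080/notify"


def entity_payload(entity_id, value):
    """NGSI-LD entity with one Property; the @context travels in the body."""
    return {
        "id": entity_id,
        "type": "Sensor",
        "temperature": {"type": "Property", "value": value},
        "@context": CORE_CONTEXT,
    }


def subscription_payload(notify_uri):
    """Subscription that fires on every change to any Sensor entity."""
    return {
        "type": "Subscription",
        "entities": [{"type": "Sensor"}],
        "notification": {
            "endpoint": {"uri": notify_uri, "accept": "application/json"}
        },
        "@context": CORE_CONTEXT,
    }


def send(method, path, body, content_type):
    """Send one JSON request to the broker and return the HTTP status code."""
    req = urllib.request.Request(
        BROKER + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": content_type},
        method=method,
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


def create_subscription():
    """Create the (single) subscription; call this once, not per process."""
    return send("POST", "/ngsi-ld/v1/subscriptions",
                subscription_payload(NOTIFY_URI), "application/ld+json")


def run(entity_id, interval_s=0.02, updates=1000):
    """Create one entity, then patch it at a fixed rate to trigger notifications."""
    send("POST", "/ngsi-ld/v1/entities", entity_payload(entity_id, 0.0),
         "application/ld+json")
    for i in range(updates):
        # No @context here: plain application/json falls back to the core
        # context, which is also what the entity was created with.
        attrs = {"temperature": {"type": "Property", "value": float(i)}}
        send("PATCH", f"/ngsi-ld/v1/entities/{entity_id}/attrs", attrs,
             "application/json")
        time.sleep(interval_s)
```

To approximate the "multiple processes" setup, call create_subscription() once and then start run() in several processes (for example with multiprocessing), each with its own entity id such as "urn:ngsi-ld:Sensor:001", "urn:ngsi-ld:Sensor:002" and so on. Every patch then produces one notification towards NOTIFY_URI.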

The problem also occurs if I reduce the load to, say, several notifications every 0.1 s (across multiple processes), but only after a longer period of time.

It seems the issue is MongoDB-related. This is strange, though, because I had no similar issues with the original fiware/orion.

Any advice on how to solve this problem would be much appreciated!

wistefan commented 3 years ago

This crash (exit code 139) is usually due to a resource shortage, e.g. memory. For notification throughput numbers, have a look at our load-testing repository: https://github.com/FIWARE/load-tests/
We publish numbers for two different notification scenarios; you can find them, for example, here: https://fiware.github.io/load-tests/testReports/orion-ld/tiny/reports/ld/EntityUpdateWithSubscriptionSimulation/gatling-report.html
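Exit code 139 follows the usual shell/Docker convention of 128 + signal number, so it corresponds to signal 11, SIGSEGV (a segmentation fault). A small helper using Python's standard signal module makes the decoding explicit; it assumes a POSIX platform where SIGSEGV is 11. A segfault can come from a bug in the broker as well as from resource exhaustion, so the decoded signal narrows down the cause without settling it on its own.

```python
import signal


def describe_exit_code(code):
    """Decode a process/container exit status; above 128 means 128 + signal number."""
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            # Not a known signal number on this platform.
            return f"exited with status {code}"
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {code}"


print(describe_exit_code(139))  # killed by SIGSEGV (signal 11)
```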

damianhorna commented 3 years ago

Thanks @wistefan for your response!

Today I had some time to monitor the resource usage of the various containers with docker stats.
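Because the crash happens after an unpredictable delay, one-off docker stats readings can miss what happens just before it. A minimal sketch, assuming the docker CLI is available on the host, is to sample `docker stats --no-stream --format "{{json .}}"` periodically and append the results to a JSON-lines file, so the last samples before the exit are preserved. The function names and the log file name are illustrative, not part of any existing tool.

```python
import json
import subprocess
import time


def _pct(text):
    """Parse a value like '1.02%'; return None if it is not numeric."""
    try:
        return float(text.strip().rstrip("%"))
    except ValueError:
        # Non-numeric placeholder (assumed possible for a stopped container).
        return None


def parse_stats_line(line):
    """Parse one line of `docker stats --format "{{json .}}"` output."""
    raw = json.loads(line)
    return {
        "name": raw["Name"],
        "cpu_pct": _pct(raw["CPUPerc"]),
        "mem_pct": _pct(raw["MemPerc"]),
        "mem_usage": raw["MemUsage"],
    }


def sample(containers):
    """Take one snapshot of the given containers via the docker CLI."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{json .}}", *containers],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_stats_line(l) for l in out.splitlines() if l.strip()]


def monitor(containers, period_s=5.0, logfile="stats.jsonl"):
    """Append timestamped samples to a JSON-lines file until an error occurs."""
    with open(logfile, "a") as f:
        while True:
            for row in sample(containers):
                row["t"] = time.time()
                f.write(json.dumps(row) + "\n")
            f.flush()
            time.sleep(period_s)
```

With the container name from the log above, this could be started as monitor(["pmadai_orion_1", "pmadai_mongo_1"]); the mongo container name is assumed from the Compose naming pattern.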

I don't limit RAM or CPU usage in the Docker settings, so in theory all of the host's resources are available to the containers.

As far as I can tell, memory usage stays fairly constant regardless of load (under the conditions I tested): around 0.11% for the mongo container and 0.03% for orion-ld (the machine has 32 GB of RAM). The only thing I noticed changing with load is CPU usage.
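To put those percentages in absolute terms: with no memory limit set, docker stats reports MEM % against the memory available to Docker, which here is presumably the 32 GB host. The conversion below treats "32GB" as 32 GiB, which is an assumption.

```python
HOST_RAM_GIB = 32  # "32GB of RAM" from the comment, treated as GiB (assumption)


def pct_to_mib(pct, total_gib=HOST_RAM_GIB):
    """Convert a docker stats MEM % (relative to the host total) into MiB."""
    return total_gib * 1024 * pct / 100


for name, pct in [("mongo", 0.11), ("orion-ld", 0.03)]:
    print(f"{name}: {pct_to_mib(pct):.1f} MiB")
```

This gives about 36 MiB for mongo and about 10 MiB for orion-ld. Both figures are tiny compared with the host's memory, which fits the observation that the containers are not running out of RAM, at least as far as periodic snapshots can show; a short spike between samples would not appear.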

The higher the load, the higher the CPU usage of orion-ld and mongo (obviously), and the sooner the app crashes. However, it still crashes even when the load is relatively low (around 1% CPU for both mongo and orion), just after a longer period of time.

When I generate load at 0.02 s intervals, the app crashes after 1-3 minutes. When I generate load at 0.1 s or 0.5 s intervals, it takes longer, anywhere from 5 to 30 minutes, and the timing is hard to predict.
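Converting these intervals and times-to-crash into a number of updates per process gives another view of the same data. This is plain arithmetic on the figures reported above; the function name is illustrative.

```python
def updates_before_crash(interval_s, minutes):
    """Updates one process sends at a fixed interval during `minutes` minutes."""
    return round(minutes * 60 / interval_s)


# (interval in seconds, fastest crash in minutes, slowest crash in minutes)
scenarios = [(0.02, 1, 3), (0.1, 5, 30), (0.5, 5, 30)]
for interval, lo, hi in scenarios:
    print(f"{interval} s: {updates_before_crash(interval, lo)}-"
          f"{updates_before_crash(interval, hi)} updates per process")
```

This yields roughly 3,000-9,000 (0.02 s), 3,000-18,000 (0.1 s) and 600-3,600 (0.5 s) updates per process. The ranges partly overlap, so the crashes may track the cumulative number of notifications more closely than elapsed time; with such wide, anecdotal ranges and an unstated number of processes, this is a hint worth checking rather than a conclusion.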

What is the best way to find out the exact cause of the crash (exit code 139)?

As mentioned, this didn't happen with the regular fiware/orion.

Thanks!

kzangeli commented 3 years ago

If the crash is caused by the broker itself rather than the container, try starting the broker inside gdb. That said, it looks more like a container-related crash.
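Following that suggestion, one way to get a backtrace is to run the broker under gdb in batch mode: gdb starts the program, and when it stops (for example on SIGSEGV) the `bt` command prints the stack. The helper below only builds the command line. It assumes that gdb is installed where the broker runs (the stock image may not include it), that the binary is called orionld and is on PATH, and that it accepts -fg (run in the foreground, as in the classic Orion CLI); these are assumptions, not verified details of the fiware/orion-ld image.

```python
import shlex


def gdb_command(broker_args):
    """Build a gdb batch invocation: start the broker, then print a backtrace."""
    return ["gdb", "-batch", "-ex", "run", "-ex", "bt",
            "--args", "orionld", *broker_args]


cmd = gdb_command(["-fg", "-dbhost", "mongo"])
print(shlex.join(cmd))  # gdb -batch -ex run -ex bt --args orionld -fg -dbhost mongo
# To actually run it, inside an environment that has both gdb and orionld:
# import subprocess; subprocess.run(cmd)
```

If the process segfaults, the backtrace shows which function faulted, which is what separates a broker bug from a container-level problem; a build with debug symbols makes it much easier to read.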