FIWARE / context.Orion-LD

Context Broker and CEF building block for context data management which supports both the NGSI-LD and the NGSI-v2 APIs
https://www.etsi.org/deliver/etsi_gs/CIM/001_099/009/01.06.01_60/gs_CIM009v010601p.pdf
GNU Affero General Public License v3.0

Orion-LD crashes in case of multiple requests at "ngsi-ld/v1/entityOperations/upsert" endpoint. #1459

Open max-thoma opened 1 year ago

max-thoma commented 1 year ago

I have run into problems when two or more requests reach the ngsi-ld/v1/entityOperations/upsert endpoint at the same time. Orion-LD crashes without any error messages.

This is my docker compose setup:

  orion:
    image: fiware/orion-ld
    hostname: orion
    container_name: fiware-orion
    depends_on:
      - mongo-db
    networks:
      - default
    ports:
      - "1026:1026"
    command: -dbhost mongo-db -logLevel DEBUG -forwarding
    environment:
      - ORIONLD_LOG_FOR_HUMANS=TRUE
      - ORIONLD_RELOG_ALARMS=TRUE
    healthcheck:
      test: curl --fail -s http://orion:1026/version || exit 1

  mongo-db:
    image: mongo:4.2
    hostname: mongo-db
    container_name: db-mongo
    expose:
      - "27017"
    ports:
      - "27017:27017"
    networks:
      - default
    command: --nojournal

  ld-context:
    image: httpd:alpine
    hostname: context
    container_name: fiware-ld-context
    ports:
      - "3004:80"
    volumes:
      - data-models:/usr/local/apache2/htdocs/

Orion-LD Versions:

orionld version: post-v1.4.0,
orion version: 1.15.0-next,

This Python script reproduces the bug:

import threading
import requests

def update_broker(payload):
    url = "http://localhost:1026/ngsi-ld/v1/entityOperations/upsert?options=update"

    headers = {
        'Content-Type': 'application/ld+json',
    }
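    try:
        response = requests.post(url, headers=headers, data=payload)
        print("OK", response.status_code)
    except requests.RequestException as e:
        # The connection fails once the broker has crashed.
        # Note: exit() in a worker thread only ends that thread, not the script.
        print("Error:", e)
        exit(1)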

def bug():
    update_broker('[{"id": "urn:smart-meter-1", "type": "SmartMeter", "@context": [{"activePower": '
                  '"https://schema.org/Thing"}, {"@language": "en"}, '
                  '"https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"], '
                  '"activePower": {"type": "Property", "value": 5, '
                  '"observedAt": "2023-10-25T09:30:57.917Z", "unitCode": "P1"}}]')

if __name__ == "__main__":
    for i in range(1, 1000):
        t1 = threading.Thread(target=bug)
        t2 = threading.Thread(target=bug)

        t1.start()
        t2.start()

        t1.join()
        t2.join()

What I could gather so far is the following:

If any other information is needed, I am happy to provide more details!

kzangeli commented 1 year ago

Finally got some time to look into this. Bug reproduced; it's a memory corruption. Those are particularly difficult to fix, but I'm on it now, so there's hope ... :)

Thank you very much for reporting!!!

kzangeli commented 1 year ago

Just wanted to propose a "workaround" for this crash you've found. In general, you should avoid using "application/ld+json" for incoming requests. It puts a significant extra load on the broker, as it must parse the @context, create the hash tables, etc., to be able to use the @context for expansion/compaction of the JSON-LD terms. Instead, host the @context somewhere (Orion-LD implements this service) and pass it in the Link header. That's how you should always pass contexts, and I'm almost sure your crash will go away.

I'm not saying I'm not gonna fix the crash, of course not, I'm on it. But, sending lots and lots of copies of the same contexts to the broker "inline" (application/ld+json) is a very bad idea. All contexts are cached in RAM, and in the end, the broker will run out of RAM and crash for that reason instead (once I fix the bug).
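
A minimal sketch of the Link-header variant of the request from the reproduction script, for illustration only. It assumes the @context from the original payload has been saved to a file and is served at the hypothetical URL http://context/smart-meter-context.jsonld (the file name is made up; the host name is borrowed from the ld-context service in the compose file above), and that the broker can reach that URL. The body is sent as application/json without an inline @context. The standard library's urllib is used here instead of requests so that the sketch has no third-party dependencies.

```python
import json
import urllib.request

BROKER_URL = "http://localhost:1026/ngsi-ld/v1/entityOperations/upsert?options=update"

# Assumption: hypothetical location of the hosted @context file.
# It must be reachable from the broker, not only from the client.
CONTEXT_URL = "http://context/smart-meter-context.jsonld"

# Same entity as in the reproduction script, but without the inline "@context".
ENTITY = {
    "id": "urn:smart-meter-1",
    "type": "SmartMeter",
    "activePower": {
        "type": "Property",
        "value": 5,
        "observedAt": "2023-10-25T09:30:57.917Z",
        "unitCode": "P1",
    },
}


def build_upsert_request(entities, context_url=CONTEXT_URL):
    """Build a batch-upsert request that references the @context via the Link header."""
    link = (
        f"<{context_url}>; "
        'rel="http://www.w3.org/ns/json-ld#context"; '
        'type="application/ld+json"'
    )
    headers = {
        # Plain JSON body: the @context travels in the Link header instead.
        "Content-Type": "application/json",
        "Link": link,
    }
    body = json.dumps(entities).encode("utf-8")
    return urllib.request.Request(BROKER_URL, data=body, headers=headers, method="POST")


def send(request):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling send(build_upsert_request([ENTITY])) from the two threads in place of update_broker(...) exercises the same endpoint. Since every request then refers to the same context URL, the broker does not have to parse a fresh inline copy of the context on each call.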

max-thoma commented 1 year ago

Thank you for this insight! For my specific application, it will unfortunately not be possible to completely eliminate application/ld+json incoming requests, because I need the functionality of the Context Broker to parse different @contexts. However, as a workaround, this is a great suggestion.