Orion-LD crashes in case of multiple requests at "ngsi-ld/v1/entityOperations/upsert" endpoint.

max-thoma commented 11 months ago

I have run into problems when two or more requests reach the ngsi-ld/v1/entityOperations/upsert endpoint at the same time. Orion-LD crashes without any error messages.

This is my docker compose setup:

  orion:
    image: fiware/orion-ld
    hostname: orion
    container_name: fiware-orion
    depends_on:
      - mongo-db
    networks:
      - default
    ports:
      - "1026:1026"
    command: -dbhost mongo-db -logLevel DEBUG -forwarding
    environment:
      - ORIONLD_LOG_FOR_HUMANS=TRUE
      - ORIONLD_RELOG_ALARMS=TRUE
    healthcheck:
      test: curl --fail -s http://orion:1026/version || exit 1

  mongo-db:
    image: mongo:4.2
    hostname: mongo-db
    container_name: db-mongo
    expose:
      - "27017"
    ports:
      - "27017:27017"
    networks:
      - default
    command: --nojournal

  ld-context:
    image: httpd:alpine
    hostname: context
    container_name: fiware-ld-context
    ports:
      - "3004:80"
    volumes:
      - data-models:/usr/local/apache2/htdocs/

Orion-LD Versions:

orionld version: post-v1.4.0,
orion version: 1.15.0-next,

This Python script reproduces the bug:

import threading
import requests

def update_broker(payload):
    url = "http://localhost:1026/ngsi-ld/v1/entityOperations/upsert?options=update"

    headers = {
        'Content-Type': 'application/ld+json',
    }
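    try:
        requests.post(url, headers=headers, data=payload)
        # Reaching this line only means a response arrived; the status code is not checked
        print("OK")
    except requests.RequestException:
        # The request failed, e.g. because the broker is no longer reachable
        print("Error")
        exit(1)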

def bug():
    update_broker('[{"id": "urn:smart-meter-1", "type": "SmartMeter", "@context": [{"activePower": '
                  '"https://schema.org/Thing"}, {"@language": "en"}, '
                  '"https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld"], '
                  '"activePower": {"type": "Property", "value": 5, '
                  '"observedAt": "2023-10-25T09:30:57.917Z", "unitCode": "P1"}}]')

if __name__ == "__main__":
    for i in range(1, 1000):
        t1 = threading.Thread(target=bug)
        t2 = threading.Thread(target=bug)

        t1.start()
        t2.start()

        t1.join()
        t2.join()
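
Since the broker goes down without logging anything, it can help to detect the moment it stops answering. The following is a minimal sketch, not part of the original report: it polls the same /version endpoint that the compose healthcheck uses. The function name broker_alive and its fetch parameter are made up for illustration; it uses only the standard library.

```python
import urllib.error
import urllib.request


def broker_alive(base_url="http://localhost:1026", fetch=urllib.request.urlopen):
    """Return True if GET <base_url>/version answers with HTTP 200.

    `fetch` is a hypothetical hook so the check can be exercised without a broker.
    """
    try:
        with fetch(f"{base_url}/version", timeout=2) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and refused/reset connections all derive from OSError
        return False
```

Calling broker_alive() after each pair of threads has been joined, and stopping the loop once it returns False, would show which iteration took the broker down.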

What I could gather so far is the following:

I could not reproduce this issue on other endpoints
The issue occurs more frequently when the payload is large
The issue occurs more frequently if the URL parameter ?options=update is used
The content of the payload does not seem to have any effect
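
To probe the payload-size observation, the request body can be generated rather than hard-coded. This is only a sketch under assumptions: build_payload is a hypothetical helper, and the entities simply repeat the one from the script above with a running index in the id.

```python
import json

# Same inline @context as in the reproduction script
CONTEXT = [
    {"activePower": "https://schema.org/Thing"},
    {"@language": "en"},
    "https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context.jsonld",
]


def build_payload(entity_count):
    """Return a JSON array of `entity_count` SmartMeter entities as a string."""
    entities = []
    for index in range(1, entity_count + 1):
        entities.append({
            "id": f"urn:smart-meter-{index}",
            "type": "SmartMeter",
            "@context": CONTEXT,
            "activePower": {
                "type": "Property",
                "value": 5,
                "observedAt": "2023-10-25T09:30:57.917Z",
                "unitCode": "P1",
            },
        })
    return json.dumps(entities)
```

Passing build_payload(1) to update_broker sends the same single entity as the script; larger counts grow the body while keeping the content otherwise identical.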

If any other information is needed, I am happy to provide more details!

kzangeli commented 11 months ago

Finally got some time to look into this. Bug reproduced; it's a memory corruption. Those are particularly difficult to fix, but I'm on it now, so there's hope ... :)

Thank you very much for reporting!!!

kzangeli commented 11 months ago

Just wanted to propose a "workaround" for the crash you've found. In general, you should avoid using "application/ld+json" for incoming requests. It puts a significant extra load on the broker, which must parse the @context, create the hash tables, etc. to be able to use the @context for expansion/compaction of the JSON-LD terms. Instead, host the @context somewhere (Orion-LD implements this service) and pass it in the Link header. That's how you should always pass contexts, and I'm almost sure your crash will go away.

I'm not saying I'm not gonna fix the crash, of course not, I'm on it. But, sending lots and lots of copies of the same contexts to the broker "inline" (application/ld+json) is a very bad idea. All contexts are cached in RAM, and in the end, the broker will run out of RAM and crash for that reason instead (once I fix the bug).
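
The Link-header approach described above could look roughly like the sketch below. It is an illustration, not code from the thread: the context URL and file name are hypothetical (they assume the context file is served by the ld-context container from the compose setup), and build_request and upsert are made-up helper names. The body is sent as application/json with the inline @context removed, and the context is referenced with the JSON-LD link relation. The standard library is used here, but the same headers should work with requests.post.

```python
import json
import urllib.request

# Hypothetical URL: assumes the context file is hosted by the ld-context container
CONTEXT_URL = "http://context/smart-meter-context.jsonld"
UPSERT_URL = "http://localhost:1026/ngsi-ld/v1/entityOperations/upsert?options=update"


def build_request(entities, context_url=CONTEXT_URL):
    """Return (headers, body) for an upsert that references the @context by URL."""
    # Drop any inline @context, since the Link header carries it instead
    stripped = [
        {key: value for key, value in entity.items() if key != "@context"}
        for entity in entities
    ]
    headers = {
        "Content-Type": "application/json",
        "Link": f'<{context_url}>; rel="http://www.w3.org/ns/json-ld#context"; '
                'type="application/ld+json"',
    }
    return headers, json.dumps(stripped)


def upsert(entities, context_url=CONTEXT_URL):
    """POST the entities to the upsert endpoint and return the HTTP status code."""
    headers, body = build_request(entities, context_url)
    request = urllib.request.Request(
        UPSERT_URL, data=body.encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Because every request then points at the same URL, the broker has one context to cache instead of a fresh inline copy per request.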

max-thoma commented 11 months ago

Thank you for this insight! For my specific application, it will unfortunately not be possible to completely eliminate application/ld+json incoming requests, because I need the functionality of the Context Broker to parse different @contexts. However, as a workaround, this is a great suggestion.

FIWARE / context.Orion-LD

Orion-LD crashes in case of multiple requests at "ngsi-ld/v1/entityOperations/upsert" endpoint. #1459