FIWARE / context.Orion-LD

Context Broker and CEF building block for context data management which supports both the NGSI-LD and the NGSI-v2 APIs
https://www.etsi.org/deliver/etsi_gs/CIM/001_099/009/01.06.01_60/gs_CIM009v010601p.pdf
GNU Affero General Public License v3.0

Subscription crashes broker when using HTTPS URIs #1615

Closed: domvanrob closed this issue 4 months ago

domvanrob commented 4 months ago

The Issue

The context broker crashes completely when there is a subscription with an HTTPS notification URI. With an HTTP notification URI, the subscription seems to work as expected.

Environment

Locally using Docker (see the Docker Compose file below). It also crashes my cloud-hosted container.

Orion-LD: v1.5.1 (I've also tried the latest v1.6.0-pre-1609 image)
MongoDB: v6.0

services:
  orionld:
    image: quay.io/fiware/orion-ld:${ORION_LD_VERSION}
    container_name: orionld
    hostname: ${ORION_LD_HOSTNAME}
    depends_on:
      mongo-db:
        condition: service_healthy
    expose:
      - "${ORION_LD_PORT}"
    ports:
      - "${ORION_LD_PORT}:1026"
    command: -dbhost ${MONGO_DB_HOSTNAME} -logLevel DEBUG -t 0-5,20-22,31,41-47 -mongocOnly
    healthcheck:
      test: curl --fail -s http://${ORION_LD_HOSTNAME}:${ORION_LD_PORT}/version || exit 1

  mongo-db:
    image: mongo:${MONGO_DB_VERSION}
    container_name: mongo-db
    hostname: ${MONGO_DB_HOSTNAME}
    expose:
      - "${MONGO_DB_PORT}"
    ports:
      - "${MONGO_DB_PORT_HOST}:${MONGO_DB_PORT}"
    command: --bind_ip_all
    volumes:
      - mongo-db:/data
    healthcheck:
      test: |
        host=`hostname --ip-address || echo '127.0.0.1'`;
        mongo --quiet $host/test --eval 'quit(db.runCommand({ ping: 1 }).ok ? 0 : 2)' && echo 0 || echo 1

volumes:
  mongo-db:

Reproduction and logs

Create subscription:

curl --location 'localhost:1026/ngsi-ld/v1/subscriptions' \
--header 'Content-Type: application/ld+json' \
--header 'Accept: application/ld+json' \
--header 'NGSILD-Tenant: X' \
--data-raw '{
  "description": "Notify me when name changes",
  "type": "Subscription",
  "entities": [{"type": "WasteContainer"}],
  "watchedAttributes": ["name"],
  "notification": {
    "attributes": ["name"],
    "format": "keyValues",
    "endpoint": {
      "uri": "{HTTPS_URL}"
    }
  },
   "@context": "https://raw.githubusercontent.com/smart-data-models/dataModel.WasteManagement/master/context.jsonld"
}'

Result: 201 Created

Retrieve subscriptions:

curl --location 'localhost:1026/ngsi-ld/v1/subscriptions' \
--header 'Content-Type: application/ld+json' \
--header 'Link: <https://raw.githubusercontent.com/smart-data-models/dataModel.WasteManagement/master/context.jsonld>; rel="http://www.w3.org/ns/json-ld#context"; type="application/ld+json"' \
--header 'NGSILD-Tenant: X'

Result:

[
  {
    "id": "urn:ngsi-ld:subscription:2508e0ee-1dae-11ef-bab0-0242ac1a0002",
    "type": "Subscription",
    "description": "Notify me when name changes",
    "entities": [{ "type": "WasteContainer" }],
    "watchedAttributes": ["name"],
    "status": "active",
    "isActive": true,
    "notification": {
      "attributes": ["name"],
      "format": "keyValues",
      "endpoint": {
        "uri": "{HTTPS_URL}",
        "accept": "application/json"
      },
      "status": "ok"
    },
    "origin": "cache",
    "jsonldContext": "https://raw.githubusercontent.com/smart-data-models/dataModel.WasteManagement/master/context.jsonld"
  }
]

Update entity:

curl --location --request PATCH 'localhost:1026/ngsi-ld/v1/entities/urn:ngsi-ld:WasteContainer:58/attrs/name' \
--header 'Content-Type: application/json' \
--header 'Link: <https://raw.githubusercontent.com/smart-data-models/dataModel.WasteManagement/master/context.jsonld>; rel="http://www.w3.org/ns/json-ld#context"; type="application/ld+json"' \
--header 'NGSILD-Tenant: X' \
--data '{
    "value": "test",
    "type": "Property"
}'

Result: the container crashes

Logs

I've tried multiple startup options to get more insight from the logs. As you can see in the Compose snippet, I'm currently using -logLevel DEBUG -t 0-5,20-22,31,41-47, which yields the most information. However, I haven't seen an actual ERROR/FATAL log line that explains the crash. The container hosted in the cloud crashes with "Uncaught signal: 11, pid=1, tid=44, fault_addr=0".

time=2024-05-28T13:25:46.452Z | lvl=TMP | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=mhdConnectionInit.cpp[1114]:mhdConnectionInit | msg=------------------------- Servicing NGSI-LD request 002: PATCH /ngsi-ld/v1/entities/urn:ngsi-ld:WasteContainer:58 --------------------------
time=2024-05-28T13:25:46.471Z | lvl=DEBUG | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=mhdReply.cpp[77]:mhdReply | msg=Response Body: 'None'
time=2024-05-28T13:25:46.471Z | lvl=DEBUG | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=mhdReply.cpp[78]:mhdReply | msg=Response Code:  204
time=2024-05-28T13:25:46.474Z | lvl=DEBUG | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=httpsNotify.cpp[127]:httpsNotify | msg=urn:ngsi-ld:subscription:473d794a-1cf5-11ef-97ff-0242ac1a0002: Protocol for HTTPS notification: https (2)
time=2024-05-28T13:25:46.474Z | lvl=DEBUG | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=httpsNotify.cpp[128]:httpsNotify | msg=urn:ngsi-ld:subscription:473d794a-1cf5-11ef-97ff-0242ac1a0002: IP for HTTPS notification: {HTTPS_URL}
time=2024-05-28T13:25:46.474Z | lvl=DEBUG | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=httpsNotify.cpp[129]:httpsNotify | msg=urn:ngsi-ld:subscription:473d794a-1cf5-11ef-97ff-0242ac1a0002: Port for HTTPS notification: 443
time=2024-05-28T13:25:46.474Z | lvl=DEBUG | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=httpsNotify.cpp[130]:httpsNotify | msg=urn:ngsi-ld:subscription:473d794a-1cf5-11ef-97ff-0242ac1a0002: Rest for HTTPS notification: (null)

I've replaced the actual URI with {HTTPS_URL}.

If any additional information is required, please ask.

Thanks in advance!

kzangeli commented 4 months ago

OK, noted. I'm having a strong case of "déjà vu" here ... I believe there's a very similar issue in the list of fixed issues (that's why the tag is "possible bug" and not directly "bug" for now ;)). Let me search/browse a little ...

domvanrob commented 4 months ago

Think I know which one you mean: https://github.com/FIWARE/context.Orion-LD/issues/1495

kzangeli commented 4 months ago

Nah, what I found is similar but not similar enough.

Rest for HTTPS notification: (null)

It might just be a log message using that NULL pointer, which would cause a SIGSEGV (signal 11). I'll try to find some time to look into this asap.
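The quoted log line was actually written, with "(null)" in it. That suggests the log call itself did not crash. glibc's printf family prints "(null)" for a NULL "%s" argument, although ISO C leaves that case undefined. The following is a minimal sketch of a portable guard for such log calls. The helpers nullSafe and logRest are hypothetical and are not Orion-LD functions.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of Orion-LD): ISO C leaves printing a NULL
 * pointer with "%s" undefined, even though glibc happens to print "(null)".
 * Substituting an explicit placeholder keeps the log call portable.
 */
static const char *nullSafe(const char *s)
{
  return (s != NULL) ? s : "(null)";
}

/* Hypothetical log call modelled on the "Rest for HTTPS notification" line. */
static void logRest(FILE *out, const char *subscriptionId, const char *rest)
{
  fprintf(out, "%s: Rest for HTTPS notification: %s\n", subscriptionId, nullSafe(rest));
}
```

The guard only protects the log line. Any later code that reads through the same pointer still needs its own NULL check.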

kzangeli commented 4 months ago

Think I know which one you mean: https://github.com/FIWARE/context.Orion-LD/issues/1495

Yes, that's the one I was thinking of

domvanrob commented 4 months ago

Alright, hopefully you'll find something!

kzangeli commented 4 months ago

I did find something, and it is related to the empty URL PATH of the notification endpoint. Bug fixed and tests running. If all goes well, and I have no reason to doubt that, a fix PR should be merged in a few hours.

kzangeli commented 4 months ago

So, hopefully the issue is fixed. Please try again with the newest image on Docker Hub and close this issue if all is OK. (And if not, please let me know and I'll keep looking.)

domvanrob commented 4 months ago

It's working!! Thanks for the fast reply, and fix.

What was the issue, if I may ask? I tried to understand the changes, but I'm not sure why this was causing the crash.

kzangeli commented 4 months ago

Yeah, it was a silly mistake. Not enough QA :( The crash occurs when the URL PATH of the notification endpoint is empty, as in your case. This worked just fine:

  "notification": {
    "endpoint": {
      "uri": "https://IP:PORT/urlpath"
...

While this provokes the crash:

  "notification": {
    "endpoint": {
      "uri": "https://IP:PORT/"     # empty URL PATH
...
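The two endpoints differ only in what follows host:port. The debug log earlier in this issue shows the broker splitting the URI into protocol, IP, port and "Rest". The following is a minimal sketch of such a split. It assumes a hypothetical splitHttpsUrl function and UrlParts struct, because Orion-LD's actual parser is not shown in this thread. In the sketch, rest keeps its leading '/' and is left NULL when the path is empty.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of splitting a notification URI; not an Orion-LD type. */
typedef struct {
  char host[256];
  unsigned int port;
  const char *rest;  /* URL path including its leading '/', or NULL if empty */
} UrlParts;

/*
 * Minimal sketch, assuming only "https://host[:port][/path]" URIs.
 * Returns 0 on success, -1 on malformed input.
 */
static int splitHttpsUrl(const char *url, UrlParts *out)
{
  const char *prefix = "https://";
  size_t prefixLen = strlen(prefix);

  if (strncmp(url, prefix, prefixLen) != 0)
    return -1;

  const char *hostStart = url + prefixLen;
  const char *slash = strchr(hostStart, '/');
  const char *hostEnd = (slash != NULL) ? slash : hostStart + strlen(hostStart);
  const char *colon = memchr(hostStart, ':', (size_t)(hostEnd - hostStart));
  const char *nameEnd = (colon != NULL) ? colon : hostEnd;
  size_t nameLen = (size_t)(nameEnd - hostStart);

  if (nameLen == 0 || nameLen >= sizeof(out->host))
    return -1;
  memcpy(out->host, hostStart, nameLen);
  out->host[nameLen] = '\0';

  /* Default HTTPS port, matching the "Port for HTTPS notification: 443" log line. */
  out->port = 443;
  if (colon != NULL)
    out->port = (unsigned int)strtoul(colon + 1, NULL, 10);

  /* "https://host" and "https://host/" both leave rest == NULL. */
  out->rest = (slash != NULL && slash[1] != '\0') ? slash : NULL;
  return 0;
}
```

With this sketch, a URI ending in "/urlpath" yields rest == "/urlpath". A URI ending in "/" or right after the port yields rest == NULL, which is the value the unguarded rest[0] check cannot handle.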

The piece of code that triggered the segmentation fault was this:

if (rest[0] == '/')
    rest = &rest[1];

rest is the variable (a char pointer) that references the URL PATH. In your case there is no URL PATH, so rest is a NULL pointer. Dereferencing (looking inside) rest in if (rest[0] == '/') while rest == 0 (NULL) means reading memory address ZERO, which the system does not allow. That invalid access is the Segmentation Fault (SIGSEGV, signal 11) that crashed the broker.

Very easy fix: if rest is NULL, don't do it.
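In C this is the usual short-circuit guard: && evaluates its right operand only when the left one is true, so rest[0] is never read through a NULL pointer. The following is a minimal sketch wrapped in a hypothetical helper, skipLeadingSlash. It is not the literal code of the fix PR.

```c
#include <stddef.h>

/*
 * Sketch of the guarded check, assuming rest is NULL when the notification
 * URI has no URL PATH. Hypothetical helper, not the actual Orion-LD patch.
 */
static const char *skipLeadingSlash(const char *rest)
{
  /* Short-circuit: rest[0] is only read when rest is not NULL. */
  if ((rest != NULL) && (rest[0] == '/'))
    rest = &rest[1];

  return rest;
}
```

A caller still has to decide what a NULL path means, for example by treating it as "/". How Orion-LD handles that is not shown in this thread.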

domvanrob commented 4 months ago

That sounds logical, and it's what I expected after viewing the changes. Funnily enough, I hadn't tested a variant with a path. If I had, I could probably have made a more direct case in the issue :D

Thanks for the explanation, and keep up the good work!