eclipse-basyx / basyx-databridge

Eclipse Public License 2.0

DataBridge OPC UA Subscription Issues #269

Open aaronzi opened 7 months ago

aaronzi commented 7 months ago

I did some more intensive tests with the DataBridge for OPC UA. I encountered an issue where the DataBridge lost its subscription to configured nodes after a few minutes. This is what is displayed in the logs from the DataBridge:

2023-11-25 10:38:22 [milo-shared-thread-pool-2] INFO org.eclipse.digitaltwin.basyx.databridge.aas.AASEndpoint - Proxy URL: http://host.docker.internal:9081/submodels/aHR0cDovL29wYzJhYXMuY29tL3R5cGUvMS8xLzcxYzg3ZjQwLWVlOWEtNDVlZS04MTFkLTIxM2Y3NzI2MDFjNg==/submodel-elements/DynamicObject.DynamicFloat
2023-11-25 10:38:22 [milo-shared-thread-pool-2] INFO org.eclipse.digitaltwin.basyx.databridge.aas.AASEndpoint - Transferred message=79.24521249449533
2023-11-25 10:38:22 [milo-shared-thread-pool-2] INFO route6 - Exchange[ExchangePattern: InOnly, BodyType: String, Body: 79.24521249449533]
2023-11-25 10:38:22 [milo-shared-thread-pool-2] INFO org.apache.camel.component.milo.client.internal.SubscriptionManager - Subscription status changed 79 : StatusCode{name=Bad_Timeout, value=0x800A0000, quality=bad}

As the last line shows, the subscription status changed to Bad_Timeout. The issue occurs consistently across the different OPC UA servers I used: I tested against several PLCs (Wago, Schneider Electric, Siemens S7 1500) and an asyncua Python OPC UA server.
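For anyone scanning longer DataBridge logs for the same symptom, here is a minimal Python sketch that extracts these status-change lines and classifies the status code. It relies on the OPC UA convention that the two most significant bits of the 32-bit status code carry the severity. The function names are hypothetical, and treating the number after "changed" as the subscription ID is an assumption based on the log format above.

```python
import re

# OPC UA encodes severity in the two most significant bits of the status code.
SEVERITY = {0b00: "good", 0b01: "uncertain", 0b10: "bad"}

# Matches the status part of a SubscriptionManager line like the one above.
# Assumption: the number after "changed" is the subscription ID.
STATUS_RE = re.compile(
    r"Subscription status changed (?P<sub_id>\d+) : "
    r"StatusCode\{name=(?P<name>\w+), value=(?P<value>0x[0-9A-Fa-f]+)"
)


def severity(code: int) -> str:
    """Classify a 32-bit OPC UA status code by its top two bits."""
    return SEVERITY.get((code >> 30) & 0b11, "reserved")


def find_status_changes(log_text: str):
    """Yield (subscription_id, status_name, severity) per status-change line."""
    for m in STATUS_RE.finditer(log_text):
        yield int(m["sub_id"]), m["name"], severity(int(m["value"], 16))


sample = (
    "2023-11-25 10:38:22 [milo-shared-thread-pool-2] INFO "
    "org.apache.camel.component.milo.client.internal.SubscriptionManager - "
    "Subscription status changed 79 : "
    "StatusCode{name=Bad_Timeout, value=0x800A0000, quality=bad}"
)
print(list(find_status_changes(sample)))  # [(79, 'Bad_Timeout', 'bad')]
```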

For testing purposes, I have a demo OPC UA server. You can use it like this in docker-compose:

opcua:
    image: aaronzi/demo-opc-server:v1.0.0
    container_name: opcserver
    ports:
      - "4840:4840"  # OPC UA server port
      - "8080:8080"  # Health check port
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s
      timeout: 5s
      retries: 3
    restart: always

The server has a username and password configured: USERNAME="test"; PASSWORD="test". You will find a value that changes every 100 ms under this nodeId: ns=2;i=10

I hope this helps to reproduce the issue. The subscription ends around 4 minutes after the DataBridge has started. The time until the subscription fails decreases as the number of nodes read simultaneously (at a high frequency) increases.
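To check whether a subscription to the demo server also times out without the DataBridge in between, a standalone client can help narrow things down. The following is a rough sketch using the third-party asyncua library. The endpoint URL (root path on localhost:4840), the 100 ms publishing interval, and the names `endpoint_url`, `Handler`, and `watch` are assumptions for illustration, not part of the DataBridge setup.

```python
import asyncio


def endpoint_url(host: str = "localhost", port: int = 4840) -> str:
    # Assumption: the demo server listens at the root path of the mapped port.
    return f"opc.tcp://{host}:{port}"


class Handler:
    """Collects the callbacks asyncua issues for a subscription."""

    def __init__(self):
        self.count = 0
        self.status_changes = []

    def datachange_notification(self, node, val, data):
        # Called for every data change reported by the server.
        self.count += 1

    def status_change_notification(self, status):
        # Called when the server reports a subscription status change.
        self.status_changes.append(status)
        print("subscription status changed:", status)


async def watch(node_id: str = "ns=2;i=10", minutes: float = 10.0) -> Handler:
    # Imported lazily so the helpers above work without asyncua installed.
    from asyncua import Client  # third-party: pip install asyncua

    client = Client(endpoint_url())
    client.set_user("test")
    client.set_password("test")
    handler = Handler()
    async with client:
        # Requested publishing interval of 100 ms (an assumed value).
        sub = await client.create_subscription(100, handler)
        await sub.subscribe_data_change(client.get_node(node_id))
        for _ in range(int(minutes * 6)):
            await asyncio.sleep(10)
            print("notifications so far:", handler.count)
    return handler


# To run against the docker-compose service above:
# asyncio.run(watch())
```

If the notification count keeps growing well past the 4-minute mark and no status change is printed, that would point towards the DataBridge side rather than the server.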

FrankSchnicke commented 7 months ago

Thanks a lot for pointing this out. We will take a look and come back with an update.

mateusmolina-iese commented 4 months ago

Hi @aaronzi,

I couldn't replicate the issue with the latest version of basyx-databridge. Could you please check if the issue is still present on your side?

Please find below two stress tests I created for analyzing the issue:

Stress test 1

Stress test 2

aaronzi commented 4 months ago

Hi @mateusmolina-iese,

I will retest this on my end. I hope the issue has resolved itself (maybe through a Camel update). I will do a 30-minute run just to be sure. Thank you for providing the two stress tests. I will give you an answer by the end of the day.

aaronzi commented 4 months ago

I'm still encountering the same issue. I mapped every node of the OPC Server in my stress test. I also used the Docker Container (newest image version) of the DataBridge. Here is my DataBridge config and my AAS I tested it with:

aasserver.json opcuaconsumer.json routes.json opc2aas_demo.json

mateusmolina-iese commented 4 months ago

Hi @aaronzi,

thanks for testing the issue again. I'll try to reproduce the issue with your files and get back to you.

mateusmolina-iese commented 4 months ago

Hi @aaronzi,

I built another stress test [^1] with your latest configuration files. At every specified interval, it asserts that the received value of the dynamic nodes differs from the previously received one; this condition is assumed to no longer hold once the route fails.

It seems that of your 7 routes, only one (the DynamicFloat node) was being dynamically updated; the others remained static throughout the test. That's why I based the assertions only on the dynamic one.
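The idea behind that assertion can be sketched in a few lines of Python. This is only an illustration of the stale-value check described above, not the code from the stress test branch; `first_stale_index` and its `max_repeats` parameter are hypothetical names.

```python
from typing import Optional, Sequence


def first_stale_index(samples: Sequence[float], max_repeats: int = 1) -> Optional[int]:
    """Return the index at which a value has repeated `max_repeats` times in a row.

    `samples` holds the value read from the AAS at each check interval.
    Returns None if the value kept changing throughout.
    """
    repeats = 0
    for i in range(1, len(samples)):
        if samples[i] == samples[i - 1]:
            repeats += 1
            if repeats >= max_repeats:
                return i
        else:
            repeats = 0
    return None


# A dynamic node that keeps updating never trips the check.
print(first_stale_index([79.2, 80.1, 78.5]))        # None
# A route that stops delivering leaves the last value in place.
print(first_stale_index([79.2, 80.1, 80.1, 80.1]))  # 2
# A static node trips the check immediately, so it cannot be used here.
print(first_stale_index([5.0, 5.0, 5.0]))           # 1
```

Raising `max_repeats` tolerates a dynamic value that happens to repeat once between two checks.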

Unfortunately, I still couldn't reproduce the issue when running it twice for 6 minutes each.

Could you please provide more details about how you are getting the error or other methods to reproduce the issue?

[^1]: Test entrypoint, Branch for Stress Test 3

aaronzi commented 4 months ago

Hi @mateusmolina-iese, thank you for revisiting this issue and creating another stress test.

Would it be possible to schedule a Teams meeting? You can reach me at this email address for further communication: aaron@zielstorff.com

mateusmolina-iese commented 4 months ago

Hi @aaronzi,

of course. I'll contact you 👍

mateusmolina-iese commented 3 months ago

Just for documentation purposes: @aaronzi and I are discussing a way to better reproduce the issue he is having. I came up with a new test scenario, which tests the behavior for non-dynamic subscriptions:

However, I still couldn't reproduce the issue on my side.