Azure / Industrial-IoT

Azure Industrial IoT Platform

Linger does not work when reference continuation token between calls is the same #2227

Closed VladimirMakarevich closed 3 months ago

VladimirMakarevich commented 3 months ago

Describe the bug

We are currently migrating from the old OPC Twin (2.8.6) to the new Publisher (2.9.4) module. We use the direct methods Browse_V2 and BrowseNext_V2 to find all child nodes. With the old OPC Twin (2.8.6) everything worked correctly. With the new Publisher (2.9.4), however, requesting BrowseNext via HTTP or via MQTT returns only one successful response; the second request returns an error:

{ "references": [], "errorInfo": { "statusCode": 2152333312, "symbolicId": "BadContinuationPointInvalid" } }
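For readers unfamiliar with the numeric form, the statusCode above can be decoded by hand. The following is a minimal sketch, assuming the standard OPC UA StatusCode layout (severity in the top two bits, sub-code in bits 16 to 27); the helper name `describe_status` is made up for illustration and is not part of the Publisher API.

```python
# Decode an OPC UA status code as reported in the "errorInfo" object.
# Assumption: standard OPC UA StatusCode bit layout.

def describe_status(code: int) -> dict:
    """Split a 32-bit OPC UA status code into its main fields."""
    severity_bits = (code >> 30) & 0b11
    severity = {0b00: "Good", 0b01: "Uncertain", 0b10: "Bad"}.get(
        severity_bits, "Reserved"
    )
    return {
        "hex": f"0x{code:08X}",                # form used in the OPC UA status code tables
        "severity": severity,                  # top two bits
        "sub_code": (code >> 16) & 0x0FFF,     # bits 16..27 identify the condition
    }


if __name__ == "__main__":
    print(describe_status(2152333312))
```

The hexadecimal form, 0x804A0000, is the value listed for BadContinuationPointInvalid, which matches the symbolicId in the response.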

To Reproduce

Steps to reproduce the behavior:

  1. Start OPC Publisher with mqtt transport and configuration api enabled.
  2. Start Kepserver with enough nodes in a single folder that BrowseNext must be called at least twice. For example, you need at least 41 nodes if the page size is 20. Alternatively, reduce maxReferencesToReturn.
  3. Make the initial request to BrowseFirst (/v2/browse/first). The response contains the first 20 nodes and ContinuationToken = "AA==".

request "/v2/browse/first":

{ "connection": { "endpoint": { "url": "opc.tcp://10.10.10.100:50000", "securityMode": 3, "securityPolicy": "http://opcfoundation.org/UA/SecurityPolicy#None" } }, "request": { "nodeId": "nsu=KEPServerEX;s=SQL.Device1", "direction": 0, "maxReferencesToReturn": 20, "readVariableValues": false } }

  4. Make the first request to BrowseNext (/v2/browse/next) with the ContinuationToken. The response contains the next 20 nodes and, again, ContinuationToken = "AA==".

request "/v2/browse/next":

{ "connection": { "endpoint": { "url": "opc.tcp://10.10.10.100:50000", "securityMode": 3, "securityPolicy": "http://opcfoundation.org/UA/SecurityPolicy#None" } }, "request": { "continuationToken": "AA==" } }

  5. Make a second request to BrowseNext (/v2/browse/next) with the ContinuationToken. It returns an error: statusCode = 2152333312, symbolicId = BadContinuationPointInvalid.
  6. The same steps reproduce the issue using MQTT request/response.
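The paging sequence in the steps above can be sketched as a small client loop. This is only an illustration under stated assumptions: the `post` callable stands in for whatever HTTP or MQTT transport is used, the response field name `continuationToken` is assumed to mirror the request field, a present `errorInfo` is treated as a failure, and `FakeServer` is a hypothetical stand-in that, like Kepserver in this report, hands out the same token on every page.

```python
from typing import Callable, List


def browse_all(post: Callable[[str, dict], dict], connection: dict,
               node_id: str, page_size: int = 20, max_pages: int = 1000) -> List:
    """Call browse/first once, then browse/next until no token is returned."""
    response = post("/v2/browse/first", {
        "connection": connection,
        "request": {"nodeId": node_id, "direction": 0,
                    "maxReferencesToReturn": page_size,
                    "readVariableValues": False},
    })
    references = list(response.get("references", []))
    token = response.get("continuationToken")  # assumed response field name
    pages = 1
    while token and pages < max_pages:
        response = post("/v2/browse/next", {
            "connection": connection,
            "request": {"continuationToken": token},
        })
        error = response.get("errorInfo")
        if error:  # assumption: errorInfo is only present on failure
            raise RuntimeError(f"BrowseNext failed: {error.get('symbolicId')}")
        references.extend(response.get("references", []))
        token = response.get("continuationToken")
        pages += 1
    return references


class FakeServer:
    """Hypothetical in-memory server that reuses one token for every page."""

    def __init__(self, total: int, token: str = "AA=="):
        self.nodes = [f"node{i}" for i in range(total)]
        self.token = token
        self.offset = 0
        self.size = 0

    def post(self, path: str, body: dict) -> dict:
        if path.endswith("/first"):
            self.offset = 0
            self.size = body["request"]["maxReferencesToReturn"]
        page = self.nodes[self.offset:self.offset + self.size]
        self.offset += len(page)
        result = {"references": page}
        if self.offset < len(self.nodes):
            result["continuationToken"] = self.token  # same token each time
        return result


if __name__ == "__main__":
    server = FakeServer(total=41)
    refs = browse_all(server.post, connection={}, node_id="nsu=KEPServerEX;s=SQL.Device1")
    print(len(refs))
```

With 41 nodes and a page size of 20, the loop needs one BrowseFirst and two BrowseNext calls, which is the smallest case that reaches the failing second BrowseNext described in step 5.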

Expected behavior

BrowseNext (/v2/browse/next) should return all remaining child nodes when called repeatedly with the ContinuationToken. The behavior should be the same as with OPC Twin, where the ContinuationToken lets us retrieve all child nodes.

Additional context

Publisher settings:

"publisher1": { "settings": { "image": "mcr.microsoft.com/iotedge/opc-publisher:2.9.6", "createOptions": "{\"Hostname\":\"publisher\",\"Cmd\":[\"-c\",\"--pf=/appdata/publisher1.json\",\"--aa\",\"--fullfeaturedmessage=true\",\"--batchsize=10\",\"--batchtriggerinterval=1000\",\"--messagingmode=Samples\",\"--opcpublishinginterval=1000\",\"--opcsamplinginterval=1000\",\"--trustmyself=true\",\"--di=10\",\"--loglevel=___LOG_LEVEL___\",\"--defaultmessagetransport=Mqtt\",\"--mqttclientconnectionstring=HostName=___BROKER_HOST___;Port=___BROKER_PORT___;Username=___BROKER_USERNAME___;Password=___BROKER_PASSWORD___;UseTls=false;Protocol=v5\",\"--api-key=___PUBLISHER_CONFIGURATION_APIKEY___\",\"--immediatepublishing=true\",\"--roottopictemplate=root\",\"--telemetrytopictemplate=root/opc-publisher\",\"--methodtopictemplate=root/opc-publisher/methods\",\"--mdt=root/opc-publisher/metadata\", \"--qs=1000\", \"--removedupsinbatch=true\"],\"HostConfig\":{\"PortBindings\":{\"62222/tcp\":[{\"HostPort\":\"62222\"}]},\"Binds\":[\"/appdata:/appdata\"],\"LogConfig\":{\"Type\":\"json-file\",\"Config\":{\"max-size\":\"___LOG_MAX_SIZE___\",\"max-file\":\"___LOG_MAX_FILE___\"}}}}" }, "type": "docker", "version": "1.0", "status": "running", "restartPolicy": "always", "startupOrder": 100 },

alxy commented 3 months ago

I was able to replicate this as well. Using the very same kepserver, using opctwin and opcpublisher and direct method calls, I am able to browse multiple pages of signals using the twin, but with the publisher I'm only able to do the first request.

marcschier commented 3 months ago

I added a test (https://github.com/Azure/Industrial-IoT/blob/45d88c14b077c78e17bee46e1813b795db56898c/src/Azure.IIoT.OpcUa.Publisher.Module/tests/Mqtt/TestData/BrowseTests.cs#L164) and that works fine. I see the issue if the continuation token is passed again, maybe that is what happens somehow.

marcschier commented 3 months ago

Could you try with 2.9.7?

alxy commented 3 months ago

@marcschier Are you running these tests against a real OPC UA server, and if yes, against which implementation? We were not able to replicate this behaviour using the opcplc simulator. The two seem to behave differently when it comes to continuation tokens (specifically, opcplc returns a new continuation token for each new BrowseNext request). However, with a default Kepserver installation (you can get the trial license, which works for 2 hours for free) it doesn't work.

That being said, I'll try against 2.9.7 next week.

marcschier commented 3 months ago

Correct, all our tests run against opc-plc as well as the official opc foundation reference servers. Kepserver returns the same token every time? Is that how it behaves?

marcschier commented 3 months ago

@mregen FYI

alxy commented 3 months ago

Yes, that's true. Specifically, I'm always receiving the token AA== from Kepserver. Using the OPC Twin, I can pass this very same token again and again (in fact, the request is exactly the same every time in that case) and am able to browse multiple pages of data.

I can provide you with a publicly accessible Kepserver instance to test this against, if needed.

marcschier commented 3 months ago

I think the session was garbage collected in between calls to next. Can you set a longer timeout using the '--cl' option to ensure the session stays alive between BrowseNext calls? If you still see the issue, can you check in the log whether the client went to the disconnected state?

alxy commented 3 months ago

I can confirm setting this value enables me to browse our kepserver again, thanks!

marcschier commented 3 months ago

The publisher has a feature that tracks continuation tokens on an active connection so that the connection is not collected. The issue happens when a newly tracked token and the previous token are the same: the new token is added to the ref count list (where it already is), and the same token is later removed, which closes the session. This has also been fixed. Using --cl is still recommended for any higher-volume service calls, to keep the session lingering for fast reuse on other calls.
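The failure mode described above can be illustrated with a toy model. This is not the Publisher's code, and the actual fix may look different; `SetTracker`, `CountingTracker`, and `simulate_browse_next` are hypothetical names used only to show why tracking identical tokens by membership alone releases the session too early, while counting each registration does not.

```python
from collections import Counter


class SetTracker:
    """Toy model of the bug: a token is either tracked or not."""

    def __init__(self):
        self._tokens = set()

    def add(self, token: str) -> None:
        self._tokens.add(token)       # no effect if the token is already present

    def remove(self, token: str) -> None:
        self._tokens.discard(token)

    @property
    def session_in_use(self) -> bool:
        return bool(self._tokens)


class CountingTracker:
    """One possible remedy: count how many times each token was registered."""

    def __init__(self):
        self._counts = Counter()

    def add(self, token: str) -> None:
        self._counts[token] += 1

    def remove(self, token: str) -> None:
        if self._counts[token] > 0:
            self._counts[token] -= 1
        if self._counts[token] == 0:
            del self._counts[token]   # Counter ignores missing keys on delete

    @property
    def session_in_use(self) -> bool:
        return bool(self._counts)


def simulate_browse_next(tracker) -> bool:
    """BrowseFirst tracks "AA=="; BrowseNext tracks the new (identical) token
    and then releases the previous one. Returns whether the session is kept."""
    tracker.add("AA==")      # token returned by BrowseFirst
    tracker.add("AA==")      # token returned by the first BrowseNext
    tracker.remove("AA==")   # previous token is released
    return tracker.session_in_use


if __name__ == "__main__":
    print(simulate_browse_next(SetTracker()))
    print(simulate_browse_next(CountingTracker()))
```

In the set-based model the session is no longer considered in use after the first BrowseNext, so it can be collected before the second call, which is consistent with the BadContinuationPointInvalid error in the report. A server that issues a fresh token per page never triggers this path, which would explain why opc-plc did not reproduce it.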