eclipse / milo

Eclipse Milo™ - an open source implementation of OPC UA (IEC 62541).
http://www.eclipse.org/milo
Eclipse Public License 2.0
1.13k stars 423 forks source link

Error decoding ExtensionObject from specific Class #570

Closed WimVriend closed 4 years ago

WimVriend commented 4 years ago

Hello Kevin,

We are very busy testing the performance of the S7-1500 OPC-UA servers embedded in our PLC's and we ran into a strange problem: ExtensionObjects cannot be decoded (so an error is thrown). The weird thing is: decoding only fails on 1 particular class on 1 PC.

Let me try to explain the situation. I have my development PC on which I develop Java (1.8.0_172) using Eclipse. I have compiled the 0.3.3. version of Milo with Maven and included the required jars in the project. In Eclipse I have two classes in my Run Configuration which use the same PLC-driver class. In this class the subscriptions are created and the changeNotifications handled. It all works fine for both programs: all STRUCTS in the PLC are decoded.

My colleague has his own PC with Eclipse and he is working on the same trunk. On his PC, one of the classes which depend on the PLC-driver works fine. The other does not: all decoding fails. Of course we have tried to spot any differences between the PC's, but so far without success.

Maybe you can help us with some ideas? Is there a way we can force the DataTypeManager to renew the decoding code? Does Milo log any messages or do we have to recompile it for that? Any help is appreciated.

kevinherron commented 4 years ago

Are the two of you accessing the same PLC or different PLCs? Are there any warnings in the logs? What's the decoding error you're seeing?

Turning the logging level for DataTypeDictionaryReader to TRACE might help figure out what's going on.

WimVriend commented 4 years ago

Hello Kevin,

Sorry I forgot to mention: both PC's connect to the same PLC. So all DataTypes are present and can be decoded by 3 out of 4 classes...

The exact error-message is:

no codec registered for encodingId=NodeId{ns=3, id=TE_"DIGOUT_STATUS"}

The dataValue.toString returns:

DataValue{value=Variant{value=ExtensionObject{encoded=ByteString{bytes=[0, 0, 0, 0, 0, 0, 0, 0]}, encodingId=NodeId{ns=3, id=TE_"DIGOUT_STATUS"}}}, status=StatusCode{name=Good_Overload, value=0x002F0000, quality=good}, sourceTime=DateTime{utcTime=132181948856682524, javaDate=Thu Nov 14 08:48:05 GMT 2019}, serverTime=DateTime{utcTime=132181948856682524, javaDate=Thu Nov 14 08:48:05 GMT 2019}}

Which log-files do you mean? Is it possible to enable logging to the console using VM-arguments?

kevinherron commented 4 years ago

Milo logs via SLF4J, which delegates to whatever logging framework your application uses. There should be a way to configure your logging framework to log to the console as well as specify logging levels for specific loggers.

I need to see an example of the code that doesn't work as well as the error you get from that code.

I don't have enough information right now to troubleshoot, especially since what you describe sounds like an issue with your coworkers environment somehow and not a bug, since you say the code and PLC are identical between the two of you.

WimVriend commented 4 years ago

Hello Kevin,

I agree: it looks like the environment in which the code runs causes the problem. The snippet of code I use to decode the ExtensionObject comes straight from the example, so I expect to problems there:

  // Handle change events
  private void onSubscriptionValue(UaMonitoredItem item,
      DataValue dataValue)
  {
    m_changeCount++;
    Log.root.info(String.format("Data changed for %s, count = %d",
                                item.getReadValueId().getNodeId().toString(),
                                m_changeCount));

    if (dataValue.getValue().getValue() instanceof ExtensionObject)
    {
      ExtensionObject xo = (ExtensionObject) dataValue.getValue().getValue();

      // Decoding a struct with custom DataType requires a DataTypeManager
      // that has the codec registered with it. OpcUaClient automatically
      // reads any DataTypeDictionary nodes present in the server upon
      // connecting and dynamically generates codecs for custom structures.
      Object valueStruct = null;
      try
      {
        valueStruct = xo.decode(m_readOPCInterface.getSerializationContext());
      }
      catch (Throwable ex)
      {
        System.out.println(String.format("Error '%s' whilst trying to decode: %s ",
                                         ex.getMessage(),
                                         dataValue.toString()));
      }

I have looked into the logging a bit deeper and find that our project already also uses SLF4J. In fact, this warning (which I ignored until now) shows up in the console:

SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/D:/Project/MES_Development/OPCUA/ToolboxUA/trunk/common/lib/thirdParty/logback/mes-logback-classic-1.1.2.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/D:/Project/MES_Development/OPCUA/ToolboxUA/trunk/common/lib/plccom/milo/depends/logback-classic-1.1.7.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [ch.qos.logback.classic.util.ContextSelectorStaticBinder]

Since I have no experience with SLF4J, I cannot determine if this is the reason why the logging does not (seem to) work. It certainly does not cause the decoding problems, because this warning also shows on my own PC.

The SLF4J website mentions this about using libraries or frameworks:

Embedded components such as libraries or frameworks should not declare a dependency on any SLF4J binding but only depend on slf4j-api. When a library declares a compile-time dependency on a SLF4J binding, it imposes that binding on the end-user, thus negating SLF4J's purpose. When you come across an embedded component declaring a compile-time dependency on any SLF4J binding, please take the time to contact the authors of said component/library and kindly ask them to mend their ways.

Does Milo declare a compile-time dependency on SLF4J? In that case, I kindly ask you to "mend your way" :)

kevinherron commented 4 years ago

Yes, Milo declares only a compile time dependency on SLF4J. You’re seeing that error because you have 2 different versions of logback on your class path.

WimVriend commented 4 years ago

Does that mean I should remove the logback Milo depends on?

I would like to enable logging first, so I may tackle this and possibly future issues. Any ideas how I can get it to work? I would have thought the default logging would show in the console output, but it does not... I tried adding some VM arguments, but to no avail yet.

kevinherron commented 4 years ago

Milo doesn't depend on logback unless you've pulled in dependencies from the example projects as well.

Getting logging working is something you have to figure out in the application you're integrating Milo into. That's why libraries are written against slf4j instead of a specific implementation. Choosing and configuring the logging implementation is the application's burden.

WimVriend commented 4 years ago

Hello Kevin,

I have found some time to pick up this issue again. This time I attached the Milo sources to the Eclipse debugging environment, so I can step through the code and check out various variables etc.

I first checked what the 'codecsByEncodingId Map looks like, when it functions OK;

Working codecs

The names of the various Structs are clearly visible.

Then I did the same, for the failing process:

NonWorking codecs

It seems that the DataTypeManager did not manage to get the dataTypes from the OPCUA server correctly. Any ideas what may cause this or where I could look next?

kevinherron commented 4 years ago

The second screenshot doesn't have [0..99] expanded like the first... how can I tell it's any different?

The datatypes are read when the OpcUaClient connects using a SessionInitializer. The actual reading happens in DataTypeDictionaryReader, which is the original logger I suggested you change the level of.

WimVriend commented 4 years ago

Hello Kevin,

The [0..99] content is not relevant: you can see the content in the large pieces of text below it.

Anyway, I had not succeeded in changing the logger level until my previous post. Fortunately, I managed to get it working after that, using this code:

root = (Logger) LoggerFactory.getLogger(DataTypeDictionaryReader.class); root.setLevel(Level.TRACE);

The output shows the content of the . After that, the working process logs a message:

Continuation point was non-null; calling BrowseNext

The not-working process shows an error:

java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessClassInPackage.com.sun.xml.internal.bind.v2.model.nav") at java.security.AccessControlContext.checkPermission(Unknown Source) at java.security.AccessController.checkPermission(Unknown Source) at java.lang.SecurityManager.checkPermission(Unknown Source) at java.lang.SecurityManager.checkPackageAccess(Unknown Source) at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at ch.qos.logback.classic.spi.PackagingDataCalculator.loadClass(PackagingDataCalculator.java:207) at ch.qos.logback.classic.spi.PackagingDataCalculator.bestEffortLoadClass(PackagingDataCalculator.java:232) at ch.qos.logback.classic.spi.PackagingDataCalculator.computeBySTEP(PackagingDataCalculator.java:138) at ch.qos.logback.classic.spi.PackagingDataCalculator.populateUncommonFrames(PackagingDataCalculator.java:113) at ch.qos.logback.classic.spi.PackagingDataCalculator.populateFrames(PackagingDataCalculator.java:105) at ch.qos.logback.classic.spi.PackagingDataCalculator.calculate(PackagingDataCalculator.java:57) at ch.qos.logback.classic.spi.ThrowableProxy.calculatePackagingData(ThrowableProxy.java:147) at ch.qos.logback.classic.spi.LoggingEvent.(LoggingEvent.java:124) at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:440) at ch.qos.logback.classic.Logger.filterAndLog_2(Logger.java:433) at ch.qos.logback.classic.Logger.debug(Logger.java:511) at org.eclipse.milo.opcua.sdk.client.DataTypeDictionaryReader.lambda$readDataTypeDictionary$7(DataTypeDictionaryReader.java:136) at java.util.concurrent.CompletableFuture.uniExceptionally(Unknown Source) at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(Unknown Source) at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) at java.util.concurrent.CompletableFuture.postFire(Unknown Source) at java.util.concurrent.CompletableFuture$UniCompose.tryFire(Unknown Source) at java.util.concurrent.CompletableFuture$Completion.exec(Unknown Source) at java.util.concurrent.ForkJoinTask.doExec(Unknown Source) at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(Unknown Source) at java.util.concurrent.ForkJoinPool.runWorker(Unknown Source) at java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)

It seems like the logger causes the registration of the DataTypes to fail ?!

kevinherron commented 4 years ago

It looks like your logging library (logback) is triggering a class to be loaded during the course of logging that the installed SecurityManager is preventing access to for some reason.

java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessClassInPackage.com.sun.xml.internal.bind.v2.model.nav")
    at java.security.AccessControlContext.checkPermission(Unknown Source)
    at java.security.AccessController.checkPermission(Unknown Source)
    at java.lang.SecurityManager.checkPermission(Unknown Source)
    at java.lang.SecurityManager.checkPackageAccess(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at ch.qos.logback.classic.spi.PackagingDataCalculator.loadClass(PackagingDataCalculator.java:207)

More evidence that something is strange about your co-workers environment...

WimVriend commented 4 years ago

After adding the dependencies to "logback-classic-1.1.7.jar" and "logback-core-1.1.7.jar" to the Build Path (again), the errors that seem related to the Logger do not appear any more. I now do get warnings about multiple bindings, but that seems not to cause problems with the logging.

I do get errors related to the code that reads the DataTypes (which I missed before, because I focused on the first occurring problem...). The error reads:

Failed to create DataTypeDictionary: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessDeclaredMembers") java.util.concurrent.CompletionException: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessDeclaredMembers") at java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source) at java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source) at java.util.concurrent.CompletableFuture.uniCompose(Unknown Source) at java.util.concurrent.CompletableFuture$UniCompose.tryFire(Unknown Source) at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) at java.util.concurrent.CompletableFuture.postFire(Unknown Source) at java.util.concurrent.CompletableFuture$UniCompose.tryFire(Unknown Source) at java.util.concurrent.CompletableFuture$Completion.exec(Unknown Source) at java.util.concurrent.ForkJoinTask.doExec(Unknown Source) at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(Unknown Source) at java.util.concurrent.ForkJoinPool.runWorker(Unknown Source) at java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source) Caused by: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessDeclaredMembers") at java.security.AccessControlContext.checkPermission(Unknown Source) at java.security.AccessController.checkPermission(Unknown Source) at java.lang.SecurityManager.checkPermission(Unknown Source) at java.lang.Class.checkMemberAccess(Unknown Source) at java.lang.Class.getDeclaredField(Unknown Source) at com.sun.xml.internal.bind.v2.model.nav.ReflectionNavigator.getDeclaredField(Unknown Source) at com.sun.xml.internal.bind.v2.model.nav.ReflectionNavigator.getDeclaredField(Unknown Source) at com.sun.xml.internal.bind.v2.model.impl.RegistryInfoImpl.(Unknown Source) at com.sun.xml.internal.bind.v2.model.impl.ModelBuilder.addRegistry(Unknown Source) at com.sun.xml.internal.bind.v2.model.impl.ModelBuilder.getTypeInfo(Unknown Source) at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(Unknown Source) at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.(Unknown Source) at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.(Unknown Source) at com.sun.xml.internal.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(Unknown Source) at com.sun.xml.internal.bind.v2.ContextFactory.createContext(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at javax.xml.bind.ContextFinder.newInstance(Unknown Source) at javax.xml.bind.ContextFinder.newInstance(Unknown Source) at javax.xml.bind.ContextFinder.find(Unknown Source) at javax.xml.bind.JAXBContext.newInstance(Unknown Source) at javax.xml.bind.JAXBContext.newInstance(Unknown Source) at org.eclipse.milo.opcua.binaryschema.parser.BsdParser.parse(BsdParser.java:40) at org.eclipse.milo.opcua.sdk.client.DataTypeDictionaryReader.createDataTypeDictionary(DataTypeDictionaryReader.java:251) at org.eclipse.milo.opcua.sdk.client.DataTypeDictionaryReader.lambda$readDataTypeDictionary$6(DataTypeDictionaryReader.java:134) ... 10 common frames omitted

Maybe the codec cannot be generated or loaded somehow?

kevinherron commented 4 years ago

Maybe the codec cannot be generated or loaded somehow?

Yes, because of the SecurityManager that is blocking it for some reason. It seems there's a SecurityManager preventing access to some of the Java XML/JAXB classes.

This exception: java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessDeclaredMembers")

suggests that reflective access to fields is disallowed by the SecurityManager.

WimVriend commented 4 years ago

Could be in any way file-system or Windows-security related? I have tried to checkout and run the branch on a third PC, and it works fine there...

WimVriend commented 4 years ago

Hello Kevin,

I have re-installed most of the software on the 'bad machine' and still the issue persists. Even granting AllPermission in the policy file does not solve the problem. I am tempted to think it's Windows-related (because it works on the 2 other PC's). Unfortunately I have found nothing that may confirm this suspicion...

Luckily, I have learned some things along the way and I think there are a few things that could (should) be improved concerning the DataTypeManager/DataTypeDictionaryReader:

kevinherron commented 4 years ago

As we saw earlier, the failure to read the DataTypes (or rather: to create the codecs) was silent. Only after adjusting the logging level did I see errors. IMO it should always be visible;

Yes, I'll change some top-level log statement to a warning instead.

I understand from the code, that the DataTypeDictionary is only read once, after creating the OpcUaClient. However, it is very possible that a DataType is changed in the PLC by an engineer (for example an element added) and then downloaded. So IMO the Dictionary should be updated, after the OpcUa server has restarted (at this moment we detect the onSubscriptionTransferFailed event for this);

It reads it every time it connects, including reconnects. Downloading a new datatype or a modification to an existing datatype restarts the server, which causes the client to reconnect and re-read the dictionary. The DataTypeDictionary support was actually developed against a Siemens S7-1500 so I'm sure this works.

When an OnSubscriptionValue event arrives for an ExtensionObject, it is clear that a DataType for that ExtensionObject must be present in the Server. I would expect the DataTypeManager to at least try to create a codec, if the decoder was not found;

It is too expensive to read the DataTypeDictionary in the onSubscriptionValue call, and it should have already been read when the client connected.

--

I believe the correct approach to re-reading the dictionary is to watch its Version property for changes.

When I eventually implement OPC UA 1.04 I'll consider reading the types dynamically as this whole dictionary system has been deprecated and replaced with something much simpler that makes it possible to read the definition for a single struct type, either on the fly or ahead of time.

WimVriend commented 4 years ago

Hello Kevin,

Thanks again. I did not actually try to change a DataType myself, but I'm glad you say that it should already work. Of course I will give it a go :)

The fact that Milo is able to get the DataTypes dynamically is a very important 'plus' for us. Other stacks we have tried require a lot of manual configuration to statically generate codecs...

Of course it would be great if the system becomes simpler and possibly even more robust? We don't want to read a DataType every time a value changes. But it would be nice if the system would only create codecs for the ones we actually need (many DataTypes in the PLC are not read by our system at all...)

kevinherron commented 4 years ago

The fact that Milo is able to get the DataTypes dynamically is a very important 'plus' for us.

It was a requirement at work that drove the implementation in Milo :) Same use case initially, the S7-1500, which performs terribly unless you read whole structured values and not individual component values.

But it would be nice if the system would only create codecs for the ones we actually need (many DataTypes in the PLC are not read by our system at all...)

This will be possible in the future when OPC UA 1.04 is implemented (in theory, anyway), but right now the dictionary has to be read in its entirety and since it's done ahead of time there's no way to know which codecs are needed. The whole DataType / DataTypeDictionary system prior to 1.04 is very complex and fragile. They did a really nice job simplifying it in 1.04.