Beckhoff / TF6000_ADS_DOTNET_V5_Samples

Sample code for the Version 6.X series of the TwinCAT ADS .NET Packages
https://infosys.beckhoff.com/content/1033/tc3_ads.net/9407515403.html?id=6770980177009971601
BSD Zero Clause License
37 stars 15 forks source link

Concurrency issue when reading/writing #38

Closed odalet closed 11 months ago

odalet commented 1 year ago

I'm currently in the process of porting code that used the v4 ADS client to v6. The few breaking APIs are not much to handle and things are going quite well. However I encountered one strange behavior.

If concurrently reading and writing a variable through its symbol, I regularly encounter errors. This didn't happen with the v4 client (by using nearly the same code). Any clue wjhy this is happening?

Write Error: Could not write Symbol 'Global_Variables.gULInt (IG: 0x4040, IO: 0x5e0f0, Size: 8 bytes)' '! Error: DeviceSymbolNotFound

NB: this also sometimes happen on the read side.

You'll find below my test code. Note that, in order to avoid these errors, I have to lock around symbol operations (and this was not needed before). The problem is that this lock incurs a performance penalty I'd rather not have.

NB: I'm using Beckhoff.TwinCAT.Ads 6.0.216 in a net6.0 project on Windows x64

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using TwinCAT;
using TwinCAT.Ads.TypeSystem;
using TwinCAT.TypeSystem;

// Hacky compatibility layer... comment out if using Beckhoff.TwinCAT.Ads nuget v4
using TcAdsClient = TwinCAT.Ads.AdsClient;

namespace ConcurencyBugRepro
{
    internal sealed class Program : IDisposable
    {
        private readonly TcAdsClient client;
        private readonly object symbolsSync = new object();
        private ISymbolLoader symbolLoader;

        private static void Main()
        {
            using (var p = new Program())
                p.Run();
        }

        private Program()
        {
            client = new TcAdsClient();
        }

        public void Dispose()
        {
            _ = client.Disconnect();
            client.Close();
            client.Dispose();
        }

        private void Run()
        {
            client.Disconnect(); // Just to be on the safe side
            client.Connect("10.255.1.82.1.1", 851);
            symbolLoader = SymbolLoaderFactory.Create(client, SymbolLoaderSettings.Default);

            RunTest();
        }

        private void RunTest()
        {
            var ok = true;
            Console.WriteLine("Running Concurrent Tests...");

            const int count = 3;

            var timings = new long[count];
            var tasks = new Task<bool>[count];

            for (var i = 0; i < count; i++)
            {
                var id = i;
                tasks[i] = Task<bool>.Factory.StartNew(() => TestConcurrent(id, out timings[id]));
            }

            var oks = tasks.Select(t => t.Result).ToArray();
            ok = oks.All(b => b);

            Console.WriteLine(ok ? "All Concurrent Tests passed" : "Some Concurrent Tests failed", ok ? ConsoleColor.Green : ConsoleColor.Red);

            for (var id = 0; id < count; id++)
                Console.WriteLine($"#{id + 1} - Elapsed (ms): {timings[id]}", ConsoleColor.Yellow);
        }

        private bool TestConcurrent(int id, out long elapsed)
        {
            var ok = true;
            var stopWatch = new Stopwatch();
            stopWatch.Start();

            const int max = 80;
            for (var i = 0; i < max; i++)
            {
                var succeeded = WriteTest();
                Console.WriteLine($"#{id + 1} {i + 1:00}/{max:00} - Write: {(succeeded ? "OK" : "KO")}");

                succeeded = ReadTest();
                Console.WriteLine($"#{id + 1} {i + 1:00}/{max:00} - Read : {(succeeded ? "OK" : "KO")}");
            }

            stopWatch.Stop();
            elapsed = stopWatch.ElapsedMilliseconds;
            return ok;
        }

        private bool WriteTest()
        {
            try
            {
                Write("Global_Variables.gULInt", ulong.MaxValue);
                return true;
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Write Error: {ex.Message}");
                return false;
            }
        }

        private bool ReadTest()
        {
            try
            {
                _ = Read("Global_Variables.gULInt");
                return true;
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Read Error: {ex.Message}");
                return false;
            }
        }

        // Below, uncommenting the locks has the code work with v6, but they aren't needed
        // with v4 and I wish they weren't either in v6 as this incurs poor performance.

        private object Read(string variableName)
        {
            if (string.IsNullOrEmpty(variableName)) throw new ArgumentException(
                "Variable name cannot be null or empty", nameof(variableName));

            var symbol = (Symbol)symbolLoader.Symbols[variableName];
            //lock (symbolsSync)
            return symbol.ReadValue();
        }

        private void Write(string variableName, object value)
        {
            if (string.IsNullOrEmpty(variableName)) throw new ArgumentException(
                "Variable name cannot be null or empty", nameof(variableName));

            var symbol = (Symbol)symbolLoader.Symbols[variableName];

            if (value is byte[] bytes)
                //lock (symbolsSync)
                symbol.WriteRawValue(bytes);
            else //lock (symbolsSync)
                symbol.WriteValue(value);
        }
    }
}
beppemarazzi commented 1 year ago

Looking into source code with decompiler (using VisualStudio hit F12 on the AdsClient symbol in your sample, then search for HandleCache and hit F12 again then search for TryDeleteVariableHandle) i found this:

  public AdsErrorCode TryDeleteVariableHandle(uint variableHandle, int timeout)
  {
      AdsErrorCode result = AdsErrorCode.NoError;
      SymbolEntry value = null;
      if (_handleEntryDict.TryGetValue(variableHandle, out value))
      {
          if (Interlocked.Decrement(ref value.referenceCount) == 0)
          {
              byte[] bytes = BitConverter.GetBytes(value.serverHandle);  //*******************A
              result = _accessor.TryWrite(61446u, 0u, bytes.AsMemory());
              bool flag = _handleEntryDict.TryRemove(variableHandle, out value); //************************B
              uint value2 = uint.MaxValue;
              bool flag2 = _pathHandleDict.TryRemove(value.symbolPath, out value2);
          }
      }
      else
      {
          result = AdsErrorCode.DeviceSymbolNotFound;
      }

      return result;
  }

IMHO here there is a race condition: one thread may be interrupted between marked lines, meanwhile a second thread may gain access to some handle in the _handleEntryDict, when the first thread will resume at line ****B it removes the handle, then when the second thread tries to delete the handle it wil get AdsErrorCode.DeviceSymbolNotFound;

Probably if you gain access to a variable handle corresponding to your symbol before starting your threads all works fine....

I asked to beckhoff to open the source of this libraries.... they may gain a lot of help from the community, while us could rely on more solid platform!!!! #40

odalet commented 1 year ago

@beppemarazzi Although I didn't mention it in my description I also had a look with ILSpy and reached a conclusion similar to yours: something is going astray when the handle is being removed from this cache thing. I agree some threading protection should exist in this code, but it's all internal implementation. This is why I'm locking before read/writes: it's the lowest accessible level I can do something (without building my own version of Beckhoff nugets from decompiled code... which I'd rather avoid!). I can only hope now Beckhoff's team will handle this.

beppemarazzi commented 1 year ago

@odalet Did you try to add a simple client.CreateVariableHandle("Global_Variables.gULInt") just after symbols loading? this may fix your problem avoiding continuously adding and removing handle from the internal cache....

odalet commented 1 year ago

I didn't. I'll give it a try even if it feels weird: looks a bit like static variables preventing .NET GC to operate... By the way, I don't really understand why the handle cache behaves like it does: removing cached handles as soon as they are not used defeats the purpose of a cache IMO...

beppemarazzi commented 1 year ago

agree... it's not a cache: it's a poorly implemented reference counted items pool

beppemarazzi commented 1 year ago

In your real world app (if it's possible), you could gain access to some handles before you need to use them and release them when no more needed (or after some time)... Your handle will ensure that the internal reference counter remain at least 1 till you release it. Or (in other words) implement your own handle cache...

odalet commented 1 year ago

Right, that's probably where I'm headed...

odalet commented 1 year ago

@beppemarazzi TLDR; thanks for the tip: it works.

I've (roughly) implemented your suggestion: at startup of my client, I read all the symbols and store their associated handles in a HashSet. This takes care of incrementing the reference count. With this in place it seems I don't need to lock any more around read/writes. This is hacky and I wish the underlying code will be fixed but it's good enough for now. Anyway I have other issues I hope will be fixed before using all of this in production.

RalfHeitmann commented 1 year ago

Please try out the 6.0.235. I solved the issues that are appearing with the test code. As further information: The AdsClient doesn't guarantee reentrancy to this high level. Especially the loading of the Symbols should be done before reading / writing the values. The symbolLoader.Symbols access does this on its first call. Because you do this concurrently at the application start, you end up with concurrent unnecessary symbol and DataType Uploads - because this situation was not serialized in Versions < 6.0.235. Another issue in this test code is that you use consequently the synchronous methods in the test code. That doesn't make sense when you force to do it in many concurrent threads. A smoother way would be to use the async methods in applications which should be more performant and removes the necessity to create concurrent threads - and simplifies the code:

using System.Diagnostics;
using TwinCAT;
using TwinCAT.Ads.TypeSystem;
using TwinCAT.TypeSystem;
using TwinCAT.ValueAccess;

// Hacky compatibility layer... comment out if using Beckhoff.TwinCAT.Ads nuget v4
using TcAdsClient = TwinCAT.Ads.AdsClient;

namespace ConcurencyBugRepro
{
    internal sealed class Program : IDisposable
    {
        private readonly TcAdsClient client;
        private readonly object symbolsSync = new object();
        private ISymbolLoader symbolLoader;

        private async static Task Main()
        {
            using (var p = new Program())
                await p.Run();
        }

        private Program()
        {
            client = new TcAdsClient();
        }

        public void Dispose()
        {
            _ = client.Disconnect();
            client.Close();
            client.Dispose();
        }

        private async Task Run()
        {
            client.Disconnect(); // Just to be on the safe side
            client.Connect("1.1.1.1.1.1", 851);

           // LOAD SYMBOLIC INFORMATION BEFORE ACCESS 
            symbolLoader = SymbolLoaderFactory.Create(client, SymbolLoaderSettings.Default);

            await RunTest();
        }

        private async Task RunTest()
        {
            var ok = true;
            Console.WriteLine("Running Concurrent Tests...");

            const int count = 3;

            //var timings = new long[count];
            var tasks = new Task<(bool,long)>[count];

            for (var i = 0; i < count; i++)
            {
                var id = i;
                //tasks[i] = Task<bool>.Factory.StartNew(async () => await TestConcurrent(id, out timings[id]));
                tasks[i] = TestConcurrent(id);
            }

            //var oks = tasks.Select(t => t.Result).ToArray();
            // ok = oks.All(b => b);

            var results = await Task.WhenAll<(bool,long)>(tasks);
            ok = results.All(r => r.Item1);

            Console.WriteLine(ok ? "All Concurrent Tests passed" : "Some Concurrent Tests failed", ok ? ConsoleColor.Green : ConsoleColor.Red);

            for (var id = 0; id < count; id++)
                Console.WriteLine($"#{id + 1} - Elapsed (ms): {results[id].Item2}", ConsoleColor.Yellow);
        }

        private async Task<(bool,long)> TestConcurrent(int id)
        {
            long elapsed;
            var ok = true;
            var stopWatch = new Stopwatch();
            stopWatch.Start();

            const int max = 80;
            for (var i = 0; i < max; i++)
            {
                var succeeded = await WriteTestAsync();
                Console.WriteLine($"#{id + 1} {i + 1:00}/{max:00} - Write: {(succeeded ? "OK" : "KO")}");

                succeeded = await ReadTestAsync();
                Console.WriteLine($"#{id + 1} {i + 1:00}/{max:00} - Read : {(succeeded ? "OK" : "KO")}");
            }

            stopWatch.Stop();
            elapsed = stopWatch.ElapsedMilliseconds;
            return (ok,elapsed);
        }

        private async Task<bool> WriteTestAsync()
        {
            try
            {
                await WriteAsync("GVL.gULInt", ulong.MaxValue);
                return true;
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Write Error: {ex.Message}");
                return false;
            }
        }

        private async Task<bool> ReadTestAsync()
        {
            try
            {
                _ = await ReadAsync("GVL.gULInt");
                return true;
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Read Error: {ex.Message}");
                return false;
            }
        }

        // Below, uncommenting the locks has the code work with v6, but they aren't needed
        // with v4 and I wish they weren't either in v6 as this incurs poor performance.

        private async Task<ResultReadValueAccess> ReadAsync(string variableName)
        {
            if (string.IsNullOrEmpty(variableName)) throw new ArgumentException(
                "Variable name cannot be null or empty", nameof(variableName));

            ResultSymbols resultSymbols = await symbolLoader.GetSymbolsAsync(CancellationToken.None);
            Symbol symbol = (Symbol)resultSymbols.Symbols[variableName];
            //lock (symbolsSync)
            return await symbol.ReadValueAsync(CancellationToken.None);
        }

        private async Task WriteAsync(string variableName, object value)
        {
            if (string.IsNullOrEmpty(variableName)) throw new ArgumentException(
                "Variable name cannot be null or empty", nameof(variableName));

            //var symbol = (Symbol)symbolLoader.Symbols[variableName];
            ResultSymbols resultSymbols = await symbolLoader.GetSymbolsAsync(CancellationToken.None);
            Symbol? symbol = (Symbol?)resultSymbols.Symbols[variableName];

            if (value is byte[] bytes)
                //lock (symbolsSync)
                await symbol.WriteRawValueAsync(bytes, CancellationToken.None);

            else //lock (symbolsSync)
                await symbol.WriteValueAsync(value,CancellationToken.None);
        }
    }
}
odalet commented 1 year ago

Thanks for looking into this. I'll try the new version as soon as I can!

RalfHeitmann commented 11 months ago

Expecting that this is fixed. Closing this abandoned issue ...

odalet commented 11 months ago

Thanks for your work and sorry for the late reply. Didn't have the chance to go over this (and not sure when/if I will). I'll keep track of this issue for reference.