Excel-DNA / ExcelDna

Excel-DNA - Free and easy .NET for Excel. This repository contains the core Excel-DNA library.
https://excel-dna.net
zlib License
1.26k stars 270 forks source link

Error value on multiple calls to a ThreadSafe async function #667

Closed Sergey-Vlasov closed 5 months ago

Sergey-Vlasov commented 5 months ago

value

        [ExcelFunction(IsThreadSafe = true)]
        public static object SayHelloAsyncThreadSafe(string name)
        {
            return SayHelloAsync(nameof(SayHelloAsyncThreadSafe), name);
        }

        private static object SayHelloAsync(string callerFunctionName, string name)
        {
            int threadId = Thread.CurrentThread.ManagedThreadId;
            return ExcelAsyncUtil.Run(callerFunctionName, new object[] { name }, () => SayHelloWithDelay(name, threadId));
        }

        private static string SayHelloWithDelay(string name, int threadId)
        {
            Thread.Sleep(2000);
            return $"Hello {name} (thread {threadId})";
        }

To reproduce:

  1. Paste =SayHelloAsyncThreadSafe("TS") in the first cell.

  2. Expand the first cell to multiple down cells.

It occurs only once in maybe 20 attempts. So, you need to undo and repeat step 2 a lot.

govert commented 5 months ago

I can reproduce this (though I've never seen more than one entry fail). I added some more details and logging to the output:

        [ExcelFunction(IsThreadSafe = true)]
        public static object SayHelloAsyncThreadSafe(string name)
        {
            int threadId = Thread.CurrentThread.ManagedThreadId;
            try
            {
                return $"{threadId} | {SayHelloAsync(nameof(SayHelloAsyncThreadSafe), name)}";
            }
            catch (XlCallException xlex)
            {
                Debug.Write(xlex);
                return $"{threadId} | {xlex.Message} / {xlex.xlReturn}";
            }
            catch (System.Exception ex)
            {
                Debug.Write(ex);
                return $"{threadId} | {ex.Message}";
            }
        }

This gives me a stacktrace when it fails

ExcelDna.Integration.XlCallException: Exception of type 'ExcelDna.Integration.XlCallException' was thrown.
   at ExcelDna.Integration.XlCall.Excel(Int32 xlFunction, Object[] parameters)
   at ExcelDna.Integration.Rtd.AsyncObservableState.TryGetValue(Object& value)
   at ExcelDna.Integration.Rtd.AsyncObservableImpl.ProcessObservable(String functionName, Object parameters, ExcelObservableOptions options, ExcelObservableSource getObservable)
   at TestAsyncMT.Functions.SayHelloAsync(String callerFunctionName, String name)
   at TestAsyncMT.Functions.SayHelloAsyncThreadSafe(String name)ExcelDna.Integration 

The only direct call we have to XlCall.Excel inside AsyncObservableState.TryGetValue is in the beginning:

        public bool TryGetValue(out object value)
        {
            // We need to be careful when this is called from an array formula.
            // In the 'completed' case we actually still have to call xlfRtd, then only skip if for the next (single-cell caller) call.
            // That gives us a proper Disconnect...
            ExcelReference caller = XlCall.Excel(XlCall.xlfCaller) as ExcelReference;
            bool isCallerArray = caller != null &&
                                 (caller.RowFirst != caller.RowLast ||
                                  caller.ColumnFirst != caller.ColumnLast);

            bool refreshRTDCall;
            bool recordRtdComplete;
            lock (_lock)
            {
                refreshRTDCall = (_currentObserver == null || isCallerArray || !IsCompleted());
                recordRtdComplete = (_currentObserver != null && IsCompleted());
            }

            if (refreshRTDCall)
            {
            // etc.

While we would like to be able to call xlfCaller in this type of async call (sometimes you want the async to run per-cell) the case here is a bit unique. We are guarding against RTD calls from CSE array formulas in pre-Dynamic Arrays Excel.

One possible adaptation would be to only make the xlfCaller call if we have to - after checking the other possibilities

    refreshRTDCall = (_currentObserver == null || !IsCompleted() || GetIsCallerArray() );

This doesn't get to the bottom of why xlfCaller fails here, and it might still do so in other cases. We might want to avoid the xlfCaller call here completely in some cases (e.g. under Dynamic Arrays Excel) or we can retry it (hoping that the error is due to an internal race between threads in Excel - we are in this example calling this C API from many threads at once.)

If the problem is xlfCaller, we might try to reproduce in a simple non-RTD thread-safe function that calls xlfCaller. That can take us to a reproduction in a C add-in if this is an Excel bug (as far as I know (we should check) xlfCaller is one of the documented thread-safe functions).

govert commented 5 months ago

I can't get an error with a simple function that just calls xlfCaller

        [ExcelFunction(IsThreadSafe = true)]
        public static object GetCallerInfo()
        {
            int threadId = Thread.CurrentThread.ManagedThreadId;
            var caller = XlCall.Excel(XlCall.xlfCaller) as ExcelReference;
            return $"{threadId} | {caller}";
        }

Maybe that's not the problem, or the interaction with the RTD calls makes things go wrong.

It might be worth trying a version with some extra error handling in the AsyncObservableState.TryGetValue implementation, to confirm what the failing call is.

Sergey-Vlasov commented 5 months ago

I can much more easily reproduce the problem if I change in TryGetValue

            ExcelReference caller = XlCall.Excel(XlCall.xlfCaller) as ExcelReference;

to

            ExcelReference caller = null;
            for (int i = 0; i < 1000; ++i)
                    caller = XlCall.Excel(XlCall.xlfCaller) as ExcelReference;
Sergey-Vlasov commented 5 months ago

The reason why XlCall.Excel(XlCall.xlfCaller) fails is because _suspended is sometimes true:

        public static object Excel(int xlFunction, [NotNull] params object[] parameters)
        {
            object result;
            XlReturn xlReturn = TryExcel(xlFunction, out result, parameters);

            if (xlReturn == XlReturn.XlReturnSuccess)
            {
                return result;
            }

            throw new XlCallException(xlReturn);
        }

        public static XlReturn TryExcel(int xlFunction, [CanBeNull] out object result, [NotNull] params object[] parameters)
        {
            if (_suspended)
            {
                result = null;
                return XlReturn.XlReturnFailed;
            }
            return ExcelIntegration.TryExcelImpl(xlFunction, out result, parameters);
        }
Sergey-Vlasov commented 5 months ago
        // Support for suspending calls to the C API
        // Used in the RTD Server wrapper - otherwise C API calls from the RTD methods can crash Excel.
        static bool _suspended = false;

If I comment out in TryExcel

             if (_suspended)
            {
                result = null;
                return XlReturn.XlReturnFailed;
            }

The test example works. So, not sure is this check important.

(BTW, it should be static volatile bool _suspended, but it is not the problem here.)

govert commented 5 months ago

The XlCall.Suspend() mechanism is important, though I can't recall all the details. The RTD Com activation or any of the COM callbacks are not safe times to call the C API, and Excel would crash sometimes. So I added the Suspend to make sure that user code which would run at these times cannot mistakenly call back into Excel, instead returning an error directly and not risking the Excel crash. This was done without having the multi-threaded UDF wrappers in mind.

We should not remove the Suspend plan, but might consider two approaches:

I'm very happy for you to experiment a bit more with this.

Sergey-Vlasov commented 5 months ago

If we can safely call any thread-safe C API function while the COM action is going happening on another thread, [ThreadStatic] _suspended is the perfect solution and no need to change anything else as it will not be accessed from different threads.

I've tried it and it resolves the original problem.