MarkPflug / Sylvan.Data.Excel

The fastest .NET library for reading Excel data files.
MIT License
241 stars 30 forks source link

Ole2Package+Ole2Stream abruptly quits when reading an XLS from a FileStream in a Blazor Server app #81

Closed CaiusJard closed 1 year ago

CaiusJard commented 2 years ago

I've a problem reading a particular, reasonably sized (700kb, 2600 rows) file in a Blazor server app. I've done various tests of changing the problem file and project type:

It's like an exception is thrown that kills the background reading thread, and the exception is silently swallowed. When debugging, I get to line 136 of the Ole2Stream class, there's a ReadAsync, and stepping over it manifests the problem; the yellow bar disappears and never returns no matter how long it's left..

Here's the code, and debug prints (some added by me):

image

The red box delineates the first time round that position increments from 0; this operation finishes OK. Another read occurs with a bigger buffer, and position starts from 0 again; the code reaches the ReadAsync and starts doing it (I assume) but the call never returns and no exception is thrown. The app remains running.

To repro the problem create a new Blazor Server project, and Install/add/reference Sylvan Excel by package or code. Open the Counter.razor page and add these two lines to the IncrementCount() method.

        using var fs = new FileStream(@"C:\temp\problem.xls", FileMode.Open, FileAccess.Read, FileShare.Read, 128 * 1024);
        using var edr = ExcelDataReader.Create(fs, ExcelWorkbookType.Excel);

I'll email the problem.xls file over..

CaiusJard commented 2 years ago

Incidentally, if I swap to using a syncronous Read() instead, I don't get a problem

CaiusJard commented 2 years ago

As another aside; it seems if I bufffer the entire file into a memory stream before I read (and get sylvan to read from the MS), I don't get a problem; there may be nothing Sylvan can do to resolve this, if it's a manifestation of the runtime disposing of the filestream before sylvan's done reading it (or maybe the fault is mine in terms of how I've decided to read the file being incompatible with BSS)

MarkPflug commented 2 years ago

I know exactly what's happening. This is because I don't fully/correctly support async in this library yet. At a minimum, I need to add .ConfigureAwait(false) to every async callsite. This is specifically what is causing the problem you are seeing, which is a deadlock. Apparently, blazor server runs in a single threaded environment under a synchronization context. Without the ConfigureAwait(false) async operations will trivially deadlock in such an environment.

Beyond that, here is additional work that I need to investigate with respect to handling .xlsx files asynchronously. This is one of the things that I want to complete before I move to a 1.0 release, but I don't have a roadmap or a timeline for getting that done.

In the meantime, it sounds like you have a workaround: read the whole file into memory first, which can be done asynchronously using the file IO apis.

MarkPflug commented 1 year ago

Well, it took me more than a year to get around to it, but I've finally added proper async support to the library. You'd need to change your code to use the ExcelDataReader/Writer.CreateAsync functions instead of Create, and also use async disposal, but it should then function correctly in an async-only context. Internally, it will buffer the entire file IO upon creation/disposal, so the only benefit over handling the buffering yourself is convenience. The buffering mechanism I use attempts to avoid LOH allocation and uses pooling to try to be GC-friendly.

This was just published to nuget with the 0.4.16 release.