drewnoakes / metadata-extractor-dotnet

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
Other
950 stars 170 forks source link

Add async support #256

Open lejsekt opened 4 years ago

lejsekt commented 4 years ago

What do you think about adding async method overloads? There is a lot of operations on streams that could benefit from that.

drewnoakes commented 4 years ago

I've pondered this. It would require some consideration, and potentially heavy use of ValueTask<> to avoid excessive GC overhead.

@kwhopper have you given this any thought as part of your IO work in #250?

kwhopper commented 4 years ago

Not specifically, but it's certainly easier to deal with in a RAS structure; fewer classes and situations to deal with. I'd thought about adding interfaces to the new classes and only using those in the core library. Then the implementation could be swapped out or dynamically loaded, which could potentially play into this.

I have done a few things with async in web applications but am no expert. A caller would need to know very early whether to use async-specific methods. So for the library in general, wouldn't it need to be async (or not) all the way down? It seems a bit difficult to be one way or the other in compiled code.

oliver021 commented 3 years ago

@kwhopper I would like to be able to understand more what it explains about "a library must be complete asynchronous or not", because I have a lot of interest in contributing in this matter, this library was perfect in a project that I maintain, so the only thing that made it a bit difficult for me is having to work a synchronous method in an environment where I do a lot of I/O operations, a synchronous method is a super plus.

kwhopper commented 3 years ago

@oliver021 All I mean is a truly "asynchronous" pipeline has no parts that are synchronous in it...

https://docs.microsoft.com/en-us/archive/msdn-magazine/2013/march/async-await-best-practices-in-asynchronous-programming#async-all-the-way

I'm totally fine with exploring an asynchronous version of MetadataExtractor. The trick is making it async from start to finish all in one swoop, at least for the IO-intensive or blocking operations that matter. There can't be blocking, synchronous calls sprinkled in here or there or we're asking for random trouble. As the article indicates, there are real possibilities for deadlocks when the two are mixed inappropriately.

oliver021 commented 3 years ago

@kwhopper I completely agree with you, so doing an asynchronous method with Task.Run implies a false implementation of asynchronicity.

The fundamental idea is to take turns using the threadpool I / O operations and non-I / O operations that is, those that process or work with the result that the I / O operation brought, to explain it better, let's imagine that we are reading a file sequentially and extracting bytes from the reading which implies that when you already have the array of bytes (buffer), whatever we do with that delay array at the next reading (I / O operation), with an asynchronous flow we keep busy with the results of reading the file and another thread takes care to deliver the next result of reading the file, but this does not mean that our operations are faster because they are asynchronous although what is streamlined by having the operations reading on another thread is slowed down by synchronization operations between threads and the abstractions implied by the System.Threading.Tasks.Task, any benchmark in .Net reveals that a single asynchronous operation with respect to its synchronous version it is slower, asynchronicity is not speed, it is resistance and scalability. Now using Task.Run orThreadPool.QueueUserWorkItem we can fall into the problem that you are pointing out because the thread that calls the method asynchronous implemented with the two previous forms you will perceive that the implementation is asynchronous, for the thread that the threadpool designates to execute synchronous implementation runs a serious risk of deadlocking or threadpool will be forced to generate many threads for dozens of calls with this type of implementation or queuing too many calls causing disasters in performance without achieving scalability, in short the asynchronous code is business, we invest (that is, we lose speed) to achieve scalability