dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Proposal for minimalistic imaging platform #16307

Closed dajuric closed 1 year ago

dajuric commented 8 years ago

Proposal for minimalistic imaging platform

.NET does not provide a truly portable image format or portable image IO operations. For example, neither System.Drawing.Bitmap nor WPF's BitmapSource provides a generic interface that imposes compile-time constraints. EmguCV's generic image provides such an interface, but it is heavily coupled with OpenCV.

None of the mentioned image types provides the unified interoperability needed when mixing image-processing algorithms from different libraries.

Goals:

To leverage the existing structure and components of the DotImaging framework, but to replace the backing OpenCV part needed for image IO. The first step would be to create a portable image reading/writing module with the simplest possible API (e.g. ImageIO.LoadColor(), LoadGray(), ...).
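
For illustration, a rough sketch of what such a minimal IO surface could look like is below; the type names and signatures are hypothetical (the pixel structs are placeholders to keep the sketch self-contained), not an existing API.

using System;

// Placeholder pixel types, defined only to make the sketch self-contained.
public struct Bgr<T> where T : struct { public T B, G, R; }
public struct Gray<T> where T : struct { public T Intensity; }

// Hypothetical minimal image IO surface, as suggested in the proposal above.
public static class ImageIO
{
    // Load an image as a 2D array of BGR pixels (8 bits per channel).
    public static Bgr<byte>[,] LoadColor(string path)
    {
        throw new NotImplementedException("decode via a managed codec backend");
    }

    // Load an image as a 2D array of grayscale pixels.
    public static Gray<byte>[,] LoadGray(string path)
    {
        throw new NotImplementedException();
    }

    // Encode and write an image; the format would be inferred from the extension.
    public static void Save<TColor>(TColor[,] image, string path) where TColor : struct
    {
        throw new NotImplementedException();
    }
}
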
Options:

Proposal for Cross-plat Server-side Image Manipulation Library - a proposal of much broader scope. This proposal resolves the core issue and leaves out the image processing part, which is handled by other libraries: Accord.NET, Accord.NET Extensions, EmguCV.

.NET core should have primitive drawing types - already done in DotImaging.Primitives2D

shmuelie commented 8 years ago

Just a note about storing the image data: don't do it in managed memory. I've been doing lots of image processing and writing my own "Bitmap" types, and I learned this lesson the hard way.

I'll have to see whether I can show any of my work or whether it's stuck under NDA.

dajuric commented 8 years ago

Can you elaborate on why not to store the image data in managed memory?

I have also been doing a lot of image processing and have created two frameworks (Accord.NET Extensions and DotImaging). After many iterations I have concluded that it is easiest to store the data in managed memory and, if you need speed, to temporarily lock the array using an "unmanaged image" (DotImaging's generic image).

The reasons why I am using a managed container are the following:

- it leverages the .NET native array structure
- easier to debug
- does not introduce any new classes
- can be temporarily locked if you need speed
- portable
- can be leveraged by a managed OpenCL implementation

I am just giving some arguments for why I chose this approach (I just want to have a constructive conversation). I am curious why you chose an unmanaged container (speed and IO aside)?

Thanks for the reply and for the effort to show your work.

shmuelie commented 8 years ago

Sorry about the bluntness and for stating what is really an "in my case" as an "always". Replying to GitHub issues at 3 AM is never a good idea.

First, a quick overview of my architecture:

- UnmanagedMemory: This class is in charge of managing unmanaged memory. It handles allocation, deallocation, tracking usage, tracking total usage, and preventing leaks, and provides friendly exceptions.
- BitmapPlus: Pretty much a raw image in memory. It uses UnmanagedMemory internally to store the data and provides methods for manipulating the data.
- WriteableBitmapPlus: A custom implementation of WriteableBitmap, using UnmanagedMemory internally.
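
For readers unfamiliar with the pattern, a minimal sketch of what a class like UnmanagedMemory might look like follows. It is an illustration based only on the description above, assuming Marshal.AllocHGlobal/FreeHGlobal as the allocator and GC.AddMemoryPressure to keep the GC informed; it is not the actual implementation.

using System;
using System.Runtime.InteropServices;
using System.Threading;

// Illustrative sketch only: owns a block of unmanaged memory, tracks total
// usage across instances, and frees the block deterministically via Dispose.
public sealed class UnmanagedMemorySketch : IDisposable
{
    private static long _totalBytes;   // total live unmanaged bytes across all instances
    private IntPtr _pointer;
    private readonly long _size;

    public UnmanagedMemorySketch(long sizeInBytes)
    {
        _size = sizeInBytes;
        _pointer = Marshal.AllocHGlobal(new IntPtr(sizeInBytes));
        Interlocked.Add(ref _totalBytes, sizeInBytes);
        GC.AddMemoryPressure(sizeInBytes);   // tell the GC about the hidden allocation
    }

    public IntPtr Pointer => _pointer != IntPtr.Zero
        ? _pointer
        : throw new ObjectDisposedException(nameof(UnmanagedMemorySketch));

    public static long TotalBytes => Interlocked.Read(ref _totalBytes);

    public void Dispose()
    {
        if (_pointer == IntPtr.Zero) return;
        Marshal.FreeHGlobal(_pointer);
        _pointer = IntPtr.Zero;
        Interlocked.Add(ref _totalBytes, -_size);
        GC.RemoveMemoryPressure(_size);
        GC.SuppressFinalize(this);
    }

    // Finalizer as a last-resort guard against leaks when Dispose is forgotten.
    ~UnmanagedMemorySketch() => Dispose();
}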

One big reason I use unmanaged memory is that, at least in my use cases, images work against the GC. Images are large, long-lived objects. At the same time, when I don't need them any more I can't have their memory just lying around, otherwise I hit OOM very quickly. If they are put on the standard heap they'll probably make it to Gen 2, which means they won't go away for a while, and when they do, the user feels it. If they are put on the LOH they will almost never be reclaimed and/or will cause massive fragmentation that speeds up the time to OOM.

Another reason, though I admit this may be very particular to my use case, is that I repeatedly find myself passing the images to unmanaged land. The constant need to fix the backing array's location also (once again) messes with the GC and causes issues.

Those are the top issues I've run into that have caused me to switch to unmanaged memory.

As to the points you bring up:

it leverages the .NET native array structure

I admit the native "unsafe" code is more complex to understand but it's not impossible.

easier to debug

I admit seeing the actual values of the memory is harder (though VS has the lovely memory windows). However, in general when debugging I'm more concerned with the algorithm than the actual bytes in the image.

does not introduce any new classes

Not sure this is really an advantage, in my opinion.

can be temporarily locked if you need speed

Flip side: unmanaged can be copied into managed memory if needed (which WriteableBitmapPlus in fact does).

portable

You got me there.

can be leveraged by a managed OpenCL implementation

And unmanaged can be leveraged by SharpDX (a managed DirectX implementation)

Hope this explains my thinking!

dajuric commented 8 years ago

Hi Samuel,

thanks for the reply. No need to apologize, I just wanted to show you the reasons why I chose the managed approach :)

garbage collection

That bothered me too, and that is why my first implementation was an unmanaged image, Image<TColor>. I do not know much about GC design, but I can tell you what I have learned from experience.

The algorithms I write or use are related to video processing where each frame is consumed, e.g. Accord.NET Extensions. Yes, the image objects are big and they do go into the 2nd generation, but the newer GC design (from .NET 4.5.1) handles those objects more efficiently and reclaims the consumed memory. When processing video, only one image allocation is needed (for the frame), so you do not experience any lag or performance issues (the memory allocation remains constant).

Moreover, you can call GC.Collect() to reclaim the memory if you need to (I do not need that).

Pluses and minuses regarding the unmanaged representation (for me):

var image = new Bgr<byte>[480, 640];
using (var uImg = image.Lock())
{
    unmanagedFunc(uImg.ImageData, uImg.Stride, ...);
}

So, to conclude: I see the unmanaged / unsafe C# features more as extensions than as a fully integrated feature, since the interoperability is not two-way: you cannot cast unmanaged data into managed objects, and pointers and generics do not get along.

In order to leverage the best of both sides - the simplicity of the managed world and the speed of pointer processing - I have created a framework where the managed native array is the central element (similar to MATLAB) and, at the same time, fast image processing and interop with other libraries are enabled through a slim unmanaged generic image class.

I am not saying that this approach is better, but I find it more user-friendly and less error-prone. I find it interesting, useful, and fun to have such a conversation, so please do not hesitate to comment; I am willing to learn.

shmuelie commented 8 years ago

I do agree that features like unmanaged memory and unsafe code are less user friendly than "normal" C#. However, for base framework code they're not uncommon. The BCL is full of interop and that's fine because the average user is not messing with the interop code. They're using the nice managed APIs that are above it.

The .NET 4.5.1 improvements sound great! Sadly at work we're still on 4.0, though we are looking into moving to 4.5.

allocation and deallocation of many unmanaged images is slower than the managed counterpart (tested using AForge.NET)

I wouldn't mind seeing those tests since in my testing unmanaged allocation and deallocation was much faster.

what problems does it cause if an object is temporarily locked?

When you lock an object in place (whether using the fixed keyword or using GCHandle) the GC has to work around the object when compacting/defragmenting the heap. This both makes it harder for the GC and can make the job of compacting/defragmenting less efficient.
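
For concreteness, the two pinning mechanisms discussed here look roughly like this; while either pin is held, a compacting GC has to work around the array:

using System;
using System.Runtime.InteropServices;

class PinningExamples
{
    static unsafe void Main()
    {
        byte[] pixels = new byte[640 * 480 * 3];

        // 1. 'fixed' pins the array only for the duration of the block.
        fixed (byte* p = pixels)
        {
            p[0] = 255;   // safe to hand 'p' to native code within this block
        }

        // 2. GCHandle pins the array until Free() is called, so the pin can
        //    outlive a single method and span calls into unmanaged code.
        GCHandle handle = GCHandle.Alloc(pixels, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            Console.WriteLine(address);   // pass 'address' to unmanaged code here
        }
        finally
        {
            handle.Free();   // release the pin so the GC can move the array again
        }
    }
}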

you need to take responsibility for deallocating such objects, which seems unnatural in C#

It is on the weirder side to do so in .NET but IDisposable usage is common and calling GC.Collect() is really the same thing.

the backend behind it is more complex (your framework has an unmanaged image class, a manager, and a wrapper); and more complex designs (I am not saying yours) are potentially more error prone. I had a similar architecture (before), but I found the design complex and buggy.

It is true that the greater the complexity the higher the bug count. Most simple designs work though because someone else is doing the complex work.

I'm loving this conversation :)

shmuelie commented 8 years ago

@terrajobst think you could rope in a GC person? For all the time I've spent in .NET I'm really not a GC specialist.

dajuric commented 8 years ago

Hi Samuel,

thanks for the response.

Regarding the tests:

If I remember correctly, I allocated an UnmanagedImage from AForge.NET and a native .NET array several hundred times each. After each call I disposed the unmanaged image (I do not recall what I did with GC.Collect()).

One application example which benefited from the managed heap is Fast template matching - fast object detection. It creates several images from each frame, and each time those images are deallocated. One of the first versions was written with a managed wrapper that called (malloc?) and then freed the memory. When I rewrote the code using managed arrays and pinning, the performance actually increased (I was afraid the performance would drop).

Temporary object locking

Yes, I am aware of this; that is why I stressed temporarily (you only lock the object while some operation is being performed). I have not measured the impact in such scenarios, but from my experience I would say it is not much.

Simple design

I meant to say clean and nice (as in MATLAB), where operations on matrices are fast but the user is not exposed to pointers, etc. It is interesting to see that if you put a little bit more stress on design vs. performance, the whole perspective changes; other things get optimized :)

I am glad we could exchange opinions. Looking forward to more.

shmuelie commented 8 years ago

Allocations

To help, my tests were:

Unmanaged:

using (UnmanagedMemory mem = new UnmanagedMemory(short.MaxValue))
{
}

Managed:

byte[] b = new byte[short.MaxValue];
b[0] = 1;
b = null;
GC.Collect(0);

Each of those tests was timed using the Stopwatch class, run 1000 times, and then the average was taken. The results were:

Unmanaged: 00:00:00.0000017
Managed: 00:00:00.0000149

On one hand unmanaged is faster, on the other we're talking about such tiny amounts of time...
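
A minimal sketch of the timing methodology described above (Stopwatch, 1000 iterations, averaged) might look like this; only the managed test is shown so the snippet stays self-contained, and the harness is an assumption rather than the exact test code:

using System;
using System.Diagnostics;

class AllocationTimer
{
    // Runs the given action 'iterations' times and returns the average elapsed time.
    static TimeSpan Average(Action action, int iterations)
    {
        var stopwatch = new Stopwatch();
        long totalTicks = 0;
        for (int i = 0; i < iterations; i++)
        {
            stopwatch.Restart();
            action();
            stopwatch.Stop();
            totalTicks += stopwatch.Elapsed.Ticks;
        }
        return TimeSpan.FromTicks(totalTicks / iterations);
    }

    static void Main()
    {
        TimeSpan managed = Average(() =>
        {
            byte[] b = new byte[short.MaxValue];
            b[0] = 1;
            b = null;
            GC.Collect(0);   // collect gen 0 so buffers do not pile up between iterations
        }, 1000);

        Console.WriteLine("Managed: {0}", managed);
    }
}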

Code Design

I admit that what is "clean" to me is not clean to others. For example, I find Regex and IL perfectly readable. However, I know most don't, and so I'm slowly replacing code that uses Reflection.Emit with Linq.Expressions.

dajuric commented 8 years ago

Thanks for the reply and tests.

Is UnmanagedMemory your own class? Have you tried bigger array sizes (this size corresponds to a 256x256 image)?

shmuelie commented 8 years ago

UnmanagedMemory is the class I mentioned above. It's really just a wrapper around LocalAlloc and Marshal.FreeHGlobal.

Ran some more tests:

Size: 1073741824
Unmanaged: 00:00:00.0005554
Managed:      00:00:00.0001652

I then modified the test to do two allocations and then deallocate. I had to do this since .NET was giving me issues with large arrays.

Size (per allocation): int.MaxValue / 2
Unmanaged: 00:00:00.0009885
Managed:      00:00:00.0012521

So, very interestingly, it seems that whether unmanaged or managed is faster depends heavily on the size you're allocating...

kinchungwong commented 8 years ago

Hi all,

I would like to share some notes I have collected on the issue of imaging platform on .NET. Most of the points in my notes are trivial or common knowledge. Feel free to skim my notes and just look for points which you find interesting.

  1. https://gist.github.com/kinchungwong/155c9046ba3f1fecf49a
  2. http://programmers.stackexchange.com/a/210845

The list of requirements for an imaging platform is very long. Many of the requirements conflict, which means they cannot all be satisfied. Thus, by choosing to satisfy some requirements and discarding others, we effectively decide which subset of users (the audience) the imaging platform will serve.

I think it is important to decide if certain goals should be included or excluded early-on.

That said, from my notes you will find that there are two interoperability techniques that we can always rely on as a fallback:

Performance is important, but so are many other considerations. Also, performance depends on many factors such as the OS and CPU. If we make performance choices too early, we may find that the choices we made are sub-optimal whenever the execution environment or use cases change.

phillip-haydon commented 8 years ago

I don't believe imaging belongs in corefx. Imaging is hugely complex and no minimalist library in core will ever satisfy the requirements of imaging.

my 2c

dajuric commented 8 years ago

It depends on what you define as minimalistic. The original proposal seems pretty viable. The thing is, there is no portable, common image format, nor IO operations that support it. I would be happy if that were resolved (see the first post).

JimBobSquarePants commented 8 years ago

I've been working very hard to provide both with the new ImageSharp

https://github.com/JimBobSquarePants/ImageSharp

It's all managed code running on CoreFX. There's still a lot to do but I've established a pretty useful set of classes so far.

Update: switched the URLs to the new repo.

dajuric commented 8 years ago

@JimBobSquarePants Does it have a generic image class?

JimBobSquarePants commented 8 years ago

There is an ongoing discussion that suggests using a generic image class as a means to keep memory usage low but I struggled to get the implementation correct.

https://github.com/JimBobSquarePants/ImageProcessor/issues/287

The API is still very much in flux, so if you have any ideas that would allow us to realise a generic image class, I have no qualms at all about using that as the way forward.

dajuric commented 8 years ago

OK. One question: what do you find inappropriate in DotImaging / what should be modified? (Because it seems that we are doing the same thing.)

JimBobSquarePants commented 8 years ago

I guess the biggest difference is that you require a lot of interop code in order to work cross-platform whereas I chose to write the encoders/decoders plus basic algorithms as part of the library. This, in my opinion, makes it easier to maintain the library and reduces build/deployment complexity.

On the flip side this means I have to write the encoders/decoders which has been a chore!

dajuric commented 8 years ago

Well, I agree with you only partly. Yes, I need the interop code, which enables the framework to read/write images and to access the camera and read/write videos. (I believe you can only read/write images.)

The only interop related to image loading/saving is CvInvoke.cvLoadImage and CvInvoke.cvSaveImage (https://github.com/dajuric/dot-imaging/blob/master/Source/IO/ImageIO.cs) - I do not think that is a lot.

And I totally agree that maintenance is easier without involving a 3rd party native library. That is one of the main reasons why I made this proposal - I am trying to kick out those native dependencies, but I do not know how to replace them (especially the video/camera-related ones).

If DotImaging enabled portable image IO operations, would that be suitable enough?

kinchungwong commented 8 years ago

@dajuric

Have you considered existing third-party libraries that aggregate multiple image encoding/decoding libraries and already have an official .NET wrapper? For example, FreeImage.NET.

A lot of .NET image encoding/decoding libraries ultimately depend on reference libraries written in C or C++. For example, OpenCV depends on libtiff, libjpeg, libpng, and so on. The wisdom has been to outsource the hard part to a different .NET library that takes care of that.

@JimBobSquarePants

My insight regarding generic image processing algorithms is that:

Although it would be nice from the library user's (API user's) perspective to have the freedom to choose the pixel data precision and, at the same time, perform high-level image operations polymorphically (for example, a wrapper function for a blur operation that accepts both a byte image and a float image), at the lowest implementation level (arithmetic and computation) byte images and float images require different low-level algorithm implementations.

For simple cases such as a blur function, intermediate sums of weighted byte pixel values must be stored in either a wider integer or a float, whereas if the pixel format is float, the same numeric type can be used for the intermediate sums.

For more complicated arithmetic operations, the entire algorithm must be rearranged specifically for byte pixel types, because not doing so leads to loss of precision. This is beyond the realm of generics and would require C++ template specializations.

From my experience of implementing low-level image processing algorithms (of a nature similar to those found in OpenCV ImgProc module), the impracticality of "write arithmetic algorithms once, run on any numeric precision types" is a universal issue, regardless of programming languages or availability of template or generics. The consequence is that the blur function will likely have two or more low-level implementations: one for byte and one for float. More complicated image processing algorithms will have more type-specific implementations. Having this awareness will save one from trying to take on a very difficult or impossible design decision at the low-level.
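
As a small illustration of that point (not code from any library mentioned here), the same 3-tap box blur written twice, once per pixel type: the byte version has to widen its intermediate sum and round back, while the float version keeps everything in one type.

// Illustrative only: two low-level implementations of the same 1-D box blur.
static class BlurSketch
{
    // Byte pixels: widen the intermediate sum to int, then round and narrow back.
    public static void BoxBlur3(byte[] src, byte[] dst)
    {
        for (int i = 1; i < src.Length - 1; i++)
        {
            int sum = src[i - 1] + src[i] + src[i + 1];   // up to 765, does not fit in a byte
            dst[i] = (byte)((sum + 1) / 3);               // approximate round-to-nearest
        }
    }

    // Float pixels: the same numeric type can hold the intermediate sum.
    public static void BoxBlur3(float[] src, float[] dst)
    {
        for (int i = 1; i < src.Length - 1; i++)
        {
            dst[i] = (src[i - 1] + src[i] + src[i + 1]) / 3f;
        }
    }
}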

This explains why the Boost Generic Image Library did not achieve broad adoption.

That said, this doesn't prevent API users from yearning for a public GenericImage Blur(GenericImage input) method. GenericImage takes the role of a handle type that wraps multiple image classes with different numeric formats. The handle type can be polymorphic or generic. The design of this handle type should not dictate the low-level algorithm design decisions, or else it would make the low-level implementation impossible.

This is in addition to the issue that C# generics do not allow the kind of template specialization found in C++.
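
One possible shape for such a handle type is sketched below (illustrative names only, not a concrete API proposal): the public surface is uniform, while each pixel format dispatches to its own low-level implementation, which is exactly the specialization that C# generics cannot express on their own.

// Illustrative sketch of a polymorphic handle type over typed images.
public abstract class GenericImage
{
    // Uniform high-level operation exposed to API users.
    public abstract GenericImage Blur();
}

public sealed class ByteImage : GenericImage
{
    public byte[] Pixels { get; set; } = new byte[0];

    public override GenericImage Blur()
    {
        // Call the byte-specific low-level blur (integer accumulators) here.
        return this;
    }
}

public sealed class FloatImage : GenericImage
{
    public float[] Pixels { get; set; } = new float[0];

    public override GenericImage Blur()
    {
        // Call the float-specific low-level blur here.
        return this;
    }
}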

dajuric commented 8 years ago

@kinchungwong Yes, I did. But I do not see any advantage in replacing the existing underlying OpenCV dll with that - you still have an unmanaged platform-specific library + a managed wrapper (as it is now).

Am I missing something?

(In addition, I see there are mentions of and proposals for image processing functions such as blur, Sobel, ...). In my opinion they do not belong in the core; the biggest issue is to have a minimal platform that enables the user to do the things that the DotImaging framework does.

How do you see this ?

kinchungwong commented 8 years ago

I will try to completely rewrite the proposal as follows. (I have other ideas, but I will post them as separate comments for easier reading.)

Primary: Define one or more image classes in the CoreFX

Definition

Intended interpretation and recommendations

Use cases

Interoperability goals

Secondary: Enable a plugin framework within CoreFX for system-provided and third-party image codecs

lilith commented 8 years ago

I celebrate a common image class. I feel it may be premature to establish a generic codec interface before we catalogue codec implementations themselves. Could the codec framework proposal be split into a separate issue, perhaps? The two overlap in terms of color profile and metadata support, but the former is relatively standardized and the latter is orthogonal and should not be stored with bitmap frames anyway, if we learn from the mistakes of GDI+.

Codecs have far more differences than similarities, so it is crucial to reason about codec frameworks as high-level utility, and avoid obscuring or otherwise making access to the underlying implementations difficult. Developers should be able to exercise control over which codecs are installed for the app, and which are loaded for the code path. If possible, avoid thinking about this as a true abstraction; different image types are similar only for the most trivial use cases.

Generic codec interfaces are often the performance bottleneck

(Which is why FreeImage is not useful if you value speed).

I'd like to mention that a generic codec interface tends to bring all codecs down to the same level. For example, take JPEG decoding. If one is downscaling a JPEG, the majority of that downscaling should be done at the block level for a 500% or more speedup and fractional RAM usage. Unfortunately, that ties you to a particular codec interface, like libjpeg 6 (of which there are a dozen implementations). Yet fast JPEG scaling is a key need in web apps, so sacrificing so much performance may not be acceptable.

In the context of web apps, security is paramount, followed by quality and speed. If users are not waiting on a result, then speed can be sacrificed, but in all the CMSes I'm aware of, the user is blocked by the speed of backend image processing. A minority of scenarios utilize queue/asynchronous asset processing.

MEASURE AN E2E SOLUTION WHICH USES YOUR INTERFACES

  1. Take out your Galaxy S7, LG G4, or iPhone, and snap a picture. How long does your software take to decode, scale, and encode that image for web usage? Are you okay with HTTP requests taking that long?
  2. Try scaling problematic images (sharp lines, text, distinct highlights/shadows). Are there visible artefacts? If so, marketing is going to summarily reject business products built around it. Clear photos are strongly correlated with sales.

The above steps don't take much time, but library authors avoid them. These are simple, real-world litmus tests that should inform architecture and interface design, because if the interface prevents these tests from passing, your interface has failed. Most interfaces fail these tests. Don't repeat ancient mistakes!

For the record, it's quite possible to achieve test 1 in < 200ms on a single core, with superior quality. Test no. 2 can be difficult if you're not used to identifying artefacts. But it's actually pretty easy to do this in an objective fashion given a baseline.

  1. Grab photos from unsplash.com or from compression test image suites. I like this image and this image for testing if your software uses correct color math when scaling, since it's easy to see. This image is nice for seeing sharp lines. And, as always, try the Jähne test pattern. I add a border to check for off-by-one scaling errors. If no red border remains, your resampling algorithm has incorrect math.
  2. Generate a reference scaling of the image with a custom HDRI-compiled version of ImageMagick 6.9 or higher: convert [input image path] -set colorspace sRGB -colorspace RGB -filter Robidoux -resize 400x260 -colorspace sRGB [output PNG path]. Verify that convert -v lists HDRI, or you will be causing color banding in your reference image. You can't scale images with less than 14 bits per channel, or you're losing multiple bits of precision in the 8-bit sRGB space.
  3. Compare your results with dssim and visually with compare [reference png] [result png] -fuzz 1% x:

Many people err when evaluating quality since we tend to equate sharpness with quality. Unfortunately, bad math tends to make most images sharper (but destroys others), so incorrect implementations abound. Certain terrible approximations (like mip-mapping and fixed-window scaling) are common in game frameworks and computer vision, yet it is possible to find images that look acceptable when processed by these algorithms at certain scaling ratios.

I sometimes think the problem domain is sliced along the wrong axis. If I'm doing computer vision, I want to use OpenCV directly to take advantage of the latest features. Abstraction layers usually mean outdated docs and feature subsets. If I'm doing image asset pipeline work, I want speed, quality, and a remarkably basic feature set. Abstractions that drop features (for CV) or speed/quality (for pipelines) aren't improving things for me.

Interfaces that improve interoperability between libraries are extremely valuable. Interfaces that try to invent a 'generic' use case are, in reality, targeting an even smaller utility niche. Thus while I love common bitmap frame representation and even color profile storage, I question how much emphasis should be placed on a common codec framework - and if it even belongs in corefx.

Question: WIC has been the core codec framework for Windows for a decade, right? How many third-party codecs are there? Should we anticipate .NET Core being more popular than Windows?

shmuelie commented 8 years ago

Should we anticipate .NET Core being more popular than Windows?

It's not whether .NET Core will be more popular than Windows, but how popular .NET Core will be on non-Windows systems.

lilith commented 8 years ago

@SamuelEnglard My point is that third-party codecs for any given codec framework are not particularly abundant. Maybe we should not build a codec framework into .NET Core this early unless it's a separate NuGet package.

phillip-haydon commented 8 years ago

I personally don't think imaging should be built into Core. The vast majority of Core users will never need imaging. If MS wants to supply a base imaging library built separately on top of Core, which 3rd parties consume as a dependency for their own libraries, I'm all for that. But to me it makes no sense for it to be in Core.

my 2c

shmuelie commented 8 years ago

@nathanaeljones That I agree with.

@phillip-haydon Remember that Core isn't a monolith anymore; just because something is in Core doesn't mean your app has to include it.

lilith commented 8 years ago

So, I've been working on building the ideal OSS imaging library for modern web use, with all the details (local error management, re-entrancy, no shared state, stack traces in release mode, full memory management control) needed to make it play well with CoreCLR.

Today I launched it on Kickstarter: https://www.kickstarter.com/projects/njones/imageflow-respect-the-pixels-a-secure-alt-to-image

alexperovich commented 7 years ago

Can someone make a formal API proposal?

JeremyKuhne commented 1 year ago

As System.Drawing.Common is a GDI/GDI+ wrapper and is now only supported on Windows, this feature request is out of scope. Please see our recommendations for other libraries to consider.