AvaloniaUI / Avalonia

Develop Desktop, Embedded, Mobile and WebAssembly apps with C# and XAML. The most popular .NET UI client technology
https://avaloniaui.net
MIT License
25.82k stars 2.23k forks source link

Performance issue with image #3043

Closed ShahroozAnsari closed 4 years ago

ShahroozAnsari commented 5 years ago

While i load many image with image control i lose my frame rate and app work hardly. when i UseDirect2D1app get 2Gb of ram for 50 image. how i solve this problem? each image is about 100 kb

Gillibald commented 5 years ago

When working with images you should use a caching mechanism that makes sure you don't exceed certain limits. Can you share some code how you are loading images etc?

ShahroozAnsari commented 5 years ago

I have Controller Like this:

`<UserControl xmlns="https://github.com/avaloniaui" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" xmlns:d="http://schemas.microsoft.com/expression/blend/2008" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:control ="clr-namespace:TKD.Controls;assembly=TKD" mc:Ignorable="d" d:DesignWidth="200" d:DesignHeight="200" x:Class="TKD.Pages.Games.TopicItem" Background="Transparent" Width="200" Height="200" Margin="15">

  <Image Name="TitileImage" Width="100" Height="100" VerticalAlignment="Center" HorizontalAlignment="Center"/>
  <control:CircularProgressBar Name="WaiteImage"
                                HorizontalContentAlignment="Center"
                                VerticalAlignment="Center" Margin="60"/>
  <TextBlock Text="Title" Name="txtTitle"  FontSize="15" Foreground="Green"  VerticalAlignment="Bottom" HorizontalAlignment="Center" Margin="2,0,0,10"/>
</Grid>   

`

and backend Code is like this : `using Avalonia.Controls; using Avalonia.Markup.Xaml; using Avalonia.Media; using Avalonia.Threading; using System; using System.Threading.Tasks; using Avalonia.Media.Imaging; using SkiaSharp; using TKD.Controls; using TKD.Helper; using TKD.Infrastructure.API;

namespace TKD.Pages.Games { public class TopicItem : UserControl { private readonly int topicId; private readonly bool isLock = false; public TopicItem(TopicsDto topic) { topicId = topic.Id.Value; this.InitializeComponent(); this.PointerPressed += TopicItem_PointerPressed; var title = this.Find("txtTitle"); var CountBoard = this.Find("CountBoard"); var lockImage = this.Find("lockImage"); var WordCounts = this.Find("WordCounts"); var wordImage = this.Find("TitileImage"); var WaiteImage = this.Find("WaiteImage");

        title.Text = topic.Title;
        var totalWords = topic.TotalWord;

        var userLearend = topic.UserLearnedWord;

        WordCounts.Text = userLearend.ToString() + "/" + totalWords.ToString();
        if (totalWords == 0)
        {
            isLock = true;
            lockImage.IsVisible = true;
            CountBoard.IsVisible = false;
            var converter = new BrushConverter();
            var brush = (Brush)converter.ConvertFromString("#AB9DE0");
            title.Foreground = brush;
        }
        WaiteImage.Start();
        Task.Run(() =>
             {
                 try
                 {
                     var image = GetImageHandler.GetImage(topic.ImageUrl, topic.Title);

                     Dispatcher.UIThread.InvokeAsync(() =>
                     {

                         wordImage.Source =  image;

                        WaiteImage.Stop();
                    });

                 }
                 catch (Exception e)
                 {
                     Console.WriteLine(e);

                 }
             });

    }

    private void TopicItem_PointerPressed(object sender, Avalonia.Input.PointerPressedEventArgs e)
    {
        if (!isLock)
        {
            Program.MainWindowPresenter.Content = new Game(topicId);
        }
    }

    private void InitializeComponent()
    {
        AvaloniaXamlLoader.Load(this);
    }
}

} `

how I must Use chaching System dou you have sample?

mstr2 commented 5 years ago

What's the resolution of your images? A footprint of 2 GB would suggest something on the order of 10 MPx for each of the 50 images.

In addition to that, Avalaonia is currently consuming twice the required memory for images on unified memory systems when using GPU-accelerated rendering.

ShahroozAnsari commented 5 years ago

@mstr2 size of each image is 100 kb and GPU-accelerated config i get difference errors

ahopper commented 5 years ago

@ShahroozAnsari you never answered my question on gitter, Is 100kb the compressed file size, if so what is the uncompressed size(as that's what ends up in memory)? Maybe if you uploaded an image or preferably a fully working repro project someone might have a look at it.

skyflyer commented 4 years ago

@ahopper,

I don't have any experience with Avalonia, just trying it out for an idea I'm having. I expect that I will need to show thousands of images and I have created a simple test where I added 5000 images (150kB in size) wrapped in a simple Panel in a container WrapPanel inside a ScrollViewer:

        <ScrollViewer Grid.Column="2" HorizontalScrollBarVisibility="Disabled" VerticalScrollBarVisibility="Auto">
            <WrapPanel Background="LightBlue" x:Name="WrapPanel" >
                <Panel Background="White">
                    <Image Source="file:///sompath/image.jpg" 
                        Width="300" 
                        Height="200" 
                        Margin="5,5"
                        />
                </Panel>
            </WrapPanel>
        </ScrollViewer>

I added the panels (and images) programatically within MainWindow.cs for a test and the images show up, but the scrolling and resizing is quite slow. I even tested with Panel controls without the children, and the scrolling is slow as well. The performance is acceptable with 500 elements (even 1000), but is quite slow with 5000 or 10000 child controls.

What would be the best approach to tackle such a challenge? Is there some mechanism (UI virtualization & view recycling) of reusing controls which are not visible (similar to Android and iOS) and deferring loading of the components until they're visible inside the view?

The ScrollViewer is not an "items control" so it probably does not support virtualization, right? But the inner WrapPanel could be, though I can't find a WrapPanel control that would support virtualization. Also, there is a property on the ScrollViewer which should help with performance ( IsDeferredScrollingEnabled), but that one is not available in Avalonia (at least not that I could find it).

An informative article on this topic: https://docs.microsoft.com/en-us/dotnet/framework/wpf/advanced/optimizing-performance-controls

I would appreciate a pointer which direction you think might be best for such a solution.

danwalmsley commented 4 years ago

@skyflyer in your case you would use something like ListBox (with your custom panel) or ItemsRepeater these support virtualization which means it doesnt end up rendering all 5000 images when they are not displayed on the screen.

Virtualization is on by default btw. ItemsRepeater has best performance but is for now limited if you want to support selection or anything like that.

Listbox is going to be upgraded in an upcoming release to be based off of ItemsRepeater, giving you the advantage of virtualization now,,, and in future release even better performance.

skyflyer commented 4 years ago

@danwalmsley, thanks for the hint! By using ListBox, I am able to effortlessly scroll 15000 items. There are drawbacks, of course (I need a grid and I'd like to achieve smooth scrolling), but that can be overcome by using a custom control. Do you happen to know if such a control exists already?

Like a virtualizing WrapPanel

grokys commented 4 years ago

@skyflyer there's currently not a virtualizing WrapPanel, but you could use ItemsRepeater instead of ListBox. It's ported from UWP/WinUI and there's a UniformGridLayout.

Here's some docs:

http://avaloniaui.net/docs/controls/itemsrepeater https://docs.microsoft.com/en-us/windows/uwp/design/controls-and-patterns/items-repeater https://docs.microsoft.com/en-us/uwp/api/microsoft.ui.xaml.controls.itemsrepeater?view=winui-2.3

skyflyer commented 4 years ago

@grokys, thanks! I tested with ItemsRepeater and it is working (/wrt virtualization and recycling), but it needs work in order to get to "grid" view - which is where UniformGridLayout helps a lot! I also tested UniformGrid with ItemsControl and that did not use virtualization (from a quick glimpse into the source code, UniformGrid does not implement virtualized scrolling).

I appreciate your input, thanks.

To recap: I got the virtualization working but the scrolling stuttered with more than 4 images in a row. I then found out that Avalonia support async converters with the ^ modifier and modified the binding to <Image Source={Binding BitmapAsync^} /> which returns the bitmap asynchronously, and with 15000 images loaded, the scrolling (on mac) is super smooth! The final proof-of-concept uses ItemsRepeater with UniformGridLayout and async bitmap loading.

An interesting thing to note is that if I use an async property without artificial Task.Delay(100) in it, the scrolling still stutters, as it appears that the framework actually does not call the method asynchronously in reality.

I wonder why IsAsync is not supported on a binding though, but I digress.

Even though I'm not the one who opened this issue, it is clear, that virtualization really helps here and I think this issue can be closed.

skyflyer commented 4 years ago

@danwalmsley, @grokys, one thing I noticed, while running this on a bigger retina screen is that when I have more than 50 boxes on screen, the scrolling is unusable, whereas if I make the boxes bigger (thus having only 20 boxes on screen), the scrolling is smooth. It seems as though the async loading stops "working", at least it is not visible for the images. If I remove the Source from the image binding, it is all buttery smooth. I can share a repro, if you'd like to see for yourself.

Is there a limit for virtualization that I should be aware? Like a size of the child items?

danwalmsley commented 4 years ago

@skyflyer iv also got this happening... looking into it

skyflyer commented 4 years ago

@danwalmsley, thanks! I just got around to it after several days of other work and I'm setting up a profiler for .NET Core on Mac as I write this, so I can see what's going on. From a brief inspection of the code I "assume" it is the Image control's Render method. I need to setup profiler first, so I can measure things and try and figure it out. BTW, is async Render supported?

skyflyer commented 4 years ago

Here is a screenshot from a profiler. My suspicion seems to be correct, as most of the time is being spent in DrawImage.

Screenshot 2020-05-05 at 17 46 55
kekekeks commented 4 years ago

Most likely Skia is busy converting the pixel format before uploading images to the video memory. The budget for bitmaps that are cached is kinda low (<100MB, I think), so it keeps reuploading them on each frame.

skyflyer commented 4 years ago

I see two approaches here: one is to optimize/speedup rendering, but this just "postpones" the real issue until there's more stuff (more images on the screen) to do when rendering. The other approach is to get it off the main UI thread, so that the user interface remains responsive. I have no idea if it is possible and what it would take though. FWIW, this is on 5K screen, so there's a lot of pixels to push around.

kekekeks commented 4 years ago

@skyflyer It's already off the main UI thread. The issue happens when SkCanvas.DrawImage attempts to draw a bitmap that is stored in the main memory to a hardware accelerated OpenGL context.

It takes time to upload bitmaps to the GPU memory and costs said memory to keep them there. Internally Skia uses an LRU cache for such bitmaps, but once you have more onscreen bitmap data than fits in the cache, Skia will keep deleting and reuploading bitmaps. Hence the observed slowdown.

Gillibald commented 4 years ago

Images can be transferred to the GPU without holding a copy. https://github.com/google/skia/blob/chrome/m68/include/core/SkImage.h#L745

Gillibald commented 4 years ago

Also, see this for reference: https://github.com/AvaloniaUI/Avalonia/pull/2858

makeTextureImage returns null without GPU backend. For RTB we have to call makeRasterImage before using it in case isTextureBacked return true.

kekekeks commented 4 years ago

@Gillibald It raises additional problems of consuming way too much GPU memory, however when using OpenGL it should be managed by OpenGL driver.

Another problem would be RenderTargetBitmap that is rendered on the UI thread and currently needs bitmap resources to be backed CPU memory.

So we'll need a way to download images back to the CPU memory. Which doesn't play well with GrContext and friends being not thread safe.

So we'll have to manually upload images to OpenGL textures and then use context sharing to safely download them back.

mstr2 commented 4 years ago

It raises additional problems of consuming way too much GPU memory, however when using OpenGL it should be managed by OpenGL driver.

I think there is no fixed number you could conceivably pick as the "right" amount of VRAM usage. Not that it should matter much these days, almost all mobile systems use unified memory anyway. Probably the most sensible policy for VRAM is the same as for RAM: let an application consume as much as it wants to consume.

kekekeks commented 4 years ago

For RTB we have to call makeRasterImage before using it in case isTextureBacked return true.

We can't access the same GrBackend from different threads simultanously.

Gillibald commented 4 years ago

Just guard it via checks for the render thread and when we call this from non render thread and the image is texture backed create a new platform implementation (reload the data)

kekekeks commented 4 years ago

I think there is no fixed number you could conceivably pick as the "right" amount of VRAM usage.

Yes, but our problem is that we'll keep the bitmap in both CPU and GPU memory.

Gillibald commented 4 years ago

Transfer should happen when the platform implementation is constructed and the initial raster image should be disposed.

kekekeks commented 4 years ago

@Gillibald attempting to use locking for thread-safe access to GrContext will lead to UI thread blocking waiting for the render thread to complete the frame. Which kinda defeats the whole purpose of UI and render thread separation.

mstr2 commented 4 years ago

Yes, but our problem is that we'll keep the bitmap in both CPU and GPU memory.

For non-RTB, the CPU memory copy could potentially be discarded immediately after upload.

Gillibald commented 4 years ago

We are talking about fixing RTB when used on the UIThread when is that not blocking?

kekekeks commented 4 years ago

@mstr2 I'm talking about images that are rendered to RTBs. i. e.

UI thread will have to wait for the render thread to stop rendering and then will block the render thread for the time of downloading the texture. With multiple such images being rendering it can lead to a lockfest.

mstr2 commented 4 years ago

I agree that this scenario would probably require multiple GL contexts that share the texture to prevent going back and forth between CPU and GPU.

Gillibald commented 4 years ago

I suggest changing RTB to work as a recorder of drawing operations and replay them on the actual render. That way everything run on the render thread and RTB has hardware acceleration.

kekekeks commented 4 years ago

1) It will cause atrocious memory consumption in certain scenarios 2) The Save method will still need to materialize it

kekekeks commented 4 years ago

@mstr2

I agree that this scenario would probably require multiple GL contexts that share the texture to prevent going back and forth between CPU and GPU

Yes, using multiple GrContext's backed by OpenGL contexts from the same share list and uploading textures manually will fix the issue.

However there are graphics APIs other than OpenGL. That should also be considered before taking that approach. E. g. @MarchingCube's app is running on top of Vulkan and I'm not quite sure how well it would work with most of the Vulkan drivers that provide a limited number of command lists.

Gillibald commented 4 years ago

Save just replays the content with a raster surface. The memory consumption should not matter because we are just holding primitives. Also the SceneGraph works the same and is even allocating on each update.

The recording isn't touching PlatformImpl at any time and therefore there is no threading issue.

kekekeks commented 4 years ago
var rtb1 = new RenderTargetBitmap(new PixelSize(16, 16));
var rtb2 = new RenderTargetBitmap(new PixelSize(2048, 16));
using (var ctx = rtb1.CreateDrawingContext())
{
     for(var c=0; c<9002; c++ )
     {
           using(var ctx2 = rtb2.CreateDrawingContext())
           {
                 ctx2.DrawSomething();
           }
           ctx.DrawImage(rtb2
     }
}

With ths code you'll get tons of allocations out of nowhere

Save just replays the content with a raster surface.

It still needs to access GPU-backed SKImage on the UI thread. Which require a lock.

Gillibald commented 4 years ago

Implement IImage with RTB and it should work. Content is replayed on Draw.

kekekeks commented 4 years ago

@Gillibald it wont solve either allocations or locking problem.

MarchingCube commented 4 years ago

@mstr2

I agree that this scenario would probably require multiple GL contexts that share the texture to prevent going back and forth between CPU and GPU

Yes, using multiple GrContext's backed by OpenGL contexts from the same share list and uploading textures manually will fix the issue.

However there are graphics APIs other than OpenGL. That should also be considered before taking that approach. E. g. @MarchingCube's app is running on top of Vulkan and I'm not quite sure how well it would work with most of the Vulkan drivers that provide a limited number of command lists.

Yeah, it is actually fairly complicated on Vulkan since you have many options like using multiple device queues (which is not supported by many drivers like on AMD cards) or using multiple devices (which requires cross device synchronization and state transitions). And if we manually manage textures then on Vulkan we also have to correctly transition images into correct states and layouts before giving it to Skia.

And about recording commands - I think most of the browsers just record and replay command lists now, it rarely directly draws to the HW accelerated surface (since you can also optimize draw calls before submitting, although for both OpenGL and Vulkan Skia will batch submissions as well to reduce overhead).

Gillibald commented 4 years ago

I will think about how to make nested RTB possible. Probably some special handling for IImage when used in the context of another RTB (nesting). Allocations are no issue because we have to allocate somewhere. The SceneGraph would allocate the same amount.

kekekeks commented 4 years ago

Allocations are no issue because we have to allocate somewhere

No, we don't. Right now with RTB we are drawing in immediate mode, all changes are applied to the target bitmap immediately.

Gillibald commented 4 years ago

So a bitmap consumes less memory than a simple struct that holds some shape? Worst case you store multiple bitmaps that are being added to RTB's content. Usually you add some primitives like rects etc. Normally this consumes less memory.

Will stop here and think about your concerns.

danwalmsley commented 4 years ago

@mstr2 @skyflyer @ShahroozAnsari Iv added an apu that allows you to set the Skia Texture limit size for video memory... by default its only 100mb and its whats causing your issue, because skia is falling back to software rendering.

once this is merged, and we will release a 0.9 version with this in, you will be able to do:

.With(new SkiaOptions
{
     MaxGpuResourceSizeBytes = 1024000000
})

forexample which allows upto 1gb of resources in video memory...

remember its not the size of your original images (the jpegs) its the decoded size, so 4bytes per pixel in the image, so 1gb is probably a reasonable start.

We have also increased the default limit to 512mb...

so it may work without doing anything...

btw you will need to be using skia, and you should have deffered renderer enabled.

danwalmsley commented 4 years ago

@mstr2 @skyflyer @ShahroozAnsari ok so the above is merged to master, however its really just a workaround...

to get truly smooth scrolling you need to resize your images, so that your not overloading gpu memory... i.e. if you jpeg image source is 10k x 10k pixels that will end up on the gpu... you may only want to display an image of 250x250...

so we are adding a Bitmap decode api, where it will load it at a smaller size. https://github.com/AvaloniaUI/Avalonia/pull/3890

iv tried it, and resizing you get butter smooth scrolling, combine it with ItemsRepeater and UniformGridLayout and you have perfect solution.

danwalmsley commented 4 years ago

https://github.com/jmacato/Symphony/blob/b581564d6a3bf3b96a31af631a44b3b70a76f6a4/src/Views/CollectionExplorerView.xaml#L9

shows you how to get the layout you want... just relies on you passing sensible sized images to get max performance.

skyflyer commented 4 years ago

@danwalmsley and others, thanks a lot for the effort. I'll try it in the next couple of days and report back.

skyflyer commented 4 years ago

I really appreciate the effort all you have put into this issue! Thanks!

I'm having trouble building Avalonia from sources on mac (avalonia-native.h not being found and then com.h (after I manually copied avalonia-native.h to src/Avalonia.Native/obj/Debug/netstandard2.0/SharpGen/) and ... then std-def.h at last, which I could not find. But, I digress. (I'll open up an issue if this is something you want to support at all)

I tried to test the new stuff (saw the released version 0.9.10 on Nuget, but that does not contain the new bits in SkiaOptions. Which version should I wait for?

skyflyer commented 4 years ago

@danwalmsley, just to let you know, that the fixes work "as advertised" and scrolling works really well.

Another question I have, related to virtualization and view recycling is this: is is somehow possible to use a "predefined" view for recycling and then update it asynchronously? What I mean by that is defining an "empty" placeholder for a view to be used in by the "virtualization logic", so that the "old data" present in an existing view wouldn't be shown?