Poor performance when drawing 900 lines in 3d

timholy commented 6 months ago

[X] are you running newest version (version from docs) ?
[X] can you reproduce the bug with a fresh environment ? (]activate --temp; add Makie)
[X] What platform + GPU are you on?

System summary:

OS Name Microsoft Windows 11 Pro
Version 10.0.22631 Build 22631
Other OS Description    Not Available
OS Manufacturer Microsoft Corporation
System Name DIVA
System Manufacturer LENOVO
System Model    82V4
System Type x64-based PC
System SKU  LENOVO_MT_82V4_BU_idea_FM_Slim 7 Carbon 13IAP7
Processor   12th Gen Intel(R) Core(TM) i7-1260P, 2100 Mhz, 12 Core(s), 16 Logical Processor(s)
BIOS Version/Date   LENOVO K2CN27WW, 4/29/2022
SMBIOS Version  3.3
Embedded Controller Version 1.27
BIOS Mode   UEFI
BaseBoard Manufacturer  LENOVO
BaseBoard Product   LNVNB161216
BaseBoard Version   SDK0T76530 WIN
Platform Role   Mobile
Secure Boot State   On
PCR7 Configuration  Elevation Required to View
Windows Directory   C:\WINDOWS
System Directory    C:\WINDOWS\system32
Boot Device \Device\HarddiskVolume1
Locale  United States
Hardware Abstraction Layer  Version = "10.0.22621.2506"
User Name   diva\timho
Time Zone   Central Daylight Time
Installed Physical Memory (RAM) 16.0 GB
Total Physical Memory   15.7 GB
Available Physical Memory   4.17 GB
Total Virtual Memory    34.6 GB
Available Virtual Memory    15.8 GB
Page File Space 18.9 GB
Page File   C:\pagefile.sys
Kernel DMA Protection   On
Virtualization-based security   Running
Virtualization-based security Required Security Properties  
Virtualization-based security Available Security Properties Base Virtualization Support, Secure Boot, DMA Protection, UEFI Code Readonly, SMM Security Mitigations 1.0, Mode Based Execution Control, APIC Virtualization
Virtualization-based security Services Configured   Hypervisor enforced Code Integrity
Virtualization-based security Services Running  Credential Guard, Hypervisor enforced Code Integrity
Windows Defender Application Control policy Enforced
Windows Defender Application Control user mode policy   Off
Device Encryption Support   Elevation Required to View
A hypervisor has been detected. Features required for Hyper-V will not be displayed.

Display:

Name    Intel(R) Iris(R) Xe Graphics
PNP Device ID   PCI\VEN_8086&DEV_46A6&SUBSYS_380D17AA&REV_0C\3&11583659&1&10
Adapter Type    Intel(R) Iris(R) Xe Graphics Family, Intel Corporation compatible
Adapter Description Intel(R) Iris(R) Xe Graphics
Adapter RAM 1.00 GB (1,073,741,824 bytes)
Installed Drivers   <>,C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_8f8d88151a287d5a\igd10iumd64.dll,C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_8f8d88151a287d5a\igd10iumd64.dll,C:\WINDOWS\System32\DriverStore\FileRepository\iigd_dch.inf_amd64_8f8d88151a287d5a\igd12umd64.dll
Driver Version  30.0.101.1660
INF File    oem194.inf (iADLPD_w10_DS section)
Color Planes    Not Available
Color Table Entries 4294967296
Resolution  2560 x 1600 x 60 hertz
Bits/Pixel  32
Memory Address  0x1C000000-0x1CFFFFFF
Memory Address  0x0000-0xFFFFFFF
I/O Port    0x00003000-0x0000303F
IRQ Channel IRQ 4294967239
Driver  C:\WINDOWS\SYSTEM32\DRIVERSTORE\FILEREPOSITORY\IIGD_DCH.INF_AMD64_8F8D88151A287D5A\IGDKMDN64.SYS (30.0.101.1660, 42.79 MB (44,864,080 bytes), 7/18/2022 12:46 PM)

Note: I'm running this through WSL2!

Package versions

GLMakie v0.10.2
Makie v0.21.2
MakieCore v0.8.2
ModernGL v1.1.7

The actual demo

Starting from an empty folder, activate a new project and add the following packages:

tim@diva:~/tmp/RecipeDemo$ cat Project.toml
[deps]
Colors = "5ae59095-9a9b-59fe-a467-6f913c188581"
GLMakie = "e9467ef8-e4e7-5192-8a1a-b1aee30e663a"
GeometryBasics = "5c1252a2-5f33-56bf-86c9-59e7332b4326"
MakieCore = "20f20a25-4f0e-4fdf-b5d1-57303727442b"
StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"

Then grab the code from this gist and save it to the same folder as RecipeDemo.jl.

Here's an annotated interactive session:

julia> using GLMakie

julia> include("RecipeDemo.jl"); using .RecipeDemo

julia> mgmm = IsotropicMultiGMM(Dict{Symbol,IsotropicGMM{3,Float64}}());

julia> mgmm.gmms[:Hydrophobe] = IsotropicGMM([IsotropicGaussian{3}(randn(3), 0.5, 1.0) for _ = 1:50]);

julia> mgmm.gmms[:Donor] = IsotropicGMM([IsotropicGaussian{3}(randn(3), 0.5, 1.0) for _ = 1:50]);

julia> fig = Figure();

julia> ax = Axis3(fig[1, 1]);

julia> gmmdisplay!(ax, mgmm)
Plot{Main.RecipeDemo.gmmdisplay, Tuple{IsotropicMultiGMM{3, Float64, Symbol}}}

julia> RecipeDemo.counter   # there were 900 lines drawn
Base.RefValue{Int64}(900)

julia> fig

julia> empty!(ax)
Axis3()

julia> @time gmmdisplay!(ax, mgmm)   # this is quite slow (but only ~30ms if the figure isn't showing)
  4.603401 seconds (2.28 M allocations: 281.923 MiB, 2.12% gc time, 4.02% compilation time)
Plot{Main.RecipeDemo.gmmdisplay, Tuple{IsotropicMultiGMM{3, Float64, Symbol}}}

julia> empty!(ax)
Axis3()

julia> @time nonrecipe!(ax, mgmm)    # is it something to do with the recipe infrastructure?
D3D12: Removing Device.
terminate called after throwing an instance of 'St9bad_alloc'
  what():  std::bad_alloc

[30545] signal (6.-6): Aborted
in expression starting at none:0

[30545] signal (11.1): Segmentation fault
in expression starting at none:0
Segmentation fault

Profiling gmmdisplay!(ax, mgmm) reveals the bottleneck is this line.

timholy commented 6 months ago

Also note that gmmdisplay!(ax, mgmm; display=:solid) is much faster when the figure is showing:

julia> @time gmmdisplay!(ax, mgmm; display=:solid)
  0.641004 seconds (294.02 k allocations: 42.550 MiB, 2.01% gc time)
Plot{Main.RecipeDemo.gmmdisplay, Tuple{IsotropicMultiGMM{3, Float64, Symbol}}}

asinghvi17 commented 6 months ago

Panning around in Axis3 is also a lot faster with display=:solid. The :solid method creates only 100 plots as opposed to 900, which might contribute to the issue?

timholy commented 6 months ago

Hmm, this was slower than I expected too:

julia> using GLMakie

julia> fig = Figure();

julia> ax = Axis(fig[1, 1]);

julia> for _ = 1:900
           lines!(ax, randn(32))
       end

julia> fig

julia> empty!(ax)
Axis with 0 plots:

julia> @time for _ = 1:900
           lines!(ax, randn(32))
       end
  3.930636 seconds (6.22 M allocations: 398.483 MiB, 2.12% gc time)

So perhaps it's not a 3d-specific thing?

For comparison:

julia> @time begin
           x = Float64[]
           y = Float64[]
           for _ = 1:900
               append!(x, 1:32)
               push!(x, NaN)
               append!(y, randn(32))
               push!(y, NaN)
           end
           lines!(ax, x, y)
       end
  0.413700 seconds (20.52 k allocations: 3.748 MiB, 7.54% compilation time)
Lines{Tuple{Vector{Point{2, Float32}}}}

Presumably there's a fixed overhead for each transfer to GPU memory? If so, bundling everything together and transferring it in one lump is more efficient than doing a separate transfer for each plotted object. Of course, that's only possible when all the objects are of the same type.
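The comparison above can be packaged as a small helper. This is only a sketch: nan_separated is a hypothetical name, not part of Makie, and it assumes every polyline is given as a pair of equal-length coordinate vectors. It builds the data in plain Julia; the plotting call is left as a comment because it needs a GLMakie axis ax in scope.

```julia
# Hypothetical helper (not a Makie function): join many polylines into one
# pair of coordinate vectors, with a NaN after each polyline so that a single
# lines! call draws them as separate strokes.
function nan_separated(xs, ys)
    length(xs) == length(ys) || throw(ArgumentError("need one x vector per y vector"))
    x = Float64[]
    y = Float64[]
    for (xi, yi) in zip(xs, ys)
        length(xi) == length(yi) || throw(ArgumentError("x and y lengths differ"))
        append!(x, xi)
        push!(x, NaN)   # separator: breaks the line here
        append!(y, yi)
        push!(y, NaN)
    end
    return x, y
end

xs = [1:32 for _ in 1:900]
ys = [randn(32) for _ in 1:900]
x, y = nan_separated(xs, ys)

# With a GLMakie axis `ax` in scope, one plot then replaces 900:
# lines!(ax, x, y)
```

Each polyline contributes its own points plus one separator, so the combined vectors here hold 900 * 33 entries.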

asinghvi17 commented 6 months ago

If a shader is being compiled per plot, then I could imagine this being the result...the NaN-separation approach is definitely better here!

SimonDanisch commented 6 months ago

There's a pretty large overhead for each plot, not because of compilation though... You should be able to give most attributes per point, so it should not be too hard to vectorize, I hope!
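As a rough illustration of giving attributes per point: if the polylines are already concatenated with one NaN after each (as in the comparison earlier in the thread), a per-line value can be expanded to one entry per point and passed as the color vector. The helper name per_point_values is hypothetical, and the sketch assumes numeric values mapped through a colormap; the plotting call is shown only as a comment because it needs an axis ax and the concatenated x, y.

```julia
# Hypothetical helper: repeat each per-line value once per point of that line,
# plus once more for the NaN separator that follows it, so the result has
# exactly as many entries as the concatenated coordinate vectors.
function per_point_values(values, lengths)
    out = Float64[]
    for (v, n) in zip(values, lengths)
        append!(out, fill(Float64(v), n + 1))  # +1 covers the NaN separator
    end
    return out
end

# 900 lines of 32 points each, colored by line index
group = per_point_values(1:900, fill(32, 900))

# With `ax`, `x` and `y` from the NaN-separated comparison in scope:
# lines!(ax, x, y; color = group, colormap = :viridis)
```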

asinghvi17 commented 6 months ago

Is that overhead once per display or constant? Because there's noticeably less lag when panning in LScene if the number of plots is decreased.

SimonDanisch commented 6 months ago

It's both drawing and plot creation ;) You might also run into LScene-camera-specific performance problems with many plots.

ffreyer commented 3 months ago

Performance probably suffers here because:

- each lines plot becomes one render object and one draw call, and having lots of small draw calls is slow
- there is some overhead to creating a plot and submitting it to the backend
- with an open Axis (etc.), adding a plot triggers a limit recalculation, which effectively calls boundingbox.(scene.plots). This could be improved by caching bounding boxes, ref #4240
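To see why the limit recalculation could matter, here is a back-of-the-envelope model. It is a simplification, not Makie's actual code: it assumes that each added plot triggers one bounding-box evaluation for every plot currently in the scene, and that a cache would reduce this to one evaluation per new plot. The function name bbox_evaluations is made up for this sketch.

```julia
# Simplified model (an assumption, not Makie internals): count bounding-box
# evaluations while adding `nplots` plots one at a time.
function bbox_evaluations(nplots; cached::Bool)
    evaluations = 0
    for k in 1:nplots
        # Uncached: adding plot k re-evaluates all k plots in the scene.
        # Cached: only the newly added plot needs its box computed.
        evaluations += cached ? 1 : k
    end
    return evaluations
end

n_uncached = bbox_evaluations(900; cached = false)  # 1 + 2 + ... + 900 = 405450
n_cached = bbox_evaluations(900; cached = true)     # 900
```

Under this model the uncached cost grows quadratically with the number of plots, which would fit the observation that one combined plot is much cheaper than 900 separate ones.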

Should we close this as a situation where performance degradation is expected? Or keep it around as something that caching boundingboxes may solve?

timholy commented 3 months ago

(I'm fine with whatever the devs think best.)

MakieOrg / Makie.jl

Poor performance when drawing 900 lines in 3d #3899
