JuliaPlots / MakieCore.jl

The core recipe functions for Makie.jl - basically Makie light!
MIT License

Entity component system for Makie's scene graph #6

Open SimonDanisch opened 3 years ago

SimonDanisch commented 3 years ago

A few people got interested in using an ECS for Makie (e.g. @ffreyer, @louisponet).

I tried to extract some useful comments about pros and cons from the slack thread:

@c42f:

ECS is super useful.  If you'd like to see "the vibe" of what can be done with it at the application level look at my somewhat embarrassing little game https://github.com/c42f/Gameoji#readme
ECS is really great for games.
* It makes the presence of fields (/ attributes / "components") of structs (/ objects / "entities") fully dynamic - they can be added and removed as necessary.
* The behavioral aspects of the application are coded into "systems" which act on collections of components rather than collections of entities. This decoupling makes the behavior of the game world much more consistent by default.
Whether ECS makes sense for Makie, I'm not exactly sure, but I'd say I'm cautiously optimistic.
* The data layout is reasonably efficient and I suspect makes sense for GPUs. Games use it, after all.
* The flexibility of ECS to mix and match components can bring great and surprising flexibility. It's very compositional in that you can get complex and surprising behaviors (N^2 behaviors where N is the number of systems) by mixing components together.
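
A minimal sketch of the two ideas above (components that can be attached and removed at runtime, and systems that act on component collections), assuming a toy design where every component type gets its own `Dict` keyed by entity id. All names here (`World`, `add_component!`, `movement_system!`, ...) are hypothetical and not taken from Makie, Overseer.jl or Gameoji:

```julia
# Toy ECS: entities are integer ids, components live in per-type stores.
struct Position
    x::Float64
    y::Float64
end

struct Velocity
    dx::Float64
    dy::Float64
end

struct World
    next_id::Base.RefValue{Int}
    stores::Dict{DataType, Dict{Int, Any}}   # component type => (entity id => value)
end

World() = World(Ref(0), Dict{DataType, Dict{Int, Any}}())

# An entity is nothing more than a fresh id.
new_entity!(w::World) = (w.next_id[] += 1)

# Components can be attached and removed at runtime.
function add_component!(w::World, e::Int, c)
    store = get!(() -> Dict{Int, Any}(), w.stores, typeof(c))
    store[e] = c
    return w
end

function remove_component!(w::World, e::Int, ::Type{T}) where {T}
    haskey(w.stores, T) && delete!(w.stores[T], e)
    return w
end

has_component(w::World, e::Int, ::Type{T}) where {T} =
    haskey(w.stores, T) && haskey(w.stores[T], e)

# A system: moves every entity that has both a Position and a Velocity.
function movement_system!(w::World, dt)
    positions = get(w.stores, Position, Dict{Int, Any}())
    velocities = get(w.stores, Velocity, Dict{Int, Any}())
    for (e, v) in velocities
        haskey(positions, e) || continue
        p = positions[e]
        positions[e] = Position(p.x + v.dx * dt, p.y + v.dy * dt)
    end
    return w
end

w = World()
a = new_entity!(w)
b = new_entity!(w)
add_component!(w, a, Position(0.0, 0.0))
add_component!(w, a, Velocity(1.0, 2.0))
add_component!(w, b, Position(5.0, 5.0))   # b has no Velocity, so it never moves
movement_system!(w, 0.5)
```

The `Dict`-based stores keep the sketch short; a real ECS would typically use dense arrays instead.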

@sunbubble

I’ve been using entt professionally for about 2 years now (to develop a 3D application which is maybe conceptually similar to a game level designer), it’s wonderful. ECS is an extremely powerful design pattern. Not without its challenges of course. If you have any questions about it, feel free to ask. I’m more than happy to talk about it :sweat_smile:.
In my opinion an ECS design has a few very powerful features, some are performance, others are more about ergonomics:
* Under the hood they almost always use an SoA-like (struct-of-arrays) data store. This allows for insanely fast iteration, as cache coherency is often very good.
* It makes it very easy to prepare a batch of similar expensive calculations and schedule them to be calculated together before rendering. You not only avoid redundant calculations; when you do perform them, you often profit from the cache coherence.
* Its ergonomics and design are very similar to an in-memory database, which is very different from a traditional C++ OOP style. This is very nice for refactoring, because you haven't built objects around the wrong abstractions, which would force you to untangle a huge mess.
* It’s natural to add attributes to entities without them having to be members of the type. Very composable.
In my opinion, its biggest challenge is that encapsulation and interface segregation are more difficult to practice. It falls more on the developer to carefully organize the code, while in traditional OOP it comes mostly out of the box from the start (and then dies in fire because your abstractions were wrong and you need to leak the implementation details). In practice these are mostly issues for a large codebase, and maybe irrelevant to a plotting library.
But if you are careful with the previous point, you can further boost your performance using parallelism, from simple parallel versions of algorithms like map and fold to more sophisticated approaches like task-based parallelism (e.g. https://taskflow.github.io in C++ land).
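
To illustrate the SoA point, here is a small sketch contrasting an array-of-structs layout with a struct-of-arrays layout. The types `Particle` and `ParticleStore` and the function `scale_sizes!` are made up for this example and do not correspond to any Makie or entt type:

```julia
# Array-of-structs: each element carries all of its fields together.
struct Particle
    position::NTuple{2, Float32}
    color::NTuple{3, Float32}
    size::Float32
end

# Struct-of-arrays: one dense vector per component.
struct ParticleStore
    position::Vector{NTuple{2, Float32}}
    color::Vector{NTuple{3, Float32}}
    size::Vector{Float32}
end

function to_soa(ps::Vector{Particle})
    return ParticleStore([p.position for p in ps],
                         [p.color for p in ps],
                         [p.size for p in ps])
end

# A "system" that only needs sizes iterates over one contiguous Float32
# vector and never reads positions or colors.
function scale_sizes!(store::ParticleStore, factor::Float32)
    store.size .*= factor
    return store
end

aos = [Particle((Float32(i), 0f0), (1f0, 0f0, 0f0), 1f0) for i in 1:4]
soa = to_soa(aos)
scale_sizes!(soa, 2f0)
```

In the AoS layout the same update would stride over positions and colors as well, which is where the cache-coherency argument comes from.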

@c42f

A couple of points I particularly agree with: (1) An ECS registry-of-entities is conceptually like a database table (one which happens to be optimized for sparsity) (2) The composability is great and can lead to surprising emergent complexity when components are combined in various ways.
I certainly haven't built anything big enough to see the downsides regarding lack of encapsulation. But this in itself is very Julian... for better or worse we've already got a language with no way to enforce private implementation detail :sweat_smile:

@sunbubble

In theory one can tackle the encapsulation issue by having different “systems” as separate modules, and only exposing the components and functions which should be public. Also, one should avoid cross-dependencies between systems, because that simply means they’re a single system (same issue as with library design). You should be very explicit about where you couple different systems, i.e. if components of different systems depend on each other behaviourally. If they’re completely orthogonal it’s less of an issue.
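
A rough sketch of what "systems as separate modules" could look like in Julia, assuming two made-up systems `Transforms` and `Rendering`. Julia cannot enforce privacy, so the boundary is only a convention expressed through `export` and explicit imports:

```julia
# Each "system" lives in its own module and exports only its public
# components and entry points. Module and function names are hypothetical.
module Transforms
    export Translation, apply

    struct Translation
        dx::Float64
        dy::Float64
    end

    # Internal helper, deliberately not exported.
    _shift(p::NTuple{2, Float64}, t::Translation) = (p[1] + t.dx, p[2] + t.dy)

    apply(t::Translation, points) = [_shift(p, t) for p in points]
end

module Rendering
    # The only coupling to Transforms is this explicit import of its public API.
    using ..Transforms: Translation, apply
    export draw_commands

    draw_commands(points, t::Translation) =
        ["point $(p[1]) $(p[2])" for p in apply(t, points)]
end

using .Rendering: draw_commands
using .Transforms: Translation

cmds = draw_commands([(0.0, 0.0), (1.0, 1.0)], Translation(1.0, 0.5))
```

The single `using ..Transforms: ...` line is the place where the two systems are coupled, which keeps that dependency explicit and easy to find.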

@louisponet created https://github.com/louisponet/Overseer.jl, which seems like a good starting point for an ECS in Julia.

Examples: https://github.com/louisponet/Glimpse.jl/blob/master/examples/boids.jl https://github.com/c42f/Gameoji#readme

sunbubble commented 3 years ago

Hi there! The author of entt also has a bunch of reading material regarding ECS. It goes from high-level design pattern ideas to the deep underlying data structures.

For a first dive into ECS in general I would start with this first of multiple blog posts on the topic: https://skypjack.github.io/2019-02-14-ecs-baf-part-1/

entt also has a very rich documentation on how to employ it (or ECS more generally): https://entt.docsforge.com/master/entity-component-system/

ECS is fascinating and very powerful. It's very much worth investigating and considering. But it might not be the best solution to any given problem.

ffreyer commented 3 years ago

Since Simon implemented scatter here, let's take that as an example. Here we currently have:

mutable struct Scatter{N}
    positions::Vector{Point{N, Float32}}
    color::TorVector{RGBAf0}
    marker::TorVector{<: Union{Symbol, Char, Type{Circle}}}
    markersize::TorVector{Union{Float32, Vec{N, Float32}}}
    markeroffset::TorVector{Vec{N, Float32}}
    strokecolor::TorVector{RGBAf0}
    strokewidth::TorVector{Float32}
    markerspace::Space
    transform_marker::Bool
    on_update::Observable{Dict{Symbol, Any}}
    on_event::Observable{Tuple{Symbol, Any}}
    camera::Camera
    transformation::Transformation
end

I.e. we have one component that is per scatter element (positions) and a bunch that are TOrVector, i.e. either a single instance "inherited" by each element or a collection with one value per element. The shaders in GLMakie are usually implemented to handle either case with structures like {{color_type}} color; which get resolved when a plot is transformed to a render object.
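
The "single value or one value per element" idea can be sketched as follows. The alias `OneOrVector` is a hypothetical stand-in for the alias used in the struct above, and `element_value`, `is_per_element` and `MiniScatter` are made up for illustration; none of this is MakieCore's actual definition:

```julia
# Assumption: the attribute is either one T shared by all elements
# or a Vector{T} with one entry per element.
const OneOrVector{T} = Union{T, Vector{T}}

# Value for element i: shared values are returned as-is, vectors are indexed.
element_value(x::Vector, i::Integer) = x[i]
element_value(x, i::Integer) = x

# Whether the attribute would need a per-element buffer on the GPU side.
is_per_element(x::Vector) = true
is_per_element(x) = false

struct MiniScatter
    positions::Vector{NTuple{2, Float32}}
    color::OneOrVector{Symbol}
    markersize::OneOrVector{Float32}
end

s = MiniScatter([(0f0, 0f0), (1f0, 1f0), (2f0, 0f0)],
                :red,                  # one color shared by all elements
                Float32[4, 8, 12])     # one size per element
```

Dispatching on `Vector` versus anything else plays the same role as resolving the shader template per attribute: the decision is made once per attribute, not once per element.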

In my first attempt at translating scatter I made each element an entity. That should allow some cool things, like turning scatter element 1317 into a scattered polygon or changing the color of element 503, without having to recreate the whole scatter. The question then is how to handle components in a way that doesn't duplicate a ton of data and that arrives at a reasonably optimal shader. Consider

# This should have all components only once and end up with a shader that uses no textures/buffers
scatter(lots_of_points, marker = Pixel())

# It would be cool if this ends up with 3 calls of the above
scatter(lots_of_points, color = rand(1:3, length(lots_of_points)))

# This needs to switch to a shader that has colors in a texture/buffer (or maybe even vertexarray)
# but should keep other attributes as singular values
scatter(points, color = rand(RGBA, length(points)))

# it would be cool if this worked
#     this should replace a ~ScatterMarker component with a ~MeshElement/Poly component
#     that component should be combined with other such components to draw one merged mesh
entity_pool = scatter(points)
entity_pool.marker[17] = some_polygon

sunbubble commented 3 years ago

@ffreyer please allow me to critique your proposal from an ECS standpoint. It might be that for the purpose of a plotting library it's preferable to an ECS design.

What you currently propose is a "root object" which is a mutable structure of arrays. This is already very good for performance. But you are restricting which fields a scatter object may have, so it is only extensible by adding new fields to the scatter struct.

In an ECS approach (I will use entt terminology because that's what I'm familiar with) you wouldn't necessarily have a data structure called scatter; you probably would have a function, though. In entt you have a "root" data structure called the registry. This registry is essentially an in-memory database, and all your state is supposed to live inside it. You can use the registry to create entities, each of which is conceptually nothing more than an index. To each index, you may or may not (also often in batches) assign components, such as Position, Color, Marker, etc. So you can have a scatter function that creates a batch of entities and assigns them the components which are necessary to draw a scatter plot. The interesting thing is that when you call the scatter function on another set of data, you add these to the same registry. On top of that, you may also call an arrows function, which as you know consists of a lines and a scatter call, and all the respective entities and components are added to the registry.

Notice that the OpenGL call required to draw the lines is different from the one required to draw the arrow heads, but the latter is the same one that was needed to draw the first two scatter calls. The beauty of ECS is that when you want to draw your complicated scene, you can easily build your OpenGL draw calls based on the nature of the data required for each one. All three scatter calls can be joined in a single draw call, because you can filter the entities by the components they have: you can ask the registry to give you all the positions and all the colours of only the scatter entities. Then you can do a separate lines OpenGL call for the arrow tails. And the neatest thing is that all Positions of all three scatter calls will live contiguously in memory, allowing for very cheap copying to the GPU.
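
A toy version of this registry idea in Julia might look like the following. It is a sketch under assumptions: `Registry`, `Store`, `assign!`, `entities_with`, `scatter!`, `lines!` and `scatter_batch` are hypothetical names, not the API of entt, Overseer.jl or Makie, and the "draw call" is reduced to gathering the positions it would upload:

```julia
struct Position; x::Float32; y::Float32; end
struct Color; r::Float32; g::Float32; b::Float32; end
struct Marker; shape::Symbol; end       # only scatter-like entities have one
struct LineSegment; next::Int; end      # only line-like entities have one

struct Store{T}
    data::Vector{T}         # contiguous component values
    row::Dict{Int, Int}     # entity id => index into data
end
Store{T}() where {T} = Store{T}(T[], Dict{Int, Int}())

mutable struct Registry
    next_id::Int
    stores::Dict{DataType, Any}
end
Registry() = Registry(0, Dict{DataType, Any}())

create!(r::Registry) = (r.next_id += 1)
store(r::Registry, ::Type{T}) where {T} = get!(() -> Store{T}(), r.stores, T)::Store{T}

function assign!(r::Registry, e::Int, c::T) where {T}
    s = store(r, T)
    push!(s.data, c)
    s.row[e] = length(s.data)
    return e
end

# All entities that carry every requested component type, sorted by id.
function entities_with(r::Registry, first::DataType, rest::DataType...)
    ids = [e for e in keys(store(r, first).row)
           if all(T -> haskey(store(r, T).row, e), rest)]
    return sort!(ids)
end

# Recipe-like functions only create entities and attach components.
function scatter!(r::Registry, points; color = Color(0, 0, 0), marker = :circle)
    return map(points) do (x, y)
        e = create!(r)
        assign!(r, e, Position(x, y))
        assign!(r, e, color)
        assign!(r, e, Marker(marker))
    end
end

function lines!(r::Registry, points; color = Color(0, 0, 0))
    ids = [create!(r) for _ in points]
    for (i, (x, y)) in enumerate(points)
        assign!(r, ids[i], Position(x, y))
        assign!(r, ids[i], color)
        assign!(r, ids[i], LineSegment(i < length(ids) ? ids[i + 1] : 0))
    end
    return ids
end

# Input of one scatter-style draw call, regardless of which recipe made the entity.
function scatter_batch(r::Registry)
    ps = store(r, Position)
    return [ps.data[ps.row[e]] for e in entities_with(r, Position, Marker)]
end

r = Registry()
scatter!(r, [(0, 0), (1, 1)])
scatter!(r, [(2, 2)]; color = Color(1, 0, 0))
lines!(r, [(0, 0), (3, 3)])                  # arrow tail
scatter!(r, [(3, 3)]; marker = :triangle)    # arrow head
batch = scatter_batch(r)
```

Here the arrow head ends up in the same batch as the two plain scatter calls because it carries the same components, while the two line entities are only found by asking for `LineSegment`.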

Where ECS shines is obviously in a game engine, because there you want to simulate a bunch of different things with many overlapping aspects, and you can associate different behaviours with the existence or absence of certain components on an entity.

So maybe to summarise the approach: different high-level API/recipe calls would create a bunch of entities and give them their meaningful attributes. The draw calls would then look for the common relevant attributes, agnostic of which recipe they came from.

I hope this is clear enough.

ffreyer commented 3 years ago

The struct is what Simon has here, it was just supposed to be a reference for what components we need.

My current idea of an ECS (which may very well be bad) is that a scatter plot splits into one entity per element, each of which can have a position, color, offset, transform, ... component.

> To each index, you may or may not (also often in batches) assign components

I guess something like scatter should be batched from the start? I.e. we should have a Color and a BatchedColor component depending on whether one calls scatter(points, color = RGBA(...)) or scatter(points, color = rand(RGBA, length(points))) and only one entity per plot call?
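
One way to picture the Color / BatchedColor split is the sketch below. It assumes a stand-in `RGBA` struct instead of the real color type, and `color_component` and `color_source` are hypothetical helpers; the returned symbols merely stand for "shader with a uniform" versus "shader with a per-element buffer":

```julia
# Stand-in color type so the example has no package dependencies.
struct RGBA
    r::Float32; g::Float32; b::Float32; a::Float32
end

struct Color            # one color shared by the whole plot entity
    value::RGBA
end

struct BatchedColor     # one color per scatter element
    values::Vector{RGBA}
end

# Pick the component from the shape of the attribute the user passed.
color_component(c::RGBA, n::Integer) = Color(c)
function color_component(cs::AbstractVector{RGBA}, n::Integer)
    length(cs) == n || throw(ArgumentError("expected $n colors, got $(length(cs))"))
    return BatchedColor(collect(cs))
end

# Stand-in for shader selection.
color_source(::Color) = :uniform
color_source(::BatchedColor) = :buffer

points = [(0f0, 0f0), (1f0, 0f0), (2f0, 1f0)]
single = color_component(RGBA(1, 0, 0, 1), length(points))
batched = color_component([RGBA(rand(), rand(), rand(), 1) for _ in points], length(points))
```

With one entity per plot call, the renderer would only need to check which of the two components the entity carries in order to choose its code path.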