Open ElliotB256 opened 3 years ago
Performance is slightly better with Legion than specs (Amethyst has now dropped specs and is moving to Legion). Consider moving to Legion. https://github.com/amethyst/legion
Performance comparison and feature breakdown: https://amethyst.rs/posts/legion-ecs-v0.3
The inner data parallelism looks much nicer in legion than in specs.
Bevy also looks pretty good - I'm going to keep an eye on it while the dust settles. We can probably move sometime in a few months.
@MauriceZeuner also might be interesting if you are bored. I'm particularly interested to see how each ECS crate compiles to SIMD instruction sets.
I benchmarked a few other libraries for a simple example here: https://github.com/ElliotB256/bevy_bench/issues/2
specs currently outperforms both legion and bevy for this example - which is a surprise to be honest!
Shelved for now. Can reopen in a few months time depending on benchmarks of Legion and bevy.
I actually started a bevy port now, you can see progress here: https://github.com/TeamAtomECS/AtomECS/tree/bevy
Port is going well, I've done most of the major modules.
Demos can be seen here (in your browser!) https://teamatomecs.github.io/AtomECSDemos/
@MauriceZeuner Also tagging @YangChemE and @minghuaw, who may both be interested.
I'll finish porting the code in that branch before making performance measurements. Some preliminary measurements showed the bevy code running twice as fast as the specs code, which surprised me because my earlier benchmarks showed specs was faster. The bevy code is much more compact and the API is significantly nicer. I am still keeping this as experimental for now; I expect bevy performance will improve as it is under very active development.
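As a rough illustration of what I mean by compactness: the same toy Euler step written in both styles. The `Position`/`Velocity` components and `EulerIntegration` system here are simplified stand-ins, not the actual AtomECS definitions, and the two halves would normally live on the specs and bevy branches respectively.

```rust
const DT: f64 = 1e-6; // illustrative timestep

// specs style (master branch): a system is a struct plus a SystemData tuple.
mod specs_version {
    use specs::{Component, Join, ReadStorage, System, VecStorage, WriteStorage};

    pub struct Position(pub f64);
    impl Component for Position {
        type Storage = VecStorage<Self>;
    }

    pub struct Velocity(pub f64);
    impl Component for Velocity {
        type Storage = VecStorage<Self>;
    }

    pub struct EulerIntegration;
    impl<'a> System<'a> for EulerIntegration {
        type SystemData = (WriteStorage<'a, Position>, ReadStorage<'a, Velocity>);

        fn run(&mut self, (mut positions, velocities): Self::SystemData) {
            for (pos, vel) in (&mut positions, &velocities).join() {
                pos.0 += vel.0 * super::DT;
            }
        }
    }
}

// bevy style (port branch): the whole system is a plain function whose
// query parameter doubles as the data declaration.
mod bevy_version {
    use bevy::prelude::*;

    #[derive(Component)]
    pub struct Position(pub f64);

    #[derive(Component)]
    pub struct Velocity(pub f64);

    pub fn euler_integration(mut query: Query<(&mut Position, &Velocity)>) {
        for (mut pos, vel) in query.iter_mut() {
            pos.0 += vel.0 * super::DT;
        }
    }
}
```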
This is just a general comment. It seems like bevy has its own set of shapes. I am wondering whether the port should switch to bevy's shape definitions or stick with the current ones.
It's a good point! I will check this when back from holiday. We would still want to define our own Shape trait, but possibly implement the trait on these primitives. We probably wouldn't want to define it for all shapes though; bevy's are rendering primitives and so aren't mathematically pure shapes, e.g. the icosphere versus a mathematically defined sphere. Thanks for the suggestion!
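As a sketch of what I have in mind (the `Shape` trait and `contains` method here are illustrative, not AtomECS's actual trait; the icosphere impl simply treats bevy's render primitive as the analytic sphere it approximates, using bevy 0.8-era field names):

```rust
use nalgebra::Vector3;

/// A mathematically defined region of space (hypothetical trait for illustration).
pub trait Shape {
    /// Returns true if `point` (relative to the shape's centre) lies inside the shape.
    fn contains(&self, point: &Vector3<f64>) -> bool;
}

/// Our own analytic sphere, independent of any rendering mesh.
pub struct Sphere {
    pub radius: f64,
}

impl Shape for Sphere {
    fn contains(&self, point: &Vector3<f64>) -> bool {
        point.norm_squared() <= self.radius * self.radius
    }
}

// Optionally implement the same trait for a bevy render primitive, reading only
// its defining radius and ignoring the mesh subdivisions entirely.
impl Shape for bevy::prelude::shape::Icosphere {
    fn contains(&self, point: &Vector3<f64>) -> bool {
        point.norm_squared() <= (self.radius as f64).powi(2)
    }
}
```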
What do you think of bevy's syntax compared to specs? I still have yet to compare performance.
I made a comparison of some benchmark simulations.
Bevy is currently about 37% slower than the specs branch. This is roughly in agreement with the results from https://github.com/bevyengine/bevy/issues/2173 for fine-grained systems, where bevy takes ~130 µs compared to 108 µs for specs. I expect the gap will close as bevy development continues.
Bevy 0.8 came out a few days ago so I've ported the bevy branch to it. I see a slight performance improvement on the benchmark:
branch | time 1 (ms) | time 2 (ms) | time 3 (ms) | avg (ms)
---|---|---|---|---
before the 0.8 port | 15361 | 15826 | 15414 | 15533
bevy 0.8 | 13952 | 13660 | 13232 | 13614

The bevy 0.8 runs take ~87% of the previous duration (equivalently, the previous branch is ~14% slower).
Re-running bevy 0.8 on my home machine:
Simulation loop completed in 28085 ms.
Simulation loop completed in 28283 ms.
Simulation loop completed in 31367 ms.
29.2 s average.
And with bevy 0.9:
Simulation loop completed in 30880 ms.
Simulation loop completed in 30450 ms.
Simulation loop completed in 35263 ms.
32.2 s average.
Bevy 0.9 takes roughly 10% longer than bevy 0.8.
Retesting for bevy 0.10 comparisons.
Bevy 0.8:
Simulation loop completed in 28476 ms.
Simulation loop completed in 28009 ms.
Simulation loop completed in 27892 ms.
Bevy 0.9:
Simulation loop completed in 33319 ms.
Simulation loop completed in 30575 ms.
Simulation loop completed in 30756 ms.
Bevy 0.10:
Simulation loop completed in 42307 ms.
Simulation loop completed in 40414 ms.
Simulation loop completed in 39600 ms.
version | time 1 (ms) | time 2 (ms) | time 3 (ms) | avg (ms) | comparison
---|---|---|---|---|---
0.8 | 28476 | 28009 | 27892 | 28126 | +0%
0.9 | 33319 | 30575 | 30756 | 31550 | +12%
0.10 | 42307 | 40414 | 39600 | 40773 | +45%
Comparing some rough metrics between bevy 0.8 and 0.10, it seems 0.10 has a core sitting idle on my machine!
```rust
// Limit the task pool to 5 threads in total; this stops the compute pool
// leaving one of the cores idle.
app.add_plugins(DefaultPlugins.set(TaskPoolPlugin {
    task_pool_options: TaskPoolOptions::with_num_threads(5),
}));
```
The above prevents one of the cores being idled by the compute thread, so now 3 of the 4 cores on my machine are doing work. It also reduces the benchmark duration to 29627 ms, which is closer to the 0.8/0.9 performance.
After tweaking until I get 4 compute threads, I get:
Simulation loop completed in 23875 ms.
which is faster than bevy 0.8! (I suspect bevy 0.8 never used all cores properly...)
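For the record, a sketch of one way to pin bevy 0.10 to exactly four compute threads by setting each pool's assignment policy explicitly. The thread counts are illustrative, and this isn't necessarily the exact tweak I used above.

```rust
use bevy::core::{TaskPoolOptions, TaskPoolPlugin, TaskPoolThreadAssignmentPolicy};
use bevy::prelude::*;

fn main() {
    App::new()
        .add_plugins(DefaultPlugins.set(TaskPoolPlugin {
            task_pool_options: TaskPoolOptions {
                // 1 IO + 1 async-compute + 4 compute threads in total.
                min_total_threads: 6,
                max_total_threads: 6,
                io: TaskPoolThreadAssignmentPolicy {
                    min_threads: 1,
                    max_threads: 1,
                    percent: 0.0,
                },
                async_compute: TaskPoolThreadAssignmentPolicy {
                    min_threads: 1,
                    max_threads: 1,
                    percent: 0.0,
                },
                // Everything left over goes to the compute pool used by parallel queries.
                compute: TaskPoolThreadAssignmentPolicy {
                    min_threads: 4,
                    max_threads: 4,
                    percent: 1.0,
                },
            },
        }))
        .run();
}
```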
Running the master branch benchmark on my home PC (i.e. using specs):
Simulation loop completed in 19532 ms.
So bevy 0.10 is very close to specs in performance, which is good!
One of the nice things in bevy 0.10 is the ability to automatically determine batch sizes based on a heuristic. Let's see how it performs:

test | 1st (ms) | 2nd (ms) | 3rd (ms) | avg (ms) | change
---|---|---|---|---|---
fixed batch, size=1024 | 26611 | 24910 | 26738 | 26086 | +0%
`BatchingStrategy::new()` | 24023 | 23671 | 24034 | 23909 | -8.4%
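Roughly speaking, the two rows correspond to parallel query setups like the following (bevy 0.10 API; the components and timestep are placeholders rather than the actual AtomECS systems):

```rust
use bevy::ecs::batching::BatchingStrategy;
use bevy::prelude::*;

const DT: f32 = 1e-6; // placeholder timestep

#[derive(Component)]
struct Position(Vec3);

#[derive(Component)]
struct Velocity(Vec3);

// Row 1: a fixed batch size of 1024 entities per task.
fn integrate_fixed(mut query: Query<(&mut Position, &Velocity)>) {
    query
        .par_iter_mut()
        .batching_strategy(BatchingStrategy::fixed(1024))
        .for_each_mut(|(mut pos, vel)| {
            pos.0 += vel.0 * DT;
        });
}

// Row 2: the default heuristic, which picks a batch size from the entity count
// and the number of compute threads.
fn integrate_auto(mut query: Query<(&mut Position, &Velocity)>) {
    query
        .par_iter_mut()
        .batching_strategy(BatchingStrategy::new())
        .for_each_mut(|(mut pos, vel)| {
            pos.0 += vel.0 * DT;
        });
}
```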
Need to make sure #79 is fixed on this branch - best way to do this is to first implement the test.
@minghuaw - I saw you recently exploring the rust gpu ecosystem a bit, would you be interested in a GPU-accelerated entity query system? I'm thinking it would dramatically speed up the single-atom systems. We could start a new issue to explore this - it's the next goal after bevy port is complete.
I am indeed looking at general-purpose computing on GPU recently. I haven't had much luck yet, but I'll keep this in mind and see what I can do about it.
The dream for me would be defining components with a `#[GPUStorage]` attribute, and then replacing `Query` with a kind of `GPUQuery` that provides a for-each that gets converted to a GPU kernel program via rust-gpu. I think it's probably not too much work, but I should use the scant time I have right now for getting the bevy branch finished. Maybe next month :)
Bevy is now up to 0.11; I'll get round to porting and benchmarking soon. I was hoping for some days of free time this summer to finish the port, but they haven't arrived yet!