Open andrewgazelka opened 2 weeks ago
If we want to avoid ECS, we could use regular Rust structs and impl methods on them. This seems to be what Bamboo, another Minecraft server implementation, does to handle entities. This also gives us more control over everything.
I have almost no experience in high-performance stuff, so I cannot say much about this.
I had never used an ECS before, but when working on the Inventory I really enjoyed working with it. It feels intuitive.
Maybe it is useful to use an ECS with system-level parallelism (or to extend evenio). If possible, we could use a mix: something that allows us to add handlers which are not run in parallel when we don't want to pay the cost of synchronisation or when the order of execution matters, while still being able to add handlers where the order of execution does not matter and which have the required Send/Sync bounds to be executed in parallel.
But if there is a better solution than ECS, I think we should try it out.
Regarding the `PointCompassBulk` event: just FYI, that looks unsound as written if you're exposing it to users, because duplicate `EntityId`s would lead to aliasing of the `&mut Packets` component when you go to iterate over it in parallel (you could use a hash map, of course). For a similar reason, I don't see a hypothetical automatic system-level parallelism feature being what you want in this particular case: the overhead of synchronizing individual calls to `compass` would get in your way.
I also want to point out that there are some ways to make "bulk" events more palatable to your API consumers. Here's one approach that comes almost directly from bevy_ecs:
```rust
use evenio::prelude::*;

#[derive(HandlerParam)]
struct EventReader<'a, E: 'static> {
    events: Single<'a, &'static mut Events<E>>,
}

impl<'a, E> EventReader<'a, E> {
    pub fn iter(&self) -> impl Iterator<Item = &E> {
        self.events.events.iter()
    }

    pub fn iter_mut(&mut self) -> impl Iterator<Item = &mut E> {
        self.events.events.iter_mut()
    }

    pub fn drain(&mut self) -> impl Iterator<Item = E> + '_ {
        self.events.events.drain(..)
    }
}

#[derive(HandlerParam)]
struct EventWriter<'a, E: 'static> {
    events: Single<'a, &'static mut Events<E>>,
}

impl<'a, E> EventWriter<'a, E> {
    pub fn push(&mut self, event: E) {
        self.events.events.push(event);
    }
}

pub fn init_event<E: 'static>(world: &mut World) {
    // TODO: check that `Events<E>` doesn't already exist. (waiting for #57)
    // Could probably do this automatically in the `HandlerParam` impl of
    // `EventReader` and `EventWriter`.
    let e = world.spawn();
    world.insert(e, Events::<E> { events: vec![] });
}

#[derive(Component)]
struct Events<E> {
    events: Vec<E>,
}
```
Now users can add an event writer to their system like:
```rust
fn my_handler(_: Receiver<Foo>, mut w: EventWriter<PointCompass>) {
    w.push(PointCompass { ... });
}
```
and you're free to drain the `PointCompass` events at a later time, maybe even in parallel with other work via `rayon::join`.
@rj00a thanks. I like the idea for making bulk events easier. I might do more of this.
Regarding parallelizable bulk targeted events: yes, you are right that there could be aliasing. However, I think there are ways to potentially make this safe, for instance by sharding on `entity_id % num_rayon_threads`. Perhaps `rayon` allows your `par_iter` to specify the particular thread? I think this could be super helpful, and there would not really be any synchronization issues that I can think of.
The idea mentioned in Concrete idea would probably require
I am guessing (2) is probably faster, but we would have to benchmark it.
How should we structure egress? In my egress rewrite, I currently iterate through each player and call a list of functions to append s2c packets, and then send them for each player. Should this use ECS? If each function generating an s2c packet was a system using a Vec for bulk events, it would only be able to start sending data once each player's packets have been encoded. I still need to do benchmarks to see how long encoding takes and how much of a problem this would be, but I can see it being a potential performance problem.
> How should we structure egress? In my egress rewrite, I currently iterate through each player and call a list of functions to append s2c packets and then send them for each player. Should this use ECS? If each function generating an s2c packet was a system using a Vec for bulk events, it would only be able to start sending data once each player's packets have been encoded. I still need to do benchmarks to see how long encoding takes to see how much of a problem this would be, but I can see it being a potential performance problem.
can you link to your rewrite? I am having trouble finding it.
https://github.com/TestingPlant/hyperion/tree/restructure-net
The main changes are in egress.rs and the egress directory.
do you mind making a draft PR so I can more easily look over the diff?
> How should we structure egress? In my egress rewrite, I currently iterate through each player and call a list of functions to append s2c packets and then send them for each player. Should this use ECS? If each function generating an s2c packet was a system using a Vec for bulk events, it would only be able to start sending data once each player's packets have been encoded. I still need to do benchmarks to see how long encoding takes to see how much of a problem this would be, but I can see it being a potential performance problem.
I don't think egress necessarily needs to be in ECS or needs to use ECS, but I do believe that the rest of the code should still utilize ECS.
So, what are you planning to do for other systems that append packets? How would that work?
Broadcast packets would work like they currently are right now, and other systems would be moved into the egress directory to generate s2c packets.
> Broadcast packets would work like they currently are right now, and other systems would be moved into the egress directory to generate s2c packets.
Can you give me an example of how
```rust
#[derive(Event)]
pub struct PointCompass {
    #[event(target)]
    pub target: EntityId,
    pub point_to: BlockPos,
}
```
would be handled?
I've done some more benchmarks:
```rust
use std::io::Write;

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use valence_protocol::{Encode, VarInt};

struct BlackBoxWrite;

impl Write for BlackBoxWrite {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        black_box(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> std::io::Result<()> {
        Ok(())
    }
}

pub fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("encode HealthUpdateS2c", |b| {
        b.iter(|| {
            black_box(valence_protocol::packets::play::HealthUpdateS2c {
                health: 0.0,
                food: VarInt(0),
                food_saturation: 0.0,
            })
            .encode(BlackBoxWrite)
            .unwrap()
        })
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
```
showing that encoding a simple packet takes about 5.8ns on my machine. This means that we can encode about 170k simple packets per millisecond, so I think doing egress packet generation on one thread is fine.
With the current setup, the compass would be handled by adding a module in the egress directory which checks if the player is a zombie and, if so, generates a packet to point the compass to the nearest player. All of the data about the player and the rest of the server would need to be passed to each function manually, which is the main disadvantage of not using ECS for egress while using ECS everywhere else.
I believe we should change packet encoding to use ECS now that we know packet encoding is cheap, and send an event with the output buffer and player entity id, but I'd like to hear other people's feedback as well before doing this.
There are a couple of things I want to discuss.
First of all, even though encoding a simple packet takes 5.8 nanoseconds on your machine, encoding is just one part of generating packets; you need to consider the actual logic. When it comes down to it, you'll probably want that to be multi-threaded: if the logic around a packet can be parallelized, you'll likely want to do the encoding in parallel too. Moreover, you don't want locks.
I think we should stick with Entity-Component-System (ECS). However, sending an event for every single output for every single player is not efficient: there is a non-negligible amount of time that goes into sending an event and getting it flushed. We should design everything to be parallelizable right now. From my personal tests, whenever I assume something doesn't need to be parallelized, it still ends up taking a significant portion of the game tick. This makes sense: with 24 cores, code that stays single-threaded is effectively 24 times slower than it could be, which matters even when the work itself is fast.
For now, ECS makes the most sense, especially from an extensibility standpoint. Although some have mentioned that ECS shouldn’t be used for extensibility, I highly disagree, especially for Rust. I think it works very well. I can present cases where I can achieve things in very few lines of code. Given Rust does not use Java-like OOP inheritance, I can't think of a better way to do this than ECS.
I added a multithreaded benchmark:
```rust
use std::io::Write;

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use rayon::iter::{IntoParallelIterator, ParallelIterator};
use valence_protocol::{Encode, VarInt};

struct BlackBoxWrite;

impl Write for BlackBoxWrite {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        black_box(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> std::io::Result<()> {
        Ok(())
    }
}

struct Player {
    health: f32,
    food: i32,
    food_saturation: f32,
}

pub fn criterion_benchmark(c: &mut Criterion) {
    let players = (0..10_000)
        .map(|n| Player {
            health: n as f32,
            food: n,
            food_saturation: n as f32,
        })
        .collect::<Vec<_>>();

    c.bench_function("encode HealthUpdateS2c multi threaded", |b| {
        b.iter(|| {
            black_box(&players).into_par_iter().for_each(|player| {
                valence_protocol::packets::play::HealthUpdateS2c {
                    health: player.health,
                    food: VarInt(player.food),
                    food_saturation: player.food_saturation,
                }
                .encode(BlackBoxWrite)
                .unwrap()
            })
        })
    });

    c.bench_function("encode HealthUpdateS2c single threaded", |b| {
        b.iter(|| {
            black_box(&players).iter().for_each(|player| {
                valence_protocol::packets::play::HealthUpdateS2c {
                    health: player.health,
                    food: VarInt(player.food),
                    food_saturation: player.food_saturation,
                }
                .encode(BlackBoxWrite)
                .unwrap()
            })
        })
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
```
and got
```
encode HealthUpdateS2c multi threaded
                        time:   [42.568 µs 42.962 µs 43.402 µs]
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe

encode HealthUpdateS2c single threaded
                        time:   [57.296 µs 57.493 µs 57.765 µs]
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) high mild
  13 (13.00%) high severe
```
I originally thought that the overhead of communicating between threads would cost more, but it seems like keeping it multithreaded would be good for our use case. Do you think the current structure on my fork of having packet generation in the egress directory is okay?
> With the current setup, the compass would be handled by adding a module in the egress directory which checks if the player is a zombie, and if so, generate a packet to point the compass to the nearest player. All of the data about the player and the rest of the server would need to be passed to each function manually which is the main disadvantage of not using ECS for egress while using ECS everywhere else.
I'm still not a big fan of this. I don't understand why packets need to be generated in the `egress` directory. Can you go over that again?
The whole point is to make a base that people can easily extend. It won't be extendable if you need to add the logic directly into Hyperion, which is what I understand needs to be done now.
I think we should make a custom handler parameter or something similar so that we can handle things natively from ECS. This way, it remains extendable.
> I'm still not a big fan of this. I don't understand why packets need to be generated in the egress directory. Can you go over that again?
The current egress code does the following:
This can be multithreaded by splitting the registered buffer across multiple threads, so each thread can still append to a buffer without locks.
There are a few other ways to do this with ECS.
Option 1: Replace step 1a above with sending an event for each player to generate s2c packets. The event will contain the buffer to append s2c packets to and the player id. This forces it to be single-threaded.
Option 2: Use per-player buffers and then send a single event to generate every player's packets. This requires each player to have a buffer sized for the largest amount of data that can be sent per tick, which is not ideal in terms of memory usage and cache behavior.
Feel free to suggest other options here.
> The whole point is to make a base that people can easily extend. It won't be extendable if you need to add the logic directly into Hyperion, which is what I understand needs to be done now.
The current system could still be extendable by storing a `Vec` of `fn` pointers and passing data about the player and the world to each function. However, this wouldn't allow programs using Hyperion to access custom data stored through ECS. Alternatively, we could accept IntoHandlers directly and call them manually, but I'd need to figure out how this would work first.
> Option 1: Replace step 1a above with sending an event for each player to generate s2c packets. The event will contain the buffer to append s2c packets to and the player id. This forces it to be singlethreaded.
Okay, I think I'm understanding what you're doing more now. From what I gather, correct me if I'm wrong, the reason you're doing this is to use less space. By filling up the buffer, waiting for the events to finish, and then putting more data into the buffer, we avoid having one giant buffer where everything is dumped all at once. Is this correct?
I'm building a system to handle events in parallel right now. It is currently fluctuating quite a bit.
Related #350
We are at a point where I think it makes sense to discuss the architecture of the project. Some big points:
Let me know if you can think of any more questions.
## Current problems
Right now we are using `evenio`.

### No system-level parallelism
This is an event to change the player skin. It is rather expensive, does not occur often, and does not usually occur multiple times within the tick. I could see if there are many distinct events that are similar to this, there could be a big perf hit since the systems cannot be parallelised.
Note: the signature is
This only acts upon the targeted entity. This seems like it would be a great case for parallelization. Note: `Compose` is defined as

### Entity-level parallelism often requires not having targeted events
Look at all these calls to the `compass` system. We have

Again, because of the lack of system-level parallelism, we run into an issue where we would need to bulk these to perform them in parallel. This is sad, because targeted events could potentially be a super big performance improvement for being able to parallelize things.
The new performant event would be
So we can `.par_iter_mut()` on all of the `PointCompass` elements. I feel there is a better way than this.

Because of the previous section and this one, events that are meant to be easily sendable from consumers of the API should ideally not have to be bulk, but when they are not, we get something like
It is very clear where the events sent by the consumer of `hyperion` are.

## Possible alternatives