Here's a brief summary of what you need to do to add a new framework to these benchmarks:
1. Add your framework as a dependency in Cargo.toml. It should pin itself to an exact version (e.g. 1.2.3) and set optional = true.
2. Add your framework to the dataset definitions in the mod.rs files under src/datasets. If your framework has a derive, add it with #[cfg_attr(feature = "my_framework", derive(my_framework::Serialize))]. If your framework has a custom schema format, you'll have to modify the build.rs to compile Rust code from your schema format.
3. Add a bench function for your framework in src. This should be in a module named bench_my_framework, and you'll need to also add it as a pub mod in src/lib.rs.
Abomonation is alphabetically first and supports all of the benchmark suites, so it's a good example of what you need in your bench function. I'd recommend looking at a bench function for a framework that's similar to yours and basing it on that one.
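As a rough illustration of the derive step, here is a minimal sketch of a feature-gated derive. The names my_framework and Vector3 are placeholders, not the benchmark's actual code, and the Cargo.toml line in the comment is only an assumption about how an exact, optional pin might be written.

```rust
// Sketch only. `my_framework` is a placeholder crate/feature name.
// A matching (hypothetical) Cargo.toml entry could look like:
//   my_framework = { version = "=1.2.3", optional = true }
//
// With the feature disabled, the `cfg_attr` line expands to nothing and the
// type compiles with only its unconditional derives.
#[cfg_attr(feature = "my_framework", derive(my_framework::Serialize))]
#[derive(Clone, Debug, PartialEq)]
pub struct Vector3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

/// Reports whether the placeholder feature was enabled at compile time.
pub fn my_framework_enabled() -> bool {
    cfg!(feature = "my_framework")
}
```

Because the derive is gated on the feature, the dataset types still build when the framework is not selected.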
If you have any other questions I'd be happy to answer them!
@vigna I tried to add epserde to this benchmark.
It can't be done via derive because of:
proc-macro derive panicked
message: not yet implemented: Missing implementation for union, enum and tuple types
derive definition:
#[cfg_attr(feature = "epserde", derive(epserde::Epserde))]
Ok. By any chance, do you know whether the problem is due to a union, enum, or tuple type?
@vigna It's due to an enum type (context: the error is caused by the EntityType and GameType enums).
I see. We'll try to implement enums and get back to you!
@hot-moms: we just implemented enums in the current version on GitHub (which uses the current derive library by picking it up directly with a path). Do you need an official version on crates.io to continue with the implementation? We can publish one, but it might be better if you could try the current one on GitHub, as we may need to adjust it depending on your feedback. Thoughts?
@vigna as @djkoloski said above:
It should pin itself to an exact version (e.g. 1.2.3)
But as a draft I can try; you'll have to wait a little bit, though.
Ok, you can use epserde 0.3.0 and it should work with enums.
Did you make any progress? Can we help in any way?
Gentle ping :).
@vigna, sorry, I have no time for this. Try following what @djkoloski wrote here: https://github.com/djkoloski/rust_serialization_benchmark/issues/55#issuecomment-1786325870
Those are exactly the steps I followed when implementing some serialization systems, including ε-serde.
I'm trying to understand the meaning of "access" and "read". Probably their meaning is linked to some unspecified decomposition of the actions a zero-copy framework performs. For example, I can see that "access" for rkyv is <1ns, so it's doing nothing. I don't even know what access is in my case—you get a Rust object upon deserialization and that's it.
There is a gotcha: I had to add type parameters to the data structures, with a default equal to the current type, as ε-serde needs to be able to replace the types. This is not a problem with almost all frameworks, but a couple (I still have to check them one by one) do not welcome the idea. I could make the parameterized variant of the data structures depend on the feature "epserde", so that you can manually benchmark ε-serde against other frameworks that accept parameters. It wouldn't be part of the default.
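To make the idea of defaulted parameters concrete, here is a minimal sketch. Triangle, Mesh, MeshView and triangle_count are hypothetical stand-ins, not the types in the "epserde" branch, and the borrowed slice is only an assumed example of a replacement type.

```rust
// Hypothetical stand-in for one of the benchmark's data types.
#[derive(Clone, Debug, PartialEq)]
pub struct Triangle {
    pub v0: [f32; 3],
    pub normal: [f32; 3],
}

// The field type becomes a parameter whose default is the type used before,
// so code that writes plain `Mesh` keeps meaning `Mesh<Vec<Triangle>>`.
#[derive(Clone, Debug, PartialEq)]
pub struct Mesh<T = Vec<Triangle>> {
    pub triangles: T,
}

// A framework that needs to replace the container can name another
// instantiation, here a borrowed slice standing in for a zero-copy view.
pub type MeshView<'a> = Mesh<&'a [Triangle]>;

/// Works for both the owned default and the borrowed variant.
pub fn triangle_count<T: AsRef<[Triangle]>>(mesh: &Mesh<T>) -> usize {
    mesh.triangles.as_ref().len()
}
```

Frameworks whose derives cannot handle generic types would be the ones that "do not welcome the idea", which is why gating the parameterized variant behind a feature is one option.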
The access and read benchmarks are zero-copy specific. "Access" measures how long it takes to provide access to some zero-copy data. Validation overhead may take some time, so that gets measured here. "Read" measures how long it takes to read some zero-copy data. This usually just entails sending the read data to a black_box call so the read doesn't get optimized out. I'll document this in a more accessible way.
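A toy sketch of the distinction, assuming a made-up format (a byte buffer of little-endian u32 values) rather than any real framework; access and read are illustrative names only.

```rust
use std::hint::black_box;

/// "Access": validate the buffer and hand back a view without copying.
/// In this toy format, validation is only a length check.
pub fn access(bytes: &[u8]) -> Option<&[u8]> {
    if bytes.len() % 4 == 0 {
        Some(bytes)
    } else {
        None
    }
}

/// "Read": walk the view and pass each decoded value to `black_box`
/// so the loads are not optimized away. Returns how many values were read.
pub fn read(view: &[u8]) -> usize {
    let mut count = 0;
    for chunk in view.chunks_exact(4) {
        let value = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        black_box(value);
        count += 1;
    }
    count
}
```

In a benchmark, the two functions would be timed separately: access once per buffer, read over an already accessed view.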
I looked at your linked repo and didn't see the data structure parameters you mentioned. Are those pushed?
I forgot to tell you—you have to look at the "epserde" branch.
Yes, but once again, "read" measures how long it takes... from where? Raw bytes? Fully deserialized object? Partially deserialized object?
Presently I have a test called "read (from deser)" that reads the data from a deserialized source. Some benchmarks include deserialization time in read time, others don't. I think it would be better not to have deser operations in that benchmark, because they dwarf the scanning time.
Another issue for me is that frameworks using relative pointers or some other dynamic relocation technique will be unaffected by an iteration test. A random-access test would probably be more discriminative.
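As a sketch of what a non-sequential test might look like, the following compares a linear scan with a strided walk over the same data. The strided walk is only a cheap stand-in for random access (a real test would probably use pseudo-random indices), and scan and strided are hypothetical names.

```rust
use std::hint::black_box;

/// Sequential scan: the access pattern an iteration test exercises.
pub fn scan(data: &[u64]) -> u64 {
    let mut sum = 0u64;
    for &x in data {
        sum = sum.wrapping_add(black_box(x));
    }
    sum
}

/// Strided walk: visits `data.len()` positions, jumping `stride` slots each
/// time. If `stride` is coprime to `data.len()`, every index is visited
/// exactly once, just not in memory order.
pub fn strided(data: &[u64], stride: usize) -> u64 {
    let mut sum = 0u64;
    let mut idx = 0usize;
    for _ in 0..data.len() {
        sum = sum.wrapping_add(black_box(data[idx]));
        idx = (idx + stride) % data.len();
    }
    sum
}
```

Both functions touch the same elements when the stride is coprime to the length, so any timing difference comes from the access order rather than from the amount of work.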
Another major issue that makes me feel like I'm walking on eggshells is that there is no standard for reading and accessing memory. For example, the read test of rkyv for mesh gives
mesh/rkyv/read (unvalidated)
time: [38.796 µs 38.809 µs 38.824 µs]
Wow. That's... faster than the speed of light. Abomonation and ε-serde are at 100 µs, and they're just scanning memory, while rkyv even has pointer indirection to handle. How's that possible?
Use the Source, Luke.
|mesh| {
    for triangle in mesh.triangles.iter() {
        black_box(&triangle.normal);
    }
},
This is the read test function for rkyv. Note the ampersand. This test is scanning how fast you can enumerate pointers, not how fast you can access data. It is easy to see: replace with
|mesh| {
    for triangle in mesh.triangles.iter() {
        black_box(triangle.v0.x);
        black_box(triangle.v0.y);
        black_box(triangle.v0.z);
    }
},
which is what the other tests are doing (accessing an entire vector), and, boom:
mesh/rkyv/read (unvalidated)
time: [118.12 µs 118.20 µs 118.27 µs]
These ampersands are spread a bit here and there—there should be some automated way to check that all benchmarks are measuring the same thing.
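One possible way to keep the two behaviors apart is to put each in a shared, named helper that every bench function calls. This is only a sketch: Vector3 and Triangle here are simplified stand-ins for the benchmark's types, and traverse_refs and read_values are hypothetical names.

```rust
use std::hint::black_box;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vector3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Triangle {
    pub v0: Vector3,
    pub normal: Vector3,
}

/// Passes a reference to each `normal` to `black_box`. Only the address has
/// to be computed; the float values need not be loaded to make the call.
pub fn traverse_refs(triangles: &[Triangle]) -> usize {
    let mut count = 0;
    for triangle in triangles {
        black_box(&triangle.normal);
        count += 1;
    }
    count
}

/// Passes each component of `v0` by value, which forces the loads.
/// Returns the sum of the values read so the result can be checked.
pub fn read_values(triangles: &[Triangle]) -> f32 {
    let mut sum = 0.0f32;
    for triangle in triangles {
        sum += black_box(triangle.v0.x);
        sum += black_box(triangle.v0.y);
        sum += black_box(triangle.v0.z);
    }
    sum
}
```

If every framework's bench called the same helper over its own view type, a stray ampersand could not silently change what is being measured.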
I forgot to tell you—you have to look at the "epserde" branch.
I was; when I looked, it was only one commit ahead. It's now three commits ahead, so I'll take another look.
Yes, but once again, "read" measures how long it takes... from where? Raw bytes? Fully deserialized object? Partially deserialized object?
It's "read" the fastest way your framework can. There's no use benchmarking reads from a fully deserialized object because all fully deserialized objects should have the same performance (as they're all the same types with the same properties). For most frameworks this means reading through a zero-copy view of the data. The particular method is not specified because nobody really cares how you read out the data, just that it's representative of how your framework actually behaves.
This test is scanning how fast you can enumerate pointers, not how fast you can access data.
One could argue that enumerating the pointers is what we want to measure. Reading a vertex out of a reference costs the same amount regardless of the framework, so why measure it? The purpose of the read benchmark is to highlight the different access strategies for different frameworks. A deserialization framework with a nontrivial access method for data should be penalized compared to a framework with a more trivial one. In terms of standards, this would mean that every framework should pass a reference to some data to a black_box as opposed to reading the value out and passing that.
That's just one perspective on the issue. I think a benchmark that doesn't read data shouldn't really be called read, so perhaps traverse would be better. If you'd like to push on this, I'd be happy to review a PR that standardizes the read benchmarks, and/or one that adds a traverse benchmark. As per the README:
These benchmarks are still being developed and pull requests to improve benchmarks are welcome.
It's "read" the fastest way your framework can. There's no use benchmarking reads from a fully deserialized object because all fully deserialized objects should have the same performance (as they're all the same types with the same properties).
That's not true. Have a look at the Zerovec documentation (that's one of the most popular zero-copy frameworks, but it's not included in your benchmarks).
That's just one perspective on the issue. I think a benchmark that doesn't read data shouldn't really be called read, so perhaps traverse would be better. If you'd like to push on this, I'd be happy to review a PR that standardizes the read benchmarks, and/or one that adds a traverse benchmark. As per the README:
Changing names to match actual behavior is definitely a way to go.
That's not true.
Please file a separate issue.
Closing this since help has been provided.
We would like to put together a PR adding ε-serde to the suite. Is there some documentation on how to do that, or can you give us some basic guidance and we start from there?