Open mjc opened 2 years ago
Thanks for the issue! I'm totally open to this as an optional feature, since almost all paths are valid UTF-8. For the PR, could you include some benchmarks + results you see on them (I like the criterion library)?
It's not ready for production so it definitely should be optional.
Could you elaborate on what this means? This is a bit concerning to me.
Actually, it looks like it's just the documentation being out of date! My bad.
From the README:
"This library has been thoroughly tested with sample data as well as fuzzing and there are no known bugs."
Thanks! It also looks like it switches to std for fewer than 64 bytes. I'm really curious how much of a benefit this is in real world use because most paths (except I guess on Windows) tend to be shorter than 64 bytes.
Yeah, it's probably not a huge benefit for most cases.
In my particular use case most of the paths are for a cache server, so they are broken up like this (and usually north of 100 characters):
/tank/application/usa/east/ny/cache/c/f/2/3/cf23df2207d99a74fbe169e3eba035e633b65d94
And then there's a fair amount of other path munging in the hot path. I get about a 1% per request boost out if it, which I suspect is the best possible case.
I totally understand if that doesn't feel worth the maintenance burden, you are right that most paths aren't that long.
Yeah, I think the benefits are with longer paths more than anything else, and I'm worried that the branching for shorter paths will reduce perf (will have to see this in benchmarks of course.)
How about we leave this issue open for now and see if others are interested in it? (If you're working within a corporate environment, feel free to patch camino locally of course!)
Having thought about this for a bit, I'm quite interested in seeing what this would look like. Could you put up some benchmarks for paths of various sizes and ASCII vs non-ASCII, etc? camino 1.0.8 has a proptest1
feature which might be useful for generating random paths.
/tank/application/usa/east/ny/cache/c/f/2/3/cf23df2207d99a74fbe169e3eba035e633b65d94
You may benefit from using https://github.com/bytecodealliance/cap-std/ - on modern Linux then you can open a directory fd (Dir
) for /tank/application/usa/east/ny/cache/
, then the resulting filenames are much smaller and more amenable to stack allocation even.
I have a project that does a bunch of path manipulation and benefits a bit from using a wrapper with the
simdutf8
crate.If I find time, and there is interest, I would happily work on a PR to add optional simdutf8 support. It's not ready for production so it definitely should be optional.
I'd love to be able to simplify a bunch of this code the way
camino
would normally let me.