Cryptjar / sub-array

Extract a sub-array out of an array
Apache License 2.0
3 stars 0 forks source link

Collaboration proposal #1

Open WalterSmuts opened 2 years ago

WalterSmuts commented 2 years ago

I like your idea! And the name! (hence why I'm asking to collaborate rather than publishing my own crate).

How do you feel about trying to make this api a bit closer to the normal slicing API?

let sub_slice = &slice[1..2]

I've played around a bit and the closest I can get is this:

let sub_array: &[u32; 2] = &array[range!(1, 2)];

I think with a little extra proc_macro trickery we may be able to get to the following:

let sub_array: &[u32; 2] = &array[range!(1..2)];

The main value I believe this crate would bring is to provide a safe API for doing sub-array slicing on arrays. You can enforce the safety via compile time constraints using trait bounds. This is unfortunately not possible using slices, so I would also propose the API on slices get removed. It feels a bit strange to have it in a crate with the word array in anyway.

There definitely are some disadvantages of doing what I'm suggesting. The main one is that it requires generic_const_exprs feature to do the trait bound constraints, which is still unstable. This means the consumers of this library need to use the nightly compiler until it get stabilised.

Have a look at this playground to get an idea of what I'm suggesting.

The main thing you may notice where I've gone in a separate direction from you is the use of the SubArray wrapper struct. This is to allow overloading of the Index trait. I'm not married to the idea, but I do think it has some advantages.

Let me know what you think! :)

Cryptjar commented 2 years ago

Thanks for your offer and your input. I'd like to accept your offer.

You make a very good point about making the API more readable. Honestly, I never liked the current style with sub_array_ref::<LEN>(offset) since it is kinda hard to read. I am also very interested in adding compile-time constraints to prevent as many panics as possible at compile time (I supposing that is your second suggestion).

Given your comment on slices, I'd like to clarify that (as I meant it) this crate is about obtaining a "sub-array" not working on "sub-array"s. And thus, it is quite sensible (in my opinion) to extract a "sub-array" from a slice, tho it is true that this operation can't be checked at compile time and thus generally may panic at run-time. Nevertheless, I would say that slices as a source should be at most of secondary concern, but I would rather keep them.

Anyway, as far as I understand, this is concisely what you are proposing:

  1. Adjusting the API to look more like conventional slicing, that is in particular:
    • Specifying the range with some Range-like syntax (i.e. from..to)
    • Using the Index-operator (i.e. implementing that trait)
  2. Introducing compile-time constraints to ensure that there will be no panics at run-time (specifically out-of-bounds errors)

At first glance, I supposed all of these enhancements would require the Nightly generic_const_exprs feature (what you wrote also sounded a bit like it). However, you showed (via your playground code) that (1) can be entirely implemented in stable Rust, and only (2) requires the generic_const_exprs feature.

Thus, I consider the following plan:

Separating those two points will allow users to keep using this crate with a stable compiler, tho without compile-time constraints, which at least is consistent with the current implementation, at least they get a nicer API. And for those who can afford to use Nightly, they get the compile-time constraints as well.

Some further notes:


Of course, I also looked into the playground example code that you kindly provided. You did there some nice ground work. Some points which I found can be improved/fixed:

  1. We don't need a wrapper struct, we can implement Index directly or Rust's arrays.
  2. We can just use regular marco_rules to implement the range!(2..5) syntax.
  3. You appear to have mixed up the "from-to" and "start-size" semantics.
  4. You forgot(?) to implement the compile-time constraints.

Here a refined version of your code, that fixes the above.

In terms of a future API there is the aforementioned panicing variant, that should work with stable Rust and may work with slices, which I imaging to look something like this:

// The trait providing the API
use sub_array::CheckedSubArray;
use sub_array::range;

let slice: &[u8] = &[1,2,3,4,5];

// works
let sub_array: &[u8;2] = slice.checked_sub_array_ref(range!(3..5)).unwrap();
assert_eq!(sub_array, [4,5]);

// compiles, but panics
let sub_array: &[u8;2] = slice.checked_sub_array_ref(range!(4..6)).unwrap();

Basically the current API, (mostly) same semantics, same source types (arrays work as well), but with more verbose names to make "space" for the constraint version. As you can see, I began to like a "checked" version that returns an Option better than directly panicing, it's more like a spontaneous idea, but it makes it easier to find a name for it.

And then the Nightly feature-gated compile-time enforce variant:

#![feature(generic_const_exprs)]

// The trait providing the API
use sub_array::SubArray;
use sub_array::range;

let array: &[u8; 5] = &[1,2,3,4,5];

// works
let sub_array: &[u8;2] = array.sub_array_ref(range!(3..5));
assert_eq!(sub_array, [4,5]);

// wouldn't even compile
//let sub_array: &[u8;2] = array.sub_array_ref(range!(4..6));

And of course, the Nightly variant would also allow indexing with the same semantics as above:

#![feature(generic_const_exprs)]

// (Const)Range is all we need
use sub_array::range;

let array: &[u8; 5] = &[1,2,3,4,5];

// works
let sub_array: &[u8;2] = &array[range!(3..5)];
assert_eq!(sub_array, [4,5]);

// wouldn't even compile
//let sub_array: &[u8;2] = &array[range!(4..6)];

What do you think about all that?

WalterSmuts commented 2 years ago

Wow! You took the idea and ran with it! Nice :)

Basically each of your comments I agree with. Nice work on making it all just BETTER.

We don't need a wrapper struct, we can implement Index directly or Rust's arrays.

This is probably the most glaringly obvious thing I overlooked. Well spotted! That makes it all much nicer.

Given your comment on slices, I'd like to clarify that (as I meant it) this crate is about obtaining a "sub-array" not working on "sub-array"s. And thus, it is quite sensible (in my opinion) to extract a "sub-array" from a slice, tho it is true that this operation can't be checked at compile time and thus generally may panic at run-time. Nevertheless, I would say that slices as a source should be at most of secondary concern, but I would rather keep them.

Yep, you're 100% right. For some reason I'm in a "const check everything" mindset, and it's easy to overlook other use-cases. There is no reason not to extend the API to slices too. Nice!

Some further notes:

  • When reading the above, one might get the impression, that the crate feature will change the semantics of the API, which in my opinion should be prevented. Thus, I propose two different APIs (or traits more specifically), one with panicing semantics and another that has compile-time constraints and is practically panicing-free, the latter one will be only available with the said crate-feature and a Nightly compiler.
  • I'd like to push users to use the compile-time constraint version (if available). Thus, the compile-time constraint version should get the simpler and more obvious name. Also, I suppose using the indexing operator is the most obvious usage pattern, and thus should have the compile-time constraints (tho that will make it only available on Nightly).

Mmm... I think I understand your concern. If we had exactly the same API and all that adding the const_checked feature would change is to add some compile time constraint checks, then we could turn a compile-able program into a non-compile-able program. So turning the feature on would essentially be a major version bump. This would be especially concerning when the generic_const_exprs feature becomes stable and we'd want to remove the feature or perhaps set it as default. Then we'd be breaking folks's programs if we don't do a major version bump.

The strange situation is that we'd only be breaking programs that would DEFINITELY panic. Although, only if it hits that case at runtime, which it may not. Not sure how strict semver is in this situation, but IMO this may be a grey area. The alternate is just to do a major version bump when that upgrade happens. Then we don't break programs but allow for the more ergonomic API in the current panic-able version. So I guess the question is: "Which is more important, not having a major version bump, or having an ergonomic API?"

Of course, I also looked into the playground example code that you kindly provided. You did there some nice ground work. Some points which I found can be improved/fixed

All of them are better! NICE!

In terms of a future API there is the aforementioned panicing variant, that should work with stable Rust and may work with slices, which I imaging to look something like this:

I like the idea of a checked variant :+1: But I do think that the unchecked variant would be useful too (You've completely convinced me of the slice use-case :P). So the unchecked variant could use the slice[start...end] notation and internally call the unchecked variant and just unwrap() or expect("Wrong size") the result.

A note on naming: I'm not sure the ref suffix is standard. I understand you're obtaining a reference to the sub-array, but looking at the rest of the naming in the standard library it's not explicit. For example index and index_mut has no ref suffix for the immutable reference variant, even though index returns a reference. There are many other examples in the naming used in this way just in the slice documentation. I do see some examples of the ref pattern though. Since we're working with the Index trait I'd suggest we follow that convention, although suddenly I'm not so sure of myself anymore.

Anyway, this is exciting! Thanks for being open to collaboration! I think this would turn into an extremely useful crate!

WalterSmuts commented 2 years ago

I can code up a PR to get the ball rolling. Just want to co-ordinate that we don't both start implementing and have wasted effort. Let me know if you'd rather do it yourself or if you have thoughts on the above comment. Will likely start implementing in a day or so.

Cryptjar commented 2 years ago

Sorry, for the late answer. I'm a little bit busy currently, so it sounds quite good to me, if you like to start an PR.

About your comment regarding semver: an "in-place" upgrade from panicking to compile-time errors would be definitely a breaking change, and there are multiple reason why this would be very bad here, for once, a program that panics is strictly speaking still a valid program (tho probably not very useful), so preventing them from compiling breaks them. More practically examples are:

However, there are actually wrong programs those exhibiting undefined behavior (very different from panics), and fixing them is typically not consider a breaking change, because they were totally broken before, but that does not apply here.

Actually, in my mind, I always assumed that the new API will be a breaking/major change. So I'm cool with making a breaking change, much rather than accidentally breaking someone else's code.

About the checked-ness: I totally agree that the checked variant is nice and that it would be useful to also have an "unchecked" panicking variant. But I'm not sure what a nice API would be: literal writing foo[start..end] would just yield a slice, and I'm somewhat strongly opposed to the idea of using foo[range!(start..end)] (foo: &[T]) as panicking variant, because when reading code, it becomes indistinguishable from foo[range!(start..end)] (foo: &[T;N]) which instead of panicking would have the compile-time guarantee that if it compiles, it never panics. And this get's even worse when we consider that arrays may (in certain places) silently deref to slices.

However on a seemingly unrelated note, I also noticed that when we rewrite the API to only use the rang! macro particularly for slices, it would be a regression in the functionality of this crate, because the with the current crate, one can actually extract an sub-array at an arbitrary run-time given offset (maybe most useful with slices, which can be arbitrarily long and one might e.g. extract a sub-array from the end).

So I though maybe we could jointly solve these two issues by introducing a second "ConstRange" type and macro, one that isn't "fully const" but rather "half const" i.e. having a const size but a variable offset (sort of how the current API works). I'm kinda struggling with a good name for it, maybe something like ConstSizeRange (emphasizing that only the size is const) or VariableOffsetRange (emphasizing that only the offset is variable). The macro would be named analogously, it should accept the same patterns as the ConstRange but needs additional rules to be able to specify the variable offset (I really think the old foo<SIZE>(offset) style is kinda bad), maybe some special-case rule that parses var_off_range!(n..n+3) (where n can be a variable) can do it.

Then we can still support the arbitrary offset use-case while also having a distinguishing feature to implement directly panicking indexing:

BoundsCheck at: Compile Time Run Time
Slice n/a foo[var_off_range!(3..5)]
Array foo[range!(3..5)] foo[var_off_range!(3..5)]

Lastly, about naming: I actually wasn't that much aware about the "missing" refs in the std-lib, I think I was looking at the Any-trait, which adds the ref suffix to disambiguate between by-value and by-reference use-cases. However, I don't really see a strong use-case for by-value extractions, which couldn't be achieved by just copying or cloning the result of the by-reference functions. So I think we can get rid of the ref suffix.

WalterSmuts commented 2 years ago

Hi, just touching base again. Haven't forgotten about this but things got a bit busy my side. I still intend to do this PR but after reading your [great] suggestions I've realised it's a bit larger than I originally thought. Just letting you know I haven't disappeared.