Closed dan-da closed 5 months ago
I just got bitten by a related ergonomics issue. Namely get_many() does not return the original indices, so caller must keep track of them. And worse it is a logic error to use the array indices that are returned by get_many().
In working with these types in neptune-core, a pattern is to (1) get_many(), (2) mutate, and then (3) set_many(). My natural inclination is to write this as:
let vals_to_modify = v.get_many(&[1, 2]);
let vals_for_update = vals_to_modify.iter().enumerate().map(|(i, v)| (i, v += 1));
vals.set_many(vals_for_update);
this however is a logic error. It will update elems 0,1. not 1,2 as desired. The correct way would be something like:
let indices = [1,2];
let vals_to_modify = v.get_many(&indices);
let vals_for_update = indices.iter().zip(vals_to_modify).map(|(i, v)| (i, v += 1));
vals.set_many(vals_for_update);
This works, but is gross to always have to zip together indices and values.
If we have a get_many() that returns Vec(Index,T)>, then we can do:
let vals_to_modify = v.get_many(&[1, 2]);
let vals_for_update = vals_to_modify.iter().map(|(i, v)| (i, v += 1));
vals.set_many(vals_for_update);
cleaner!
I made a little test case that demonstrates the logic error when using get_many result incorrectly:
#[test]
fn get_many_indexes() {
let db = get_test_db();
let mut v: RustyLevelDbVec<u128> = RustyLevelDbVec::new(db.clone(), 0, "unit test vec a");
let pattern = vec![100, 200, 300];
v.push(pattern[0]);
v.push(pattern[1]);
v.push(pattern[2]);
let set = v.get_many(&[1, 2]);
// one might (wrongly) think this should set indices 1 and 2 to their original values.
// instead, it will set indices 0 and 1 as returned by `get_many`.
// So there is a loss of symmetry here.
v.set_many((0..).zip(set));
assert_eq!(pattern, v.get_all());
// failing result is:
//
// assertion `left == right` failed
// left: [100, 200, 300]
// right: [200, 300, 300]
}
So this is not "wrong behavior" by set_many, it is just an easy trap to fall into, as the caller.
Especially because one can use the indices returned from get_all() as input to set_all() without problem.
I see a 6th option: Get rid of get_all
, and potentially set_all
. It can be argued that those functions are too easy to use incorrectly anyway, as they may very well crash the program because it asks for too much RAM. And get_all
can simply be replaced by v.get_many(&(0..normal_vector.len() as u64).collect_vec())
. Notice that the allocation of the index-vector in RAM is fine, as you need to allocate the return values anyway, and it will most likely be much larger.
The returned value from get_all
obviously lives in RAM, and for big enough data sets that will clearly not fit.
If you feel that get_many
is unergonomic feel free to change it as you suggest. I don't mind its current behavior but I'm also not defensive about it.
storage
functionality was removed from twenty-first
in 3775c51b.
At present, the trait looks like:
We can notice that
set
andset_many
accept an index of typeIndex
whileget_many
andget_all
return aVec<T>
Index
is defined to be au64
.Vec
native indexes are of typeusize
which would be 2^32 on a 32 bit system and 2^64 on 64 bit system. (to state the obvious)In order to generate the return
Vec
, implementors of this trait cast the internal indexes of typeIndex
tousize
. This cast can panic on a 32bit system if the index size is > usize::Max ie 2^32. example cast. So for exampleget_all
would panic at the 2^32+1 element. The situation is not quite as bad forget_many
. On a 32-bit system it would be limited to returning 2^32 elements, but would not panic (because the input list of indices is also an array/vec).set_all
also has a problem because it accepts&[T]
. On a 32 bit system this would be limited to 2^32 elements, which of course is much smaller than the 2^64 total addressable elements in StorageVec, eg as returned bylen()
. For a collection with more than 2^32 values,set_all()
would panic because implementors assert that input length is equal to target.len().Some possible actions/remedies are:
get_many
,get_all
,set_all
, but doc-comment them, and add variants of each that usesVec<(Index, T)>
instead ofVec<T>
.Of these, I prefer option 4. It is backwards compatible with existing code/tests, but also provides a "more correct" path forward. It leaves the existing methods for convenience, and places responsibility for length checking with the caller.
Option 1 seems not a good idea, especially as this is a crate published to crates.io.
Option 3 would require the most up-front work, and also there are many legitimate uses for calling the existing methods, where it is known that the length is less than 2^32. And these methods are more convenient than passing around a tuple (Index, T).
Option 5 is simple. It probably is not workable due to how this library is being used in practice, where more than 2^32 elements are expected in eg MMR.