Open pickfire opened 3 years ago
Thinking about this a bit, at the RawTable
level, this should be possible with a minimal change: separate the element type (F
in fn ptr<F>(&mut self, field: Field<T, F>) -> *mut F
) from the capacity calculations. This should let ptr
return, for example, a *mut u64
to capacity / 64
elements, rather than allocating space for capacity
of every field's "actual" storage type.
I can imagine doing this in the derive macro (e.g. #[soak::elements(64)] storage: u64,
), or alternatively with some kind of wrapper type (e.g. storage: Elements<64>(u64)
) that with_capacity
could pick up on. (Though, perhaps that would require specialization?)
A design note for anyone looking to implement this: I would prefer to avoid concepts like bitfields in this layer, but rather describe the memory layout used to implement bitfields (and potentially other formats) directly. So on that note, it might be worth collecting more formats to consider before picking an approach.
This should let ptr return, for example, a *mut u64 to capacity / 64 elements, rather than allocating space for capacity of every field's "actual" storage type.
It should be usize
ideally rather than specific size.
But I don't think wrapper type would be nice since that will remove the original structure and won't be usable without the wrapper type, so with the derive macro it can still work with a single type.
Other than bitfields, we could have one for String
, rather than keeping the pointer and length. We have a pool of string, and then for each string we just keep the index for the pool and have an additional separate length. Example, v
means index and ~
means total length so we can keep minimal stuff. So with this type of storage we can remove the deletion feature but makes it very good for storage and querying. Maybe?
v --- ptr
| aaoeuu | bueue | cauuo |
^ ^ ^ ~
Other than these two storages, I haven't thought about others.
Say, if there is a
bool
in struct. When columnar storage is used wouldn't it be more useful that we can make use of every single bit rather than using a byte for a bool? Rust compiler optimizes struct itself by rearranging the fields around and use niche where possible but in this case if we changed it to something likeVec
rust won't be able to recognize this. Maybe we can have likeSo we can customize how it is being stored internally, rather than
Vec<bool>
(or something similar), we haveBitField
which is much more smaller and each item occupies a single bit.Other than that, it also allows optimization like comparing the two bitfields and make merge or do stuff there, I think probably pandas or arrow uses this sort of optimization.
By the way, it may be cool to be able to combine rkyv https://github.com/djkoloski/rkyv @djkoloski with this as well but I wonder how that can be done.