Add custom allocator support

Tnze commented 1 year ago

There is a Vec::new_at() API when #![feature(allocator_api)] is enabled, allowing constructs a Vec in specific Allocator. This is helpful to have an equivalent in our Array to achieve certain special needs.

For example:

Allocate large array in allocators using "mmap". Allowing us to allocate arrays that exceed the physical memory size.
Allocate small arrays in "arena" allocators, which provide extremely fast allocations and batch destruction.
For me, I just want to use a custom allocator to calculate how much memory I have used. It's just icing on the cake, it's not urgent at all.

TudbuT commented 1 year ago

Would it make sense for me to just put this into some default-off feature flag, or do you think it would make sense to have this enabled by default?

Tnze commented 1 year ago

I don't think it should be enabled by default since Vec::net_at() is nightly-only experimental API. We can enable it by default later, wating the allocator_api being stable.

TudbuT commented 1 year ago

That makes sense. I have started working on this feature already as it doesn't seem too hard, but I'm taking a break right now.

In the meantime, I would love to get some general feedback on the library so far, if you have some time.

Tnze commented 1 year ago

It's quite fitting my requirement.

My application needs 6D and 8D arrays which ndarray doesn't support at all. This library provides nice simple APIs and very high performance(by the way, is it possible to use SIMD to speed up particular dimensions array's indexing?). It's just the best choice for anyone who wants substitute of array of arrays of arrays... without any linear algebra etc.

I previously using nested array the rust-lang provides. But it compiles very slow: https://github.com/rust-lang/rust/issues/88580 , and lacks of flexibility: I cannot choice array length at runtime.

So thanks for the project, freeing me from troubles of multi-dimensional arrays.

TudbuT commented 1 year ago

Thanks for the feedback!

It would probably be possible to use SIMD, but I think(?) rust doesn't have proper support for that yet.

My next idea for speed improvement is having very slight implementation differences between dimensions to e.g. not use the big index calc routine for 2D or 3D. Performance doesn't change at all between these and 8D actually, and I suspect it might be because I use the big routine for them all.

But I feel like it's a bit against the micro-spirit of the project. Something else I could imagine is replacing my use of vectors with the use of pointers and directly manipulating them. I do all safety checks myself anyway so a vector is actually kind of pointless, at least for access. I would probably still create a vector to hold the data as that's just easier than handling that myself and has no performance penalty if I only use the internal pointer of it.

TudbuT commented 1 year ago

By the way, since I haven't seen your use-case for the arrays, I don't know if this is relevant to you, but if you store some kind of tuple or fixed length array inside of your ndarrays, you might also like the vec_split crate which is re-exported by default. It allows you to turn an Array<(T, T), N> into two Accessor<T, N>s which can be handled as if they were two separate arrays which might be handy for some algorithms. There should be only very minimal performance penalty to this and it can greatly simplify some algorithms.

Tnze commented 1 year ago

Cool features! Looks like split the array and provided two views of each field. I may need it in the future :)

Is it possible to create Views of part of an Array? like

let a: Array<_, 2> = ...;
//[0 1 2
// 3 4 5
// 6 7 8]
let view: Array<_, 1> = a[[None, Some(1)]];
// [1
//  4
//  7]

I guess it can be done by multiplying index into strides.

Also, we can swap the order of indexes by just swap strides synchronously, allowing us to do some transposition.

It would probably be possible to use SIMD, but I think(?) rust doesn't have proper support for that yet.

I had seen rust's SIMD APIs, since what we need is multiply strides with index per values, and add them together, there are direct functions to do that. The only problem is only some of N is supported. (Maybe could we adding with LaneCount<N>: SupportedLaneCount; for the SIMD impl?)

Another choice considered is adjusting the code to make compiler's auto vectorization work. (Or it is already working？）. It is necessary to confirm by observing generated assembly code.

TudbuT commented 1 year ago

Yeah some more array view stuff could absolutely be useful, I'll put it on the list™.

About the SIMD: Sure, I'll take a look at that and see how much improvement I get. I should probably try an array with 200 dimensions some time, where this would probably shine.

Tnze commented 1 year ago

In fact, it is unlikely to use a 200D array. It is more likely that we will use a 3D or 8D array, but calculate the index millions of times.

TudbuT commented 1 year ago

Hence why I think I should test it. SIMD might be slower for small dimensions, but much faster for larger ones, in which case I would just draw a line somewhere at which point SIMD would start to be used.

TudbuT commented 1 year ago

Added in b6e7ecd. This shouldn't break anything and does maximal code reuse.

TudbuT / micro_ndarray

Add custom allocator support #6