JErnestoMtz / rapl

Rank Polymorphic array library for Rust.

Display is too slow #28

Closed DeliciousHair closed 1 year ago

DeliciousHair commented 1 year ago

As an example, if one does:

let x = Ndarr::from(0..10_000_000).reshape((1_000, 100, 100)).unwrap();
println!("data: {}", x);

then one should get comfortable as they will be waiting a while.

I would like to propose some output formatting that includes truncation rules both horizontally and vertically, displays some useful metadata (array size and dtype), and maybe even adds some formatting to make things slightly more attractive. This would prevent the formatter from attempting to process the entire tensor.

JErnestoMtz commented 1 year ago

I've just revisited my Display implementation https://github.com/JErnestoMtz/rapl/blob/main/src/display.rs, and I think the problem might be a bit more complex than I first imagined. At first I thought it was because we were trying to print the entire array, but in fact we are not. There is a limit on the length of the printed output; it was set too high, and it is now a bit better. But that doesn't make things faster. The problem is that we recursively process the entire Ndarr, slicing it into smaller Ndarrs down to Rank 1, which can then be formatted. So you can imagine how many Rank 1 Ndarrs are created for a huge array like that. For example, these two cases have the same number of elements, but the first is much, much faster than the second. This is simply because the recursion base case is an Ndarr<T,U1>, which corresponds to the last dimension of the reshape:

// relatively fast
let a = Ndarr::from(0..2_000_000).reshape([1, 2, 1_000_000]).unwrap();
println!("a = \n {}", a);
// SLOWWWWWWWWW
let a = Ndarr::from(0..2_000_000).reshape([1_000, 1_000, 2]).unwrap();
println!("a = \n {}", a);
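To put rough numbers on the difference, here is a small sketch that counts how many Rank 1 rows a full recursive traversal would reach for a given shape. It does not use rapl at all; `rank1_rows` is a hypothetical helper, and it assumes the recursion bottoms out once per row along the last axis, as described above.

```rust
/// Number of rank-1 rows reached by a full recursive traversal of `shape`:
/// the product of every dimension except the last one.
/// (Hypothetical helper for illustration, not part of rapl.)
fn rank1_rows(shape: &[usize]) -> usize {
    match shape.split_last() {
        // Every combination of leading indices selects exactly one row.
        Some((_last, leading)) => leading.iter().product(),
        // A shape with no axes has no rows to format.
        None => 0,
    }
}
```

Under that assumption, `rank1_rows(&[1, 2, 1_000_000])` is 2 while `rank1_rows(&[1_000, 1_000, 2])` is 1,000,000, which matches the observation that the second reshape is far slower despite holding the same number of elements.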

So I think this is a very interesting problem, because it is hard to escape the need for recursion if we want this to work for any number of dimensions. My best bet for solving this is to write a function that somehow summarizes the big Ndarr into either a smaller Ndarr or another struct that holds just the needed information.
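One possible shape for such a summary, sketched on a plain row-major slice rather than on rapl's Ndarr: the recursion stays, but on each axis it only visits the first and last few indices and never allocates a sub-array. The names `shown_indices`, `summarize` and the `edge` parameter are assumptions made for this illustration, not anything from display.rs.

```rust
/// Indices to show along an axis of length `len`: all of them if the axis is
/// short, otherwise the first and last `edge` indices. `None` marks the gap.
fn shown_indices(len: usize, edge: usize) -> Vec<Option<usize>> {
    if len <= 2 * edge {
        (0..len).map(Some).collect()
    } else {
        let mut idx: Vec<Option<usize>> = (0..edge).map(Some).collect();
        idx.push(None);
        idx.extend((len - edge..len).map(Some));
        idx
    }
}

/// Format a row-major buffer of the given shape, visiting only the shown
/// indices on every axis. Assumes `data.len()` equals the product of `shape`.
fn summarize(data: &[i64], shape: &[usize], edge: usize) -> String {
    fn go(data: &[i64], shape: &[usize], offset: usize, edge: usize, out: &mut String) {
        // Stride of the current axis = product of the remaining dimensions.
        let stride: usize = shape[1..].iter().product();
        out.push('[');
        for (n, i) in shown_indices(shape[0], edge).into_iter().enumerate() {
            if n > 0 {
                out.push_str(", ");
            }
            match i {
                // Elided range on this axis.
                None => out.push_str("..."),
                // Last axis: read the element directly from the buffer.
                Some(i) if shape.len() == 1 => out.push_str(&data[offset + i].to_string()),
                // Otherwise recurse with an adjusted offset; nothing is copied.
                Some(i) => go(data, &shape[1..], offset + i * stride, edge, out),
            }
        }
        out.push(']');
    }

    let mut out = String::new();
    if shape.is_empty() {
        return out;
    }
    go(data, shape, 0, edge, &mut out);
    out
}
```

With `edge = 3`, the work is bounded by at most 7 entries per axis regardless of how large the array is, so a shape like [1_000, 1_000, 2] touches only a few dozen rows instead of a million.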

DeliciousHair commented 1 year ago

Indeed! I think a summary would be very useful, though we should maybe have a think about what said summary might look like. For example, let's say one is looking at a collection of 3-channel images with a shape of [N, 3, 128, 128] (arbitrary numbers). There generally is not much to be gained by looking at the full array, as there is simply too much to look at, but having a look at, say, a single 128x128 image often is useful, just to check that the numerical values are sane. Thus, maybe a .display() method should take a slice parameter or similar to allow one to define what one wants to inspect? With some sort of default values too? Given that anything beyond a 2-dimensional array gets a bit esoteric in terms of displaying data on a screen anyway, I could see something like this being useful. Maybe.
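As a rough illustration of the "pick what to inspect" idea, the sketch below borrows a single 2-D plane out of a row-major buffer by fixing all leading indices. It is written against a plain `&[f32]` and a shape slice; the function name `plane` and its signature are hypothetical and say nothing about what a rapl .display() method would actually accept.

```rust
/// Borrow the 2-D plane selected by fixing every leading index of a row-major
/// buffer. Returns the plane's rows, or `None` if the request does not fit.
/// (Hypothetical helper for illustration, not part of rapl.)
fn plane<'a>(data: &'a [f32], shape: &[usize], leading: &[usize]) -> Option<Vec<&'a [f32]>> {
    // The last two axes form the plane; every earlier axis must be fixed.
    if shape.len() < 2 || leading.len() != shape.len() - 2 {
        return None;
    }
    if data.len() != shape.iter().product::<usize>() {
        return None;
    }
    let (rows, cols) = (shape[shape.len() - 2], shape[shape.len() - 1]);
    if rows == 0 || cols == 0 {
        return Some(Vec::new());
    }
    // Row-major index of the selected plane among all planes.
    let mut offset = 0;
    for (&i, &dim) in leading.iter().zip(shape) {
        if i >= dim {
            return None;
        }
        offset = offset * dim + i;
    }
    // Convert the plane index into an element offset.
    offset *= rows * cols;
    let flat: &'a [f32] = &data[offset..offset + rows * cols];
    Some(flat.chunks(cols).collect())
}
```

For the image example, `plane(&data, &[n, 3, 128, 128], &[0, 1])` would hand back the 128 rows of channel 1 of image 0 without copying anything, and those rows could then be truncated and printed like any 2-D array.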

JErnestoMtz commented 1 year ago

So, the formatting is still mostly the same. But I just pushed a commit https://github.com/JErnestoMtz/rapl/commit/cf9675ef70f7ca4ed459c5427e0cf3c477dba13c that tremendously improves the performance; the example given by @DeliciousHair now runs in less than a second. This was achieved by slicing just the sub-array we want to display rather than the complete array.

I also cleaned up the display.rs code a bit and added new parameters to change the number of elements displayed in each direction.

Now it is just a matter of integrating comfy-table to get beautiful formatting.

JErnestoMtz commented 1 year ago

Closing this issue and opening a new one on formatting: https://github.com/JErnestoMtz/rapl/issues/35