blitzpp / blitz

Blitz++ Multi-Dimensional Array Library for C++
https://github.com/blitzpp/blitz/wiki

New Blitz11 Project #48

Open citibeth opened 6 years ago

citibeth commented 6 years ago

@slayoo @tobias-loew @vukics

Blitz++ is great, but let's face it... it's an old codebase from the pre-C++11 era, and has a lot of weird hacks / workarounds. It does what it does well, and has been quite stable. BUT...

  1. It is looking increasingly dated in the days of C++11
  2. The original authors are long gone
  3. The build and unit tests are a mess
  4. Alterations could break things, negating one of Blitz++'s current good features (its stability)

I would like to see if we can build a new Blitz for C++11, which I'm calling Blitz11. The singular focus would be to build a great multi-dimensional array package for C++11. None of the other features of Blitz++ would be ported or reimplemented, and we would not worry about pre-C++11 compilers. Once we get it working well, we can shop it around to see if any of Boost, Eigen, etc. might be interested in incorporating it into their larger system. We would use CMake to build and Google Test to write tests.

I'm currently collecting requirements and building prototypes. Requirements I've identified so far (please add to this list):

Necessary Features

  1. Arrays are fully C++ compliant, can be held in standard C++ collections. Can be included in larger structs without having to write custom move constructors / assignment for those structs.
  2. Array underlying memory is either: (a) a shared ptr, or (b) borrowed. Both variants have the same top-level type. When new arrays are constructed from an old array, the new array shares or borrows, according to how the old array did it.
  3. Handles const correctly.
  4. Array layout is fully configurable: any base, any stride (even negative), any dimension order.
  5. Full complement of array slicing, reshaping, etc. Must include reduction of dimension.
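Requirement 2 can be sketched in a few lines. The following is a hypothetical illustration (not a proposed B11 API, and `Array1` is an invented name): a `std::shared_ptr` whose deleter is a no-op in the borrowed case lets both variants share one top-level type.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical sketch of requirement 2: one Array type whose storage is
// either owned (shared_ptr with a real deleter) or borrowed (no-op deleter).
template <class T>
class Array1 {
public:
    // Owned: allocate and manage memory via shared_ptr.
    explicit Array1(std::size_t n)
        : data_(new T[n](), std::default_delete<T[]>()), size_(n) {}

    // Borrowed: wrap external memory; the no-op deleter means we never free it.
    Array1(T* external, std::size_t n)
        : data_(external, [](T*){}), size_(n) {}

    // A new array built from an old one shares (or keeps borrowing) the
    // same block -- whichever the old array did.
    Array1(const Array1&) = default;

    T& operator()(std::size_t i) { return data_.get()[i]; }
    std::size_t size() const { return size_; }
    long use_count() const { return data_.use_count(); }

private:
    std::shared_ptr<T> data_;
    std::size_t size_;
};
```

Both constructors produce the same type, so one function signature covers owned and borrowed arrays alike.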

Nice-To-Have Features

  1. Shared memory blocks may be used, managed, loaded, saved, etc. separately from any particular Array used to access them.
  2. The dope vector (array layout) is accessible as a first-class object. Dope vectors may be copied, manipulated and used to construct new Arrays.
  3. Bounds checking. Controlled by a template parameter. A global (compile-time) parameter can also be used to turn off all bounds checking.
  4. Bounds errors reported in a way that can generate a stacktrace.
  5. Allows writing code that works with an Array of any dimension. This mode of access is not expected to be fast.
  6. Vectorized operations, lazy evaluation magic, etc. as in Eigen::Tensor, Blitz++, etc. This is a cool feature, and it is well known how to implement. But it is not essential. The initial design should be built in a way that does not PRECLUDE these features from being efficiently added in the future.
  7. Support ultra-long dimensions (>4 billion extent in a dimension)
  8. Support inter-process shared memory arrays (e.g. boost::interprocess).
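Nice-to-have 3 might be sketched like this (a hypothetical illustration; `Vec` and `B11_GLOBAL_BOUNDS_CHECK` are invented names): per-array checking chosen by a template parameter, plus a global compile-time switch that can force it all off.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assumed global compile-time switch; a build could define it to 0 to
// disable all bounds checking regardless of the template parameter.
#ifndef B11_GLOBAL_BOUNDS_CHECK
#define B11_GLOBAL_BOUNDS_CHECK 1
#endif

template <class T, bool Checked = true>
class Vec {
public:
    explicit Vec(std::size_t n) : data_(n) {}

    T& operator()(std::size_t i) {
        // When Checked is false, this branch is a compile-time constant
        // and the optimizer removes it entirely -- zero overhead.
        if (Checked && B11_GLOBAL_BOUNDS_CHECK && i >= data_.size())
            throw std::out_of_range("Vec index out of range");
        return data_[i];
    }

private:
    std::vector<T> data_;
};
```

`Vec<double, true>` and `Vec<double, false>` have identical interfaces, so checking can be toggled per array without touching calling code.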

The current prototype is focused on UI and logical design. It is not expected to be fast. I believe that others here have better experience with optimizations, variadic templates, etc.

Any thoughts? Any opinions on licensing? Please see my current prototype: https://github.com/blitzpp/b11/blob/master/slib/b11/array.hpp

vukics commented 6 years ago

I fully agree and can’t wait to migrate to Blitz11. Even though I won’t be able to contribute substantially, as a long-time blitz++ user, I take the liberty to comment. I have one question: what is the difference of scope in comparison with Eigen::Tensor that justifies Blitz11 as a separate library? And one suggestion for a nice-to-have feature: interoperability with numpy.ndarray (e.g. through boost::python::numpy::ndarray).

slayoo commented 6 years ago

Concerning your question:

i. I understand @citibeth's intention as an initiative to actually answer this question: draft an API, consulting with the Blitz++ community, that would show which features are sought after, and only then compare it to Eigen::Tensor and possibly other solutions to decide what to do next;

ii. Close integration with C++11/C++17 built-ins would probably be part of the answer

iii. When the discussion on Blitz on github started, I've tried to list some of the API features that I have found unique myself and useful in Blitz and posted them here: #8. I've just added there the replies I've got from Eigen developers (I assumed these were there, but realised now that these were sent over e-mails)

iv. I'm aiming at summarising Blitz features, including those not documented in the Blitz manual and those not covered in the testsuite here (planning to also automate execution of these examples on Travis/Appveyor): https://github.com/blitzpp/blitz/wiki/Features (contributions and comments welcome). Perhaps this will also provide some points for discussion here.

citibeth commented 6 years ago

I have one question: what is the difference of scope in comparison with Eigen::Tensor that justifies Blitz11 as a separate library?

Eigen::Tensor does not provide the flexibility I need in memory management.

A blitz::Array can be constructed with its own memory, or pointing to external memory. This functionality is not perfect and has some quirks (which I hope to address with B11). But it is extremely useful, and essential to so much of what I do.

Eigen also allows for self-allocated vs. externally allocated variables. This is done through the Eigen::Map<> template. A variable of type Eigen::Tensor<...> is self-allocated, whereas one of Eigen::Map<Eigen::Tensor<...>> would be externally allocated. Now consider what happens if you want to make a function taking an array as a parameter. If you declare the parameter to be of type Eigen::Tensor<...>, then it will not be possible to call your function on arrays using borrowed memory. Conversely, if you declare it to take Eigen::Map<Eigen::Tensor<...>>, you can't call it on self-allocated arrays. This is kind of a show stopper, IMHO.
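The signature problem can be illustrated without Eigen at all. In the following self-contained sketch, the made-up types `OwnedArray` and `MappedArray` stand in for Eigen::Tensor and Eigen::Map respectively; because they are distinct types, one function signature cannot accept both.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a self-allocating type like Eigen::Tensor<...>.
struct OwnedArray {
    std::vector<double> data;
};

// Stand-in for a borrowing type like Eigen::Map<Eigen::Tensor<...>>.
struct MappedArray {
    double* data;
    std::size_t size;
};

// Written against OwnedArray, this function simply cannot be called with a
// MappedArray: there is no common top-level type to write one signature for.
double first(OwnedArray& a) { return a.data[0]; }
```

A second overload (or a template) would be needed for `MappedArray`, which is exactly the duplication the shared-type design in requirement 2 avoids.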

I considered writing B11 as a thin wrapper around Eigen::Map<Eigen::Tensor<...>>. Basically B11 would manage the memory and the underlying type would always be the same. HOWEVER... Eigen::Tensor is experimental and does not support Eigen::Map<> functionality. So this is not even possible.

And one suggestion for a nice-to-have feature: interoperability with numpy.ndarray (e.g. through boost::python::numpy::ndarray).

Good idea. I have my own interop stuff for Blitz++, but it could probably be improved. Python interop also requires stuff specific to different C/Python interface systems --- Boost.Python, Cython, etc. Including and supporting that in B11 would make it really useful. And don't forget Fortran interop as well.

Definitely... we should keep #8 in mind in the design and implementation of B11.

bhelenbr commented 6 years ago

I think this is a great idea. I am a long time blitz user and would love to see it modernized. The reason I found this thread is because my compiler has been throwing warnings about losing integer precision in range.cc (combinations of int's and ptrdiff_t in the range object). I was thinking about trying to fix that, but maybe that is not worth the effort? In any case, I've made some minor bug fixes to blitz over the years, but am not an expert. I am still willing to help however I can though.

slayoo commented 6 years ago

@bhelenbr Thanks! If any of those fixes would qualify for the current Blitz "maintenance mode" here at github, please do submit pull requests!

tobias-loew commented 6 years ago

I totally agree that the library needs a refresh, if not a complete rewrite. But concerning the necessary features, I would like to see different priorities:

Let's ask the question: Why do people use C++ and why do they use Blitz++? IMHO the answer is: Speed! C++ is used because of its zero-overhead principle: "What you don't use, you don't pay for" [B. Stroustrup]. And Blitz++ provides a convenient array-abstraction including arbitrary base-indices, slicing and vectorized-operations.

The reason C++ doesn't provide vectorized arrays at the language level is simply that it doesn't have to, because they can be implemented in a library.

I see Blitz++ and its successor as exactly that library, providing Fortran90-style arrays at zero abstraction overhead. So, my non-exhaustive list of necessary features contains:

  1. Arrays are C++ value-types. Support all kind of copy/move operations (if possible). Arrays can be used with std-algorithms.
  2. The providing and handling of underlying memory is specified in a Policy-Type. There are some default-policies e.g. "shared_ptr", "borrowed", "allocator-provided", "heap", "std::array", ... but users should be able to easily provide their own policies (e.g. the memory could be on a gpu or somewhere else over the network). This implies that there is no common array-type, it's just a template.
  3. Slices and strides shall be possible, but there has to be enough template magic in the background to produce optimal results: a temporarily used slice from (the least significant dimension of) a contiguous array is just a reference to a "std::array"-style array, and doesn't need a dope vector. (Remember, it's all about speed.)
  4. And of course vectorized operations via template expressions.
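Point 2 above might be sketched roughly as follows. All names here (`HeapPolicy`, `BorrowedPolicy`, `PArray`) are invented for illustration; the idea is just that the storage strategy is a swappable policy type behind a fixed array interface.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Default policy: memory allocated on the heap, owned by the array.
struct HeapPolicy {
    std::unique_ptr<double[]> mem;
    explicit HeapPolicy(std::size_t n) : mem(new double[n]()) {}
    double* data() { return mem.get(); }
};

// Alternative policy: memory provided externally and merely borrowed.
// A GPU or networked policy would expose the same data() interface.
struct BorrowedPolicy {
    double* mem;
    explicit BorrowedPolicy(double* p) : mem(p) {}
    double* data() { return mem; }
};

// The array is just a template over its storage policy; there is no
// single common array type, exactly as described above.
template <class Storage>
class PArray {
public:
    PArray(Storage s, std::size_t n) : storage_(std::move(s)), size_(n) {}
    double& operator()(std::size_t i) { return storage_.data()[i]; }
    std::size_t size() const { return size_; }
private:
    Storage storage_;
    std::size_t size_;
};
```

Users can supply their own policy types without changing `PArray` itself, which is the "zero-overhead, user-extensible" property argued for above.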

Using C++11/14/17 features, all this can be achieved in a much more compact and elegant way than was originally possible in Blitz++ with just C++98 at hand.

wolfv commented 5 years ago

Dear people of Blitz++,

I am one of the core devs of xtensor, which has already been discovered in another thread. Indeed, we try to build a modern (C++14) nD-array library based on the NumPy API. It could be very interesting to have a discussion with you, who know Blitz++ very well, to see which features from Blitz++ are missing in xtensor.

Maybe xtensor could be the Blitz11 you're looking for – it would definitely be cool to hear from you and collaborate as far as possible.

We're very active on our gitter channel: https://gitter.im/QuantStack/Lobby

citibeth commented 5 years ago

@wolfv Thank you for getting involved here.

Would you be able to look over the requirements listed above and evaluate Xtensor in relation to them? Ideally, Xtensor already does all these things. If not... do you see a feasible way forward to get that functionality in Xtensor?

That would be really helpful!

wolfv commented 5 years ago

Arrays are fully C++ compliant, can be held in standard C++ collections. Can be included in larger structs without having to write custom move constructors / assignment for those structs.

Yes that works with xtensor.

Array underlying memory is either: (a) a shared ptr, or (b) borrowed. Both variants have the same top-level type. When new arrays are constructed from an old array, the new array shares or borrows, according to how the old array did it.

We have closure semantics to either own or reference memory in views + we have the adapt functions to adapt pointers or containers.

Handles const correctly.

Yes.

Array layout is fully configurable: any base, any stride (even negative), any dimension order.

Yes, we have that (I don't know what you mean by "any dimension order"). We have template support for row_major and column_major layout, as well as any layout (0D scalar & 1D vector) and dynamic layout for anything else (e.g. views with non-contiguous strides).

Full complement of array slicing, reshaping, etc. Must include reduction of dimension.

We support reshaping + resizing. We also have a reshape_view to create a new view into an array with another shape. Furthermore, there are elaborate views with many of the numpy slices, e.g. range(start, stop, step), integral slices (removes a dimension), newaxis to add an empty dimension, keep(1,2,3) to select particular indices... [views](https://xtensor.readthedocs.io/en/latest/view.html#sliced-views)

Nice-To-Have Features

Shared memory blocks may be used, managed, loaded, saved, etc. separately from any particular Array used to access them.

The dope vector (array layout) is accessible as a first-class object. Dope vectors may be copied, manipulated and used to construct new Arrays.

Not sure I get it but you can access the strides(), shape() and size() parameter which should be all that's necessary.

Bounds checking. Controlled by a template parameter. A global (compile-time) parameter can also be used to turn off all bounds checking.

We have a compile time flag to turn on / off bounds checking.

Bounds errors reported in a way that can generate a stacktrace.

Bounds errors throw an exception at the moment. Not sure if that's what you're looking for?

Allows to write code that can work with an any-dimension Array. This mode of access is not expected to be fast.

Yes, the xarray<T> container can have any dimension (just like numpy's ndarray).

Vectorized operations, lazy evaluation magic, etc. as in Eigen::Tensor, Blitz++, etc. This is a cool feature, and it is well known how to implement. But it is not essential. The initial design should be built in a way that does not PRECLUDE these features from being efficiently added in the future.

Yes, it's all there, but not everything is at the maximum efficiency yet.

Support ultra-long dimensions (>4 billion extent in a dimension)

That should work. The shape in the case of xt::xarray is basically a std::vector with a small buffer optimization; the container for the shape of an xt::xtensor (with fixed dim) is a std::array; the shape container for an xt::xtensor_fixed is a compile-time fixed shape.

Support inter-process shared memory arrays (eg. boost::interprocess).

No specific support for this yet.

There are some peculiarities that we support and blitz didn't (e.g. broadcasting from numpy vs. index expressions of blitz). There might be more cool things from Blitz that we don't have, so it would be cool to get to know them.

wolfv commented 5 years ago

To give some more motivation in terms of interoperability: so far xtensor natively supports 4 languages – C++, Python, Julia and R – in their respective packages.

This means it's easy to write a program with xtensor (using xexpressions for the interface) and reuse the same code to natively (copyless) handle data coming from Julia, Python or R. In the future we hope to add more languages – obvious targets would e.g. include Octave/Matlab

slayoo commented 5 years ago

@wolfv Big thanks for feedback here.

Some time ago I posted a very subjective list of "key" features that our team has found important in Blitz: https://github.com/blitzpp/blitz/issues/8#issuecomment-174221798 - feedback welcome on how these compare to xtensor features.

Also, I'm working now on listing and exemplifying Blitz++ features here: https://github.com/blitzpp/blitz/wiki/Features (currently fighting with a script to retrieve all those examples from the wiki source, compile and verify if the output matches wiki content - to be put in a Travis script)

citibeth commented 5 years ago

Thank you for the detailed reply. For brevity I'm not replying to the things that look really good or promising; if I don't mention something, it means it looks really nice to me!

Arrays are fully C++ compliant, can be held in standard C++ collections. Can be included in larger structs without having to write custom move constructors / assignment for those structs.

Yes that works with xtensor.

Array underlying memory is either: (a) a shared ptr, or (b) borrowed. Both variants have the same top-level type. When new arrays are constructed from an old array, the new array shares or borrows, according to how the old array did it.

We have closure semantics to either own or reference memory in views + we have the adapt functions to adapt pointers or containers.

OK, to get it straight... Let's say I make a function that takes an array as an argument (using Blitz notation right now): void f(Array<double,2> &a)

I would like to be able to call this function in a variety of cases:

  1. Array<double,2> a; f(a)
  2. Array<double,2> a; f(a(0:3,:))
  3. Array<double,3> a; f(a(0:3, 17, :))
  4. char *amem; f(Array<double,2>(amem,{4,5}));

The expressions a(0:3,:), a(0:3,17,:), etc. must all either have the same C++ type, or be coercible to the type that f() expects. That is the shortcoming of Eigen::Tensor... case (4) (a Map<> type) requires a different signature for f() than case (1).

Does xtensor currently allow all 4 of the above kinds of expressions (and maybe more) to call f() with the same underlying signature?

Array layout is fully configurable: any base, any stride (even negative), any dimension order.

Yes we have that (i don't know what you mean with any dimension order). We have template support for row_major and column_major layout as well as any layout (0D scalar & 1D vector) and dynamic layout for anything else (e.g. views with non-contiguous strides).

"Any dimension order" is a generalization of row-major vs. column-major. (Row-major = highest stride first; column-major = lowest stride first). If you can do row-major and column-major, I see no reason not to generalize all the way...

Nice-To-Have Features

Shared memory blocks may be used, managed, loaded, saved, etc. separately from any particular Array used to access them.

Not sure I get it but you can access the strides(), shape() and size() parameter which should be all that's necessary.

Then this is not a first-class object. You can access them one value at a time; but if you want to put them in a data structure, you have to write a little function to query them, etc. I want this as a REAL first-class object that can be manipulated, etc. apart from any actual allocated array. This allows more seamless integration with other things. For example, you can have some NetCDF subroutines that take a dope vector and create a NetCDF array of that shape. You can serialize the dope vector, store it to a NetCDF descriptor variable, send it over a network, etc. --- anywhere you need to describe an array shape. You can create one dope vector and use it to instantiate 17 xtensor arrays. You can more easily do Fortran interop. etc.

Basically... for any serious use, for lots of purposes, you need a dope data structure. Either xtensor provides a standardized one, or we will forever be creating our own and converting to/from it. xtensor probably already has this data structure (or something close to it) inside, so it should not be too hard to export and standardize it.

So I'd like something like this:

Array<double,2> a(3,4);
Dope<2> ad(a.dope);   ad.shape[0];  ad.stride[0]; etc...

Dope<2> bd(3,4);
Array<double,2> b(bd);   // Same shape as array a
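A compilable version of this sketch might look like the following (`Dope` and `row_major` are hypothetical names, not an existing API): the layout lives apart from any allocated array, and can be copied, serialized, or used to build new arrays of the same shape.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// First-class dope vector: shape and strides as a plain value type,
// independent of any particular allocated array.
template <std::size_t N>
struct Dope {
    std::array<std::size_t, N> shape;
    std::array<std::ptrdiff_t, N> stride;

    // Total number of elements described by this layout.
    std::size_t size() const {
        std::size_t n = 1;
        for (std::size_t s : shape) n *= s;
        return n;
    }
};

// Convenience builder assuming row-major (last dimension fastest) layout.
template <std::size_t N>
Dope<N> row_major(std::array<std::size_t, N> shape) {
    Dope<N> d;
    d.shape = shape;
    std::ptrdiff_t step = 1;
    for (std::size_t k = N; k-- > 0; ) {
        d.stride[k] = step;
        step *= static_cast<std::ptrdiff_t>(shape[k]);
    }
    return d;
}
```

Because `Dope` is a plain copyable value, it can be stored in a NetCDF descriptor, sent over a network, or used to instantiate many arrays, as argued above.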

Bounds checking. Controlled by a template parameter. A global (compile-time) parameter can also be used to turn off all bounds checking.

We have a compile time flag to turn on / off bounds checking.

Can this be done with a template parameter? The "if" statements shouldn't take any time. Compile-time flags are really crude.

Bounds errors reported in a way that can generate a stacktrace.

Bound errors throw an exception at the moment. Not sure if that's what you're looking for?

C++ exceptions unfortunately do not provide a stacktrace from the point of being thrown. You have to catch them and make your own stacktrace... and then you only get a stacktrace from the point where you caught them. You only get a stacktrace throwing an exception if you create the trace at the time of throwing. Unfortunately, that can be slow for cases where you don't need a stacktrace. It also runs into compiler dependence issues.

Best is to take a std::function<> error handler subroutine, and let the user provide the error handler. It should be a template parameter, allowing the user to provide a different error handler for different arrays. (This situation arises in a large application involving many third-party libraries, all of which use xtensor.) If the user does not provide an error handler, then a global default error handler will be called (some global variable of type std::function<>). The standard global default should throw an exception, as you currently do. The user's top-level application can change that error handler if desired; but library packages meant to be linked into something larger should generally not change it.
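A minimal sketch of that handler scheme (hypothetical names, not an xtensor API): a global std::function with a throwing default, which an application can replace, e.g. with one that records a stacktrace before aborting.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// Global default error handler; the default behavior throws, matching
// what xtensor currently does. A top-level application may replace it.
std::function<void(const std::string&)> g_bounds_handler =
    [](const std::string& msg) { throw std::out_of_range(msg); };

// Example of a checked access that reports errors through the handler.
// If a non-throwing handler returns, we return a dummy value rather than
// touching out-of-range memory.
double checked_get(const double* data, std::size_t n, std::size_t i) {
    if (i >= n) { g_bounds_handler("index out of range"); return 0.0; }
    return data[i];
}
```

In the full proposal the handler would also be a template parameter, so different arrays (or different third-party libraries in one binary) could use different handlers; the global is only the fallback.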

Allows to write code that can work with an any-dimension Array. This mode of access is not expected to be fast.

Yes, the xarray<T> container can have any dimension (just like numpy's ndarray).

Combined with first-class dope vectors, this will allow things like, for example, processing arrays in an arbitrary NetCDF file.

Support ultra-long dimensions (>4 billion extent in a dimension)

That should work. The shape in the case of xt::xarray is basically a std::vector with a small buffer optimization; the container for the shape of an xt::xtensor (with fixed dim) is a std::array; the shape container for an xt::xtensor_fixed is a compile-time fixed shape.

I don't think that's what I meant. I meant, support arrays like: Array<double,1> arr(5billion)

There are some peculiarities that we support and blitz didnt (e.g. broadcasting from numpy vs. index expressions of blitz). There might be more cool things from Blitz that we don't have, so that would be cool to get to know.

I have never used Blitz beyond the basics. IMHO, a good, solid array library comparable to Numpy's ndarray or Fortran 90 arrays would be a good start. This stuff should really just be part of the language. Cool features can be built on top of that.

wolfv commented 5 years ago

Hi,

just to reply to your first point: Writing generic functions in xtensor is "easy" -- you'd write something like

template <class E>
auto func(xexpression<E>& e)
{
     auto& de = e.derived_cast();
     return de(1) + de(2); // ...
}

now an xexpression can be anything: an xcontainer-derived type (e.g. xarray/xtensor ...), an xview (as in your example) or an xfunction. Currently we don't have a mechanism to dispatch on dimension or shape -- this will hopefully come (an open PR exists), which would support your notation.

Checking by template parameter -- we actually provide an at() operator (borrowing from the STL here). Having a template parameter for operator() would be annoying to type (a.template operator<check>()(1,2,3))... so .at() is the way to go IMO.

The "dope" vector are currently three distinct members in the class: shape, strides and backstrides, whereby backstrides are just a performance optimization for nD-stepping. IIRC at the moment we don't support using references for shape/strides/backstrides. But it could be something to add (as well as using the previously mentioned closure type for shape/strides). However references go out of scope, shared pointers have considerable overhead and compilers have powerful copy-elision, so owning the shape might not be super-bad.

Interesting about the stack trace. The code is all in xtensor/include/xexception.hpp if you want to have a look -- PR's or comments very welcome! It should be somewhat simple to fix as it's just one or two functions.

Sure, your array size can go as far as your memory carries you -- and your container. You could, for example, use a std::vector as the data container. We use a slightly different container as data storage (uvector), which doesn't initialize its contents and lacks some functionality unnecessary for its use case.

@slayoo I'll get back to you in the other thread but I'll be AFK for a couple hours!

slayoo commented 5 years ago

@wolfv thanks

citibeth commented 5 years ago

Hi,

just to reply to your first point: Writing generic functions in xtensor is "easy" -- you'd write something like

template <class E>
auto func(xexpression<E>& e)
{
     auto& de = e.derived_cast();
     return de(1) + de(2); // ...
}

now an xexpression can be anything: a xcontainer derived type (e.g. xarray/xtensor ...), a xview (as in your example) or a xfunction. Currently we don't have a mechanism to dispatch on dimension or shape -- this will hopefully come (an open PR exists) which would support your notation.

Then I would say xtensor does not yet meet the minimum requirements provided by Fortran 90. It should be entirely natural to do something like:

void f(Array<double,2> &a);
Array<double,2> a; f(a(0:3,:))

Checking by template parameter -- we actually provide an at() operator (borrowing from the STL here). Having a template parameter for the operator() would be annoying to type (a.template operator<check>()(1,2,3))... so .at() is the way to go IMO.

Again... users should be able to declare variables of type "Array", write functions that take something of type "Array", and then call one with the other. Users should not have to remember that "this is a generic function" (meaning it can take array expressions) and therefore use .at() for it. That seems like something went askew in the design somewhere.

IMHO, getting basic arrays right is more important than lots of cool array expression magic.

The "dope" vector are currently three distinct members in the class: shape, strides and backstrides, whereby backstrides are just a performance optimization for nD-stepping. IIRC at the moment we don't support using references for shape/strides/backstrides. But it could be something to add (as well as using the previously mentioned closure type for shape/strides). However references go out of scope, shared pointers have considerable overhead and compilers have powerful copy-elision, so owning the shape might not be super-bad.

Dope vectors are small, and should not be referenced between arrays; that's more complication than it's worth. When you create an array, xtensor should copy the dope vector you provide.

Backstrides should not be part of a dope vector; they should be internal to xtensor. Sounds like a data structure with shape and strides should be exported.

Sure your array size can go as far as your memory carries you -- and your container. You could, for example, use a std::vector as data container. We use slightly different container as data storage (the uvector which doesn't initialize its contents (and lacks some unnecessary functionality for its use case).

More specifically... does xtensor support 64-bit indexes?

slayoo commented 5 years ago

BTW, does anyone have experience with libdynd: http://libdynd.org/ ?

wolfv commented 5 years ago

Then I would say xtensor does not yet meet the minimum requirements provided by Fortran 90. It should be entirely natural to do something like:

void f(Array<double,2> &a); Array<double,2> a; f(a(0:3,:))

Okay, I think the blitz++ design is different here, where an array is a view at the same time, and usually references memory held in a shared memory block. In xtensor, views and containers are currently distinct, and in order to not copy data one needs to use a generic base type (xexpression). But the effect is the same, and the API is equal on all derived classes. I have to admit that, if we redesign xtensor at some point, we might want to base all containers off a strided view class, which could make a lot of sense.

Again... users should be able to declare variables of type "Array", write functions that take something of type "Array", and then call one with the other. Users should not have to remember that "this is a generic function" (meaning it can take array experssions), and therefore I need to use .at() for it. That seems like something went askew in the design somewhere.

IMHO, getting basic arrays right is more important than lots of cool array expression magic.

I think I partially explained it above. Indeed, one can write functions that take xarrays, but when calling such a function, e.g. with an xfunction, the expression will be evaluated. E.g. this works perfectly:

auto func(const xarray<double>& a)
{
    return a * 2;
}

func({1,2,3,4});
func(arr1 + arr2 * 123); // where arr1 and arr2 are xarrays.

However, if I wrote a generic function with xexpressions, the last call would not get evaluated, and the return of func would be something like (abbreviated for readability) xfunc<arr1 + xfunc<arr2 * xfunc<123 * xfunc<2>>>>.

So in a way, xexpression is our Array. Is it clear what I mean or not so much? Or maybe you can show me the use case you worry about.
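For readers unfamiliar with expression templates, the lazy xfunc nesting described above can be illustrated with a minimal self-contained sketch. This is not xtensor code; all names (`Arr`, `Add`) are invented, and operands are stored by value to keep the toy safe for nested temporaries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A trivial concrete array.
struct Arr {
    std::vector<double> v;
    double operator()(std::size_t i) const { return v[i]; }
    std::size_t size() const { return v.size(); }
};

// Expression node playing the role of xfunction<plus, L, R>: building it
// runs no loop; elements are computed only when accessed.
template <class L, class R>
struct Add {
    L l;  // stored by value so nested temporaries stay alive
    R r;
    double operator()(std::size_t i) const { return l(i) + r(i); }
    std::size_t size() const { return l.size(); }
};

// operator+ returns an expression node instead of evaluating eagerly.
// (A real library would constrain this template; a toy need not.)
template <class L, class R>
Add<L, R> operator+(const L& l, const R& r) { return Add<L, R>{l, r}; }
```

Here `a + (a + b)` builds an `Add<Arr, Add<Arr, Arr>>` tree, analogous to the nested xfunc type above, and nothing is computed until an element is read.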

Dope vectors are small, and should not be referenced between arrays; that's more complication than it's worth. When you create an array, xtensor should copy the dope vector you provide. Backstrides should not be part of a dope vector; they should be internal to xtensor. Sounds like a data structure with shape and strides should be exported.

So the discussion is whether we should have some struct called dope that contains shape and strides? E.g.

template <size_t N>
struct dope
{
    std::array<size_t, N> shape;
    std::array<ptrdiff_t, N> strides;
};

I don't see a clear benefit of that right now. Just note that you can template your xcontainer with any shape or strides type, as is, for example, done for Python arrays. We do not copy the shape or strides in that case and directly use the same memory for shape and strides as numpy.

More specifically... does xtensor support 64-bit indexes?

Yes, we try to be as STL as possible -- our indices are therefore size_type, which is usually an unsigned 64-bit type.

wolfv commented 5 years ago

@slayoo I don't have a lot of experience with libdynd. We talked with one of the authors a couple of weeks ago. At the moment there is not much activity, and at least the docs are quite lacking. However, he might pick it up again for some project.

emmenlau commented 5 years ago

I very much enjoyed this discussion and wanted to kindly ask whether there are any new developments or considerations?

On a related note, I'm curious to understand how well xtensor is optimized for performance. Some time ago I added xtensor to blazemark to better understand the typical performance, and found that basic operations like adding two arrays can be on the order of 5 times slower than in blitz++ or Eigen::Tensor. This was no professional comparison and is quite subjective, of course. But just coincidentally there was recently an issue about iterator performance where early versions were prohibitively slow (https://github.com/QuantStack/xtensor/issues/695). I'm not trying to get into premature optimization here, but as @tobias-loew pointed out, many C++ users care about speed, and on top of good syntax and a suitable set of features it can still be significant work to come close to the incredible performance of blaze, blitz++ or Eigen::Tensor.

Would it be possible to add some comparable libraries like blitz++ or Eigen::Tensor to your benchmark to get a better understanding where xtensor stands, and what parts might need more work?

wolfv commented 5 years ago

Hi @emmenlau we do have a benchmark suite that compares against several other libraries here: https://github.com/wolfv/xtensor-benchmark It might need a little bit of love to get it to compile with the newest xtensor release.

The issue with the iterators is from "back in the days". We've spent a lot of work on getting the performance right. For basic operations it should only be slower for very small matrices and non-static shapes, because in that case we need to perform a dynamic check to see if broadcasting is involved. Additionally, we also spun out a companion library (https://github.com/QuantStack/xsimd), which is one of the nicest explicit SIMDification libraries around :)

However, we also still have some speed improvements to do, and some are quite low-hanging fruits. But the general use case should be fine!

I couldn't find your benchmarks in a quick google, but make sure to activate the -DXTENSOR_USE_XSIMD flag and install xsimd as well.

wolfv commented 5 years ago

BTW, seeing your HighFive contributions: we actually use HighFive as part of xtensor-io (github.com/QuantStack/xtensor-io), which one of our core contributors (@tdegeus) has added (and also backported some cool changes to HighFive).

emmenlau commented 5 years ago

Dear @wolfv , thanks for the pointer to your benchmark! I'm quite curious to see it in action! Does it generate some plots, and could you integrate it with your CI? It would be very interesting to continuously review and extend this. I see also that you've hardcoded some xsimd benchmarks. Am I correct to assume that xsimd is generally hidden from the user inside xtensor, and things like SIMD, BLAS and parallelization "just work" when built in?

We have been following xtensor for a while and it seems quite interesting. Specifically, the many language bindings and the good integration of iterators are nice. I do share some of the sentiments mentioned above about parts of the syntax: the at()-method is not my favorite, and (more importantly) I think the code would be more readable if the base view class were not named xexpression but rather xtensor, xview or something closely related to these.

But generally xtensor seems very promising and you've achieved a lot, so two thumbs up from my side.