blitzpp / blitz

Blitz++ Multi-Dimensional Array Library for C++
https://github.com/blitzpp/blitz/wiki
Other
406 stars 84 forks source link

Keep Blitz++ Around #8

Closed citibeth closed 4 years ago

citibeth commented 8 years ago

Thank you all for your input on this issue. After much consideration, I think that Blitz++ is worth keeping around. Even in 2016, it remains unique in concept and execution.

Blitz++ claims to do one thing only: provide multi-dimensional arrays for C++, similar to those in Fortran 90, Python/Numpy, and APL. Notice that Fortran 2008 has this feature but nothing like STL, and is widely used for scientific computing. It is clear that multi-dim arrays are essential in a way that STL is not. Of course, a system offering both is even more useful...

It is therefore disappointing that such arrays are still not part of the C++ standard. This is after 15 years of work on Blitz++, boost::multi_array, etc. Without any standard, different libraries use different classes for the same data structure, limiting compatibility between them. Interoperability is cumbersome, and often requires unnecessary copying of multi-dimensional data structures (assuming there's enough RAM).

Many of us confuse multi-dimensional arrays with matrices and vectors. Matrices and vectors are essential to scientific computing, and may be the most common type of arrays used. However, they do not take the place of the more general multi-dimensional array. This becomes clear to anyone attempting to read 5-dimensional data out of a HDF/NetCDF file. In an ideal world, matrix and vector classes (and operations) would be built upon a foundation of rank 1 and 2 arrays, thereby enhancing compatibility between different linear algebra libraries.

By offering only multi-dimensional arrays, Blitz++ provides a foundation upon which linear algebra and other libraries may be built. The fact that none have taken advantage of this foundation seems to be a missed opportunity. However... until/unless something better comes along, I believe we should continue to support Blitz++.

Why is Blitz++ so good?

  1. It works, it's stable, it's well-tested. There's really nothing wrong with it. Sometimes, software is updated infrequently because it's not buggy. That's a reason to KEEP it, not throw it away.
  2. It's well documented. The manual may feel long in the tooth compared to today's manuals. But the information you need is all there, and it works.
  3. It does one thing well and has no dependencies. I don't have to link to BLAS in order to read a NetCDF file with Blitz++.
  4. It's versatile. Blitz++ offers the full functionality of Fortran 90 arrays, and is able to use arrays in memory allocated by others. There are no restrictions on the dope vectors it allows. It is therefore REALLY useful for interfacing Fortran and Python code with C++. And since any array-like data structure can be converted (copy-free) to a blitz::Array, writing your functions to take blitz::Array parameters is an easy way to make them flexible as well.
  5. It's FAST. Benchmarks have shown it to be about as fast as Fortran 90 arrays.

In my review, I found only two other serious efforts to offer multi-dimensional arrays for C++. Both have potentially serious problems:

"I don't know what the language of the year 2000 will look like, but I know it will be called Fortran." -- Tony Hoare [CarHoare], apparently on a card distributed during the 1982 AFIPS National Computing Conference.

A number of other libraries exist that provide specialized matrix and vector classes. I do not need to mention them here because they don't provide multi-dimensional arrays.


I propose the following roadmap for the continued viability of Blitz++:

`. We get volunteers. Please contact me if you are interested in any of the tasks listed below.

  1. We move the repo to github.com/oonumerics/blitz. In the past, Blitz++ was hosted by oonumerics. Once this new repo is set up, we remove Blitz++ code from SourceForge and direct people to GitHub. We also move the mailing list and try to move the mailing list archives, if possible.
  2. Once things are moved, we can start putting updates into a ticket system. I would suggest the following priorities:
  3. Reformatted documentation, posted on-line. Doxygen docs would be nice.
  4. Identify and seek resolution on any warnings Blitz++ might be causing with modern compilers. We need to assure users that Blitz++ won't just stop working some day.
  5. Release the results of this as "Blitz 1.0"

We then start working on Blitz 2.0, which will make use of features in C++11 and beyond. C++98 users can continue to use Blitz 1.0. Updates might include:

  1. Replace TinyVector with std::array 1/ Much has changed now, with std::unique_ptr<> and std::shared_ptr<> now being part of the C++ standard. It would be worth seeing if some of the shared_ptr functionality currently built into Blitz++ is worth factoring out.
  2. Consider adding new functionality that will make blitz::Array more useful out-of-the-box. I'm envisioning some stuff I've already written as little "utility" functions, but we'd want to do it right before putting it into the library. Possibilities include:
  3. A reshape() function.
  4. Easy ways to read/write to NetCDF files. (This would ONLY be compiled if Blitz++ is configured with NetCDF dependency. We don't want to add any REQUIRED dependencies).
  5. Copy-free conversions between blitz::Array and std::vector.
  6. Conversions between blitz::Array, Numpy arrays and Fortran 90 arrays.
  7. In any case, please share your comments and suggestions on this roadmap. (Ane please volunteer too!)
slayoo commented 8 years ago

Probably somehow relevant - some time ago (on the occasion of the 2015 C++now conference) I've prepared a list of minimal code samples that make use of Blitz++ features that we've found in our team missing in other libraries/languages. It would be great to hear if those features are indeed unique or not (I guess they should be somehow obtainable with eigen::Tensor???).

1. overloadable OO index arithmetics (useful for implementing fractional indices)

#include <blitz/array.h>                                                    

struct hlf_t {} h;                                                          

blitz::Range operator+( const blitz::Range &i, hlf_t &)                     
{ return i;  }                                                              

blitz::Range operator-( const blitz::Range &i, hlf_t &)                     
{ return i-1; }                                                             

blitz::Range operator^( const blitz::Range &r, hlf_t &n)                    
{                                                                           
  return blitz::Range(                                                      
    (r - n).first(),                                                        
    (r + n).last()                                                          
  );                                                                        
}                                                                           

int main()                                                                  
{                                                                           
  blitz::Range i(0,10);                                                     
  blitz::Array<float, 1> psi(i), phi(i^h);                                  

  psi(i) = ( phi(i-h) + phi(i+h) ) / 2;                                     
}

2. multi-dimensional indexing with single objects (e.g. easy to return index permutation from a function)

#include <blitz/array.h>                                                    

int main()                                                                  
{                                                                           
  blitz::Range i(0,9), j(0,9), k(0,9);                                      

  blitz::Array<int, 3> psi(i,j,k);                                          

  std::cerr << psi(i,j,k);                                                  

  blitz::RectDomain<3> ijk({i,j,k});                                        

  std::cerr << psi(ijk);                                                    
}         

3. tensor notation

#include <blitz/array.h>                                                    

int main()                                                                  
{                                                                           
  blitz::Array<float,2> psi(4,4);                                           
  {                                                                         
    using namespace blitz::tensor;                                          
    psi = sqrt(i*i + j*j);                                                  
  }                                                                         
  std::cout << psi;                                                         
} 

4. array-valued functions (with reference counting, optionally thread-safe)

//#define BZ_THREADSAFE // to enable access locks for ref counting          
#include <blitz/array.h>                                                    

enum { opt_sin, opt_cos };                                                  

template <int opt, typename T>                                              
auto func(T &a, typename std::enable_if<opt == opt_sin>::type* = 0)         
{                                                                           
  return safeToReturn(pow(sin(a), 2)); // keeps ref count                   
}                                                                           

template <int opt, typename T>                                              
auto func(T &a, typename std::enable_if<(opt == opt_cos)>::type* = 0)       
{                                                                           
  return safeToReturn(pow(cos(a), 2)); // keeps ref count                   
}                                                                           

int main()                                                                  
{                                                                           
  blitz::Array<float, 1> psi(1000);                                         
  {                                                                         
    using namespace blitz::tensor;                                          
    psi = i / 20.;                                                          
  }                                                                         
  psi = func<opt_sin>(psi) + func<opt_cos>(psi);                            
  std::cout << psi;                                                         
}   

5. elemental functors

#include <blitz/array.h>                                                    

struct func                                                                 
{                                                                           
  float a = .25, b = .1;                                                    

  float operator()(float x) const                                           
  {                                                                         
    return a * x + b;                                                       
  }                                                                         

  // to make it accept Blitz arrays as arguments                            
  BZ_DECLARE_FUNCTOR(func);                                                 
};                                                                          

int main()                                                                  
{                                                                           
  blitz::Array<float, 1> psi(10), x(10);                                    

  x = blitz::tensor::i;                                                     

  psi = func()(x);                                                          
}

6. ternary operator & reductions (with lazy/delayed evaluation!)

#include <blitz/array.h>                                                    

int main()                                                                  
{                                                                           
  blitz::Array<float, 2> psi(5,5);                                          
  {                                                                         
    using namespace blitz::tensor;                                          
    psi = sqrt((i*i) + (j*j));                                              
  }                                                                         

  // ternary operator                                                       
  psi = where(psi>3, 0, psi);                                               

  // partial reduction                                                      
  {                                                                         
    using namespace blitz::tensor;                                          
    auto xpr = sum(psi, j); // delayed eval!                                
  }                                                                         
}  

Comments welcome...

slayoo commented 8 years ago

Re "It is therefore disappointing that such arrays are still not part of the C++ standard.", this open-std.org proposal is relevant:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3851.pdf

slayoo commented 6 years ago

For the record, I'm posting here replies from Eigen developers: Benoit and Gael (both from January 2016 - things might have changed!):

Re 1.

This sounds as an odd syntactic sugar to me. I don't think that Eigen::Tensor allows that.
The tensor module doesn't support this, but there could be another way to achieve you goal.

Re 2.

Not in Eigen::Array, maybe in Eigen::Tensor??
We support permuting the dimensions using the shuffling operation. Is this what you're looking for?

Re 3.

https://bitbucket.org/eigen/eigen/pull-requests/124/einstein-notation-for-tensor-module/diff

Re 4.

That's a planed feature to allow functions to return an expression referencing a temporary, but in your example I don't see any temporary. In Eigen you could simply return:

    return pow(cos(a), 2);

without trouble in this case. Reference counting would be needed for explicit temporaries, like:

    ArrayXXf tmp = ...;
    return pow(cos(tmp),2); // returned as expression

The same apply to Eigen::Tensor.

Indeed. In the tensor module, you would use unaryExpr you call the cosine function.

Re 5.

We rather offer an API compatible with C++11 lambda, and C++11 functions in general, e.g.:

      std::default_random_engine generator;
      std::poisson_distribution<int> distribution(4.1);
      auto poisson = [&] (Eigen::Index) {return distribution(generator);};
      RowVectorXi v = RowVectorXi::NullaryExpr(10, poisson );

The same apply to Eigen::Tensor.

Indeed, the tensor module supports nullaryExpr.

Re 6.

    psi = (psi>3).select(0,psi);
    auto partial_redux = psi.colwise().sum();
    auto custom_partial_redux = psi.rowwise().redux([] (float a, float b) {return hypot(a,b);});

The same apply to Eigen::Tensor.

The tensor module supports the ternary operator, it's called select. We also support partial and full reductions.
burnpanck commented 6 years ago

I have been a blitz++ user many years ago, but nowadays I do numerics mostly in python/numpy. Having worked with Matlab before, I like the consistent approach to multi-dimensional arrays of numpy compared to Matlab, and similarly of blitz++ vs. e.g. Eigen. I'm now coming back to C++ and thus looking for numpy-like libraries that can support the many new features C++17 (ok, it's mostly C++11 really) brings. It's nice to see that there is interest in the blitz++ community to support that. However, I want to point out that there seems to be a new kid around on the block: https://github.com/QuantStack/xtensor (I'm not affiliated with them).

citibeth commented 6 years ago

Thank you this looks quite promising at initial glance

On Fri, Aug 10, 2018 at 3:42 PM Yves Delley notifications@github.com wrote:

I have been a blitz++ user many years ago, but nowadays I do numerics mostly in python/numpy. Having worked with Matlab before, I like the consistent approach to multi-dimensional arrays of numpy compared to Matlab, and similarly of blitz++ vs. e.g. Eigen. I'm now coming back to C++ and thus looking for numpy-like libraries that can support the many new features C++17 (ok, it's mostly C++11 really) brings. It's nice to see that there is interest in the blitz++ community to support that. However, I want to point out that there seems to be a new kid around on the block: https://github.com/QuantStack/xtensor (I'm not affiliated with them).

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/blitzpp/blitz/issues/8#issuecomment-412200237, or mute the thread https://github.com/notifications/unsubscribe-auth/AB1cd517mZj-4HFSKENJe74vTdP3fpuvks5uPfAZgaJpZM4HH67H .

slayoo commented 6 years ago

Thanks indeed! Just added to the "Alternativs" wiki page: https://github.com/blitzpp/blitz/wiki/Alternatives Please post here or there any other packages that might be of relevance. Thanks

wolfv commented 6 years ago

Hi @slayoo

one thing xtensor does not support at the moment is Blitz++ fancy indexing. So in order to recreate example 1, one would need to rely on broadcasting or for loops. We also don't support index offsets like that in our arrays. However with the strided_view in xtensor, there is a offset variable, and with some tricks it could probably be used for an effect like this.

You can index with a index type using the square bracket operator in xtensor, e.g. arr[{1,3,2}]. One could also return a array or tuple of ranges and stick them into a view, e.g. std::array<xt::range<int>, 3>(range(0, 10), range(0, 10)...) and then unpack into a view view(a, r[0], r[1], r[2]) for example. With a bit of metaprogramming this could be turned into a function that does that. However, we haven't yet explored having a range container like RectDomain in xtensor.

We do have a slice_vector based on std::variant though, which can hold any number of slice objects such as range/integer/all, in order to dynamically create views.

We don't support the tensor notation. It is something I am interested in, though, and we did some experiments in an old PR that was abandoned. However, many things can be done using broadcasting and arange.

To adapt your example:

auto r1 = xt::arange(4);
auto r2 = xt::reshape_view(xt::arange(4), {4, 1});
xarray<double> psi = sqrt(r1 * r1 + r2 * r2);

I'm not sure I get #4 -- we have shared expressions which are reference counted using a shared_ptr, if that is what you're after. Memory management in xexpression can be indeed a little tough sometimes, and shared expressions are one way to get around it (e.g. when referencing a temporary in multiple places it's not possible to std::move it into the expression). But in your example I don't see the need for reference counting.

5 -- you can create broadcasted functions from lambdas quite easily:

template <class E>
auto func(E&& expr)
{
    return make_lambda_xfunction([](auto x) {
        return 0.25f * x + 0.1f;
    }),
    std::forward<E>(expr));
}

We have a lazy ternary operator (it's called where in numpy lingo)

xarray<double> res = where(a > 3, a, b);

and lazy reducers, e.g. sum: auto xpr = sum(arr, {axis1, axis2 ... })

Hope that answers some questions! As stated on the other thread, we're almost always available on our gitter channel: gitter.im/QuantStack/Lobby

slayoo commented 6 years ago

@wolfv Thank you! I'll reply later this weekend. BTW, all these examples are taken from http://www.igf.fuw.edu.pl/~slayoo/10.3233-SPR-140379.pdf where the same concepts are also shown for Fortran and NumPy.

slayoo commented 4 years ago

closing, let's open new issues if something needs to be addressed