blitzpp / blitz

Blitz++ Multi-Dimensional Array Library for C++
https://github.com/blitzpp/blitz/wiki

[question/request] scientific papers and or research about blitz++ #119

Closed ClmnsRck closed 5 years ago

ClmnsRck commented 5 years ago

Are there papers about Blitz++? Something discussing the framework, some guidelines and features? Something that is, in theory, citable in a scientific work?

citibeth commented 5 years ago

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C21&q=blitz%2B%2B&btnG=


ClmnsRck commented 5 years ago

Sadly, those are either user manuals (hard to cite) or behind a hefty paywall. Too bad.

ClmnsRck commented 5 years ago

@citibeth If you have the time, I would like to ask you some questions about Blitz++ (I'm not going to quote you on anything; this is just for me and my limited knowledge of C++ and Blitz++). You told me in another issue that Blitz is mostly suited to be a smart and efficient storage structure for multidimensional arrays, and not really meant to be used as a tensor-algebra library. Did I get that right, or am I wrong about something? :smiley:

citibeth commented 5 years ago

Yes that's right.

Are you writing a paper on this stuff?

-- Elizabeth http://ccsr.columbia.edu/who-we-are/our-scientists/elizabeth-fischer/


ClmnsRck commented 5 years ago

More or less: I am currently writing my bachelor's thesis on the comparative evaluation of tensor-algebra libraries for database systems. That's why I need to test a lot of operations with only built-in or standard C++ features.

citibeth commented 5 years ago

Clemens,

I would say that Blitz++ is not a tensor algebra library. I see it as basically giving what Fortran90 gives in terms of multi-dimensional arrays.

What is your undergraduate institution? Most should provide paywall access to journals.

-- Elizabeth


ClmnsRck commented 5 years ago

I know that; it still popped up on my radar, so I devoted some time to it, even if only to confirm that Blitz++ should not be used as such.

citibeth commented 5 years ago

In an ideal world, Blitz++ would be a multi-dimensional array library that you build algorithms on top of.


lutorm commented 5 years ago

Blitz is a great alternative for evaluating dense matrix tensor expressions. What it does not have are linear algebra functions like inverses, eigenvalues, etc. So it depends on what you're needing to do.

ClmnsRck commented 5 years ago

Don't get this the wrong way, this is definitely a question and not something else: would it be hard to implement a semi-efficient tensor product into Blitz++? And to link to BLAS subroutines for matrix and vector algebra?

EDIT: That would enable inverses, eigenvalues, etc., wouldn't it?

citibeth commented 5 years ago

Clemens,

Dont get this the wrong way, this is definitely a question and not something else, but would it be hard to implement a semi-efficient Tensor-product into Blitz++? And linking to BLAS-subroutines for Matrix and vector algebra?

I'm assuming all tensor products we're talking about use the same basic algorithm. If that's the case, efficiency will depend on patterns of memory access. Blitz++ can support any memory access you like. I think the main issue would be whether a "simple" implementation (i.e. nested loops, and using Blitz++ indexing) is efficient; or whether you need to do some kind of clever iterator.

And of course... the same library will have very different performance depending on how you structure the tensors you're multiplying. When comparing between libraries, you should make sure the examples lay out data in the same way in memory.

Yes there's no reason you can't call BLAS on a Blitz++ array. That would be a very good way to do things, IMHO. The only limitation is, BLAS normally assumes arrays are contiguous in memory, whereas Blitz++ arrays can be much more flexible. However, for best performance you're probably best off with the contiguous arrays anyway.

-- Elizabeth

lutorm commented 5 years ago

Well, tensor products are pretty efficient as they are, using the index placeholders and reductions. Linking to BLAS for operations that aren't in Blitz is a perfectly normal thing to do. (And doing that will give you a better appreciation for how nice Blitz is to work with! ;-))

As for the citation, this is what I used in my thesis:

@InProceedings{blitz,
  author    = {Veldhuizen, T. L.},
  title     = {Arrays in Blitz++},
  booktitle = {Proceedings of the 2nd International Scientific Computing in Object Oriented Parallel Environments (ISCOPE'98)},
  year      = 1998
}


ClmnsRck commented 5 years ago

First of all, thank you for your plenteous responses. Secondly, how would you implement a tensor dot product using Blitz? I always used sum(A(i,k) * B(k,j), k), but that introduces the indirection of having to calculate the multiplication first and then having to sum over the k-th dimension... @citibeth recommended trying the "naive" approach and doing manual for-loops over the array. I think I don't really understand the merits of Blitz++: how would the efficient memory usage and memory layout be of any help, if it's contiguous memory anyway? (I really don't know, so please explain it if I made some substantial mistakes/errors.)

lutorm commented 5 years ago

"how would you implement a tensor-dot-product using blitz? I always used sum(A(i,k) * B(k,j),k) but that introduces the indirection of having to calculate the Multiplication first and then having to sum over the k-th Dimension

I'm not sure what you mean by "indirection". Since each component in the result, by definition, is a sum of a set of products over the reduced index, how would you not multiply before summing? The blitz expression templates ensure that what is calculated is exactly what's needed without intermediate storage. It will reduce to what you would write by hand, in pseudocode:

for i {
  for j {
    sum = 0;
    for k { sum = sum + A(i,k)*B(k,j); }
    result(i,j) = sum;
  }
}


ClmnsRck commented 5 years ago

OK, then I got that wrong; that's actually good. Thank you :+1:

citibeth commented 5 years ago

Secondly, how would you implement a tensor-dot-product using blitz?

I would write a nested loop. Simple and stupid. And in 95% of cases where performance doesn't matter, I would never bother profiling it. In the last 5% of cases, I would try different things, understand how the compiler is optimizing, etc. I'd try directly indexing the Blitz++ arrays, vs. setting up pointers in memory that get incremented at appropriate times by skip values. My guess is there won't be much difference between the two on a modern optimizing compiler.

I think i didn't really understand the merits of blitz++,

The main merit of Blitz++ is it gives you a multi-dimensional array type in C++; whereas the core C++ language doesn't have that at all. And it provides enough flexibility with this data type to do useful things like interface directly with Fortran and Numpy, without copying large arrays of data when crossing the language boundary. That is... you can create a Blitz++ array using existing data in memory, if you know its layout.

To the best of my knowledge, no other C++ library provides this functionality. I know that Eigen does not.

Once you have your data in memory, if you need to do tensor products, you can use whatever algorithm you think best. That's not a core part of what Blitz++ is about. It's like asking what's the best way to multiply tensors with Fortran 90 arrays. Fortran 90 arrays don't provide a tensor multiply command. But given two Fortran 90 arrays, you can certainly think of clever ways to do the multiplication.

Blitz++ went to a lot of effort to create these array operators, which are convenient. But simply providing multi-dimensional arrays (which Blitz++ also does) is more important. I could do without the former (I'd have to write all loops out myself); but not the latter.

but how would the efficient memory-usage and memory-layout be of any help, if its contiguous memory anyways?

Performance of HPC code on modern computers is determined primarily by memory layout: memory is fetched in blocks, and a fetch takes about 100x as long as anything else happening within the CPU. Differences in compiler optimization, or library used, have a relatively small effect.

https://www.google.com/search?q=memory+layout+efficiency+tensor+operations&oq=memory+layout+efficiency+tensor+operations&aqs=chrome..69i57.5745j0j7&sourceid=chrome&ie=UTF-8

https://stackoverflow.com/questions/44774234/why-tensorflow-uses-channel-last-ordering-instead-of-row-major

slayoo commented 5 years ago

Here's a list of papers on Blitz++ on the Blitz wiki: https://github.com/blitzpp/blitz/wiki/Mentions-of-Blitz

ClmnsRck commented 5 years ago

OK, that's actually very important to know. Thank you again for these answers; it definitely helped me a lot. I will dig deeper into what you mentioned. :+1:

lutorm commented 5 years ago

"Blitz++ went to a lot of effort to create these array operators, which areconvenient. But simply providing multi-dimensional arrays (which Blitz++also does) is more important. I could do without the former (I'd have towrite all loops out myself); but not the latter."

Funny, my thinking is completely opposite. Writing a multi-dimensional array class without operators is trivial, you could do it in an afternoon. But if you have to interact with it as if it was Fortran, you might as well use Fortran... The big advantage of blitz is that you get expressions that are actually readable while getting code that's as fast as if you wrote those loops.

It's precisely because memory bandwidth is the limit that you have to elide the temporary objects you get with a trivial implementation of the math operators. To get good performance, you're then left with either writing explicit loops, which is error prone and high maintenance, or using something like Blitz's expression templates. (And if you want to get really good performance, you need to use something like BLAS which has tuned blocking for different architectures, etc.)


lutorm commented 5 years ago

Also, it's important to note that IF you're using Blitz, explicit loops over array indices are likely to be slower than using the expressions, because the expressions use stack traversal whereas every indexed access requires a dot product of the indices with the strides.


citibeth commented 5 years ago

Also, it's important to note that IF you're using blitz, explicit loops over array indices are likely to be slower than using the expressions, because the expressions use stack traversals whereas array indices require a dot product every time.

...except that compilers are good at optimizing that stuff out, at least for 1D loops. I would not assume these things before trying them out.

-- Elizabeth

nt1tov commented 5 years ago

OK, that's actually very important to know. Thank you again for these answers; it definitely helped me a lot. I will dig deeper into what you mentioned. +1

@ClmnsRck Hello! Can I contact you, please? I am working on the same things.