jcmgray closed this issue 4 years ago
The `Failed to perform contraction` error typically means that CTF did not think there was sufficient memory to execute the contraction. I find that the first example fails on my laptop, where just storing the tensors should use about 50% of memory, but runs on a node of Stampede2, which has more memory. It seems CTF creates / needs to create a copy of the tensors, which is suboptimal in this case but may be needed more generally. A 2-3X memory overhead for an executing contraction, relative to the size of the tensors, is about normal for CTF (if things don't quite fit, one can often use more nodes in distributed memory).
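That rule of thumb can be turned into a quick feasibility check before attempting a contraction. A minimal sketch (the `contraction_bytes` helper and the 3x factor are illustrative assumptions, not CTF's actual allocator logic):

```python
from math import prod

def contraction_bytes(shapes, dtype_size=8, overhead=3.0):
    """Rough peak-memory estimate for contracting tensors of the
    given shapes, assuming a ~3x working-memory overhead on top of
    the storage for the input tensors themselves."""
    tensor_bytes = sum(prod(s) * dtype_size for s in shapes)
    return tensor_bytes * overhead

# e.g. two rank-28 tensors with all dimensions of size 2:
shapes = [(2,) * 28, (2,) * 28]
print(contraction_bytes(shapes) / 2**30, "GiB")  # → 12.0 GiB
```

Comparing such an estimate against available RAM per node gives a first guess at whether to expect this error.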
I think the problem with `to_nparray` is also memory, but testing it I noticed the performance was quite poor, so I pushed an optimized variant for the case of sequential execution and a dense tensor. This function will still be slow and have substantial memory overhead when executed in the parallel setting, so ideally conversions of big arrays ought not to happen in an application.
Thanks for the response! Ah, that is good to know that it's just to do with expected memory usage. And yes, regarding `to_nparray`: at least for my use cases I should only need to convert very small tensors; my hunch was just that it might show the same or a related error to `tensordot` etc.
Do you expect any other issues with using such high-dimensional tensors, e.g. performance ones that might favour trying to reshape / fuse dimensions before contraction? (No worries if you have no thoughts on this, closing for now.)
High-order tensors and contractions thereof should be handled efficiently, but actually the reshape operation is quite costly because it requires changes to the distribution of data in the parallel case. We are currently developing something more efficient for the special case of matricization.
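For reference, the kind of dimension-fusing being asked about can be sketched with numpy (shapes here are hypothetical; note that in CTF the reshape itself triggers data redistribution in the parallel case, which is the costly part mentioned above):

```python
import numpy as np

# two rank-10 tensors with all dimensions of size 2
a = np.random.rand(*([2] * 10))
b = np.random.rand(*([2] * 10))

# contract the last 5 axes of a with the first 5 axes of b directly...
direct = np.tensordot(a, b, axes=5)

# ...or fuse those 5 axes into a single axis of size 2**5 first,
# reducing the contraction to a plain matrix multiply
a2 = a.reshape(2**5, 2**5)
b2 = b.reshape(2**5, 2**5)
fused = (a2 @ b2).reshape([2] * 10)

assert np.allclose(direct, fused)
```

In serial numpy the fused form is usually at least as fast; whether it pays off in CTF depends on the cost of the redistribution the reshape incurs.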
contracting

If I try to contract tensors with many dimensions (I'm using the python interface and `ctf` compiled with openblas), I get the following error:
converting to `numpy`

On a similar note, the following gives me a crash:
with traceback:
which I suspect is to do with internally representing dimensions as letters of the alphabet, although the fact that the contraction only breaks after 27/28 dimensions suggests there might be some more nuance!
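The suspected failure mode can be sketched as follows (a toy model only; `axis_labels` is a hypothetical helper and not CTF's actual internal labelling, which evidently breaks at a slightly different rank):

```python
import string

def axis_labels(ndim):
    """Assign one single-letter label per axis, lowercase then
    uppercase, failing once the alphabet is exhausted."""
    letters = string.ascii_lowercase + string.ascii_uppercase
    if ndim > len(letters):
        raise ValueError(f"only {len(letters)} single-letter labels available")
    return letters[:ndim]

print(axis_labels(28))  # 28 distinct labels, spilling into uppercase
```

A scheme like this would explain a hard cap somewhere around 26 or 52 dimensions, though not the exact 27/28 boundary observed.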