cyclops-community / ctf

Cyclops Tensor Framework: parallel arithmetic on multidimensional arrays

contracting tensors with many dimensions yields "Failed to map tensors to physical grid" #104

Closed - jcmgray closed this issue 4 years ago

jcmgray commented 4 years ago

contracting

If I try to contract tensors with many dimensions (I'm using the Python interface, with ctf compiled against OpenBLAS):

import ctf

x = ctf.random.random([2] * 28)
y = ctf.random.random([2] * 28)  # ok if 27
z = ctf.tensordot(x, y, min(x.ndim, y.ndim))

I get the following error:

ERROR: Failed to map contraction!
Failed to map tensors to physical grid
CTF ERROR: Failed to perform contraction

converting to numpy

On a similar note, the following gives me a crash:

x = ctf.random.random([2] * 27)  # ok for 26
xn = x.to_nparray()

with traceback:

terminate called after throwing an instance of 'std::bad_array_new_length'
  what():  std::bad_array_new_length                                      

which I suspect has to do with dimensions being represented internally by letters of the alphabet, although the fact that the contraction only breaks at 27/28 dimensions suggests there may be some more nuance!

solomonik commented 4 years ago

The "Failed to perform contraction" error typically means that CTF did not think there was sufficient memory to execute the contraction. I find that the first example fails on my laptop, on which just storing the tensors would use about 50% of memory, but runs on a node of Stampede2, which has more memory. It seems CTF creates (or needs to create) a copy of the tensors, which is suboptimal in this case but may be needed more generally. A 2-3X memory overhead, relative to the size of the tensors, is about normal for a contraction in CTF (if things don't quite fit, one can often use more nodes in distributed memory).
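
To make the numbers concrete (my own back-of-the-envelope arithmetic, not CTF's internal accounting): each tensor in the first example holds 2**28 doubles, so the two inputs alone take about 4 GiB, and the 2-3X working-space overhead quoted above puts the whole contraction at roughly 8-12 GiB:

# Rough memory estimate for the failing 28-dimensional example.
# The 2-3X overhead factor is the one quoted in the comment above.
elements = 2 ** 28                           # entries in a tensor of shape [2] * 28
bytes_per_tensor = elements * 8              # float64: ~2 GiB per tensor
inputs_gib = 2 * bytes_per_tensor / 2 ** 30  # x and y together: ~4 GiB
print(f"inputs: {inputs_gib:.0f} GiB, "
      f"with 2-3X overhead: {2 * inputs_gib:.0f}-{3 * inputs_gib:.0f} GiB")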

I think the problem with to_nparray is also memory, but while testing it I noticed that the performance was quite poor, so I pushed an optimized variant for the case of sequential execution on a dense tensor. That function will still be slow and carry substantial memory overhead when executed in a parallel setting, so ideally conversions of big arrays ought not to happen in an application.
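
As a hedged illustration of that advice (the shapes here are chosen for the example, not taken from the report above): keep large tensors inside CTF and call to_nparray only once the result is small.

import ctf

# Contract inside CTF while the data is large ...
x = ctf.random.random([2] * 20)
y = ctf.random.random([2] * 20)
z = ctf.tensordot(x, y, 19)  # 19 of 20 axes contracted: result is just 2 x 2

# ... and cross over to numpy only at the end, when the copy is tiny.
zn = z.to_nparray()
print(zn)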

jcmgray commented 4 years ago

Thanks for the response! Ah, it's good to know that it's just down to expected memory usage. And yes, regarding to_nparray: for my use cases at least, I should only need to convert very small tensors; my hunch was just that it might be showing the same or a related error as tensordot etc.

Do you expect any other issues when using such high-dimensional tensors - e.g. performance ones that might favour reshaping / fusing dimensions before contraction? (No worries if you have no thoughts on this - closing for now.)

solomonik commented 4 years ago

High-order tensors and contractions thereof should be handled efficiently, but the reshape operation is actually quite costly, since in the parallel case it requires changing the distribution of the data. We are currently developing something more efficient for the special case of matricization.
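
For reference, a minimal sketch of the fuse-then-contract pattern discussed above, assuming CTF's numpy-style reshape (which, per the comment above, redistributes data and is therefore not free in parallel runs):

import ctf

x = ctf.random.random([2] * 8)
y = ctf.random.random([2] * 8)

# Matricize: fuse the first four and the last four axes of each tensor.
xm = x.reshape((2 ** 4, 2 ** 4))
ym = y.reshape((2 ** 4, 2 ** 4))

# A single matrix-matrix contraction replaces the 8-index contraction
# over the last four axes of x and the first four axes of y.
zm = ctf.tensordot(xm, ym, 1)

# Unfuse the result back into tensor form if needed.
z = zm.reshape([2] * 8)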