cyclops-community / ctf

Cyclops Tensor Framework: parallel arithmetic on multidimensional arrays

Fourth order symmetric sparse tensor contraction issues #117

Closed: shrshi closed this issue 3 years ago

shrshi commented 3 years ago

Hi, I am unable to create a random fourth-order sparse symmetric tensor. The following code snippet fails when run with a single MPI process:

int ndims[] = {400, 400, 400, 400};
int ns[] = {SY, SY, SY, SY};
char const name[] = "A"; 
CTF_Tensor result(4, true, ndims, ns, dw, Ring<double>(), name, 1);
result.fill_sp_random(0.0, 0.1, 0.25); //Segmentation fault

I've also tried explicitly creating a list of non-zeros with index tuples and values. The following fails as well:

result.write(values.size(), inds.data(), values.data()); //Segmentation fault

Here, inds holds the global index computed from the index tuple of each nonzero. Any help is greatly appreciated! Thanks!
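Below is a minimal sketch of how such a global index can be formed. It assumes, based on our understanding of CTF's read/write convention, that the first mode varies fastest, i.e. g = i0 + i1*n0 + i2*n0*n1 + i3*n0*n1*n2. The helpers global_index and index_tuple are hypothetical and are not CTF functions. The sketch uses 64-bit integers because 400^4 = 25,600,000,000 does not fit in a 32-bit int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of CTF): linearize an index tuple into a
// global index, assuming the first mode varies fastest:
//   g = idx[0] + idx[1]*len[0] + idx[2]*len[0]*len[1] + ...
// 64-bit arithmetic is needed for large tensors: 400^4 exceeds INT32_MAX.
std::int64_t global_index(const std::vector<int>& idx, const std::vector<int>& len) {
  assert(idx.size() == len.size());
  std::int64_t g = 0;
  std::int64_t stride = 1;
  for (std::size_t d = 0; d < idx.size(); ++d) {
    assert(idx[d] >= 0 && idx[d] < len[d]);
    g += static_cast<std::int64_t>(idx[d]) * stride;
    stride *= len[d];
  }
  return g;
}

// Hypothetical inverse mapping, useful to check that indices round-trip.
std::vector<int> index_tuple(std::int64_t g, const std::vector<int>& len) {
  std::vector<int> idx(len.size());
  for (std::size_t d = 0; d < len.size(); ++d) {
    idx[d] = static_cast<int>(g % len[d]);  // coordinate along mode d
    g /= len[d];                            // drop mode d's contribution
  }
  return idx;
}
```

For example, with lengths {400, 400, 400, 400}, the tuple (0, 1, 0, 0) maps to 400 and (399, 399, 399, 399) maps to 25,599,999,999.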

solomonik commented 3 years ago

For a fully symmetric tensor, the CTF symmetry specification is int ns[] = {SY, SY, SY, NS};. Each SY/NS/AS/SH entry denotes the symmetry of that index with the next one, so the last entry of ns[] should always be NS.
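A small sketch of this convention in plain C++, without calling CTF: full_symmetry builds the symmetry array for a fully symmetric order-N tensor, valid_sym checks the "last entry is NS" rule, and unique_entries counts the entries that are distinct under full symmetry, C(n+N-1, N). The Sym enum and its numeric values are local stand-ins assumed for illustration, not CTF's own definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-in for CTF's symmetry flags (numeric values assumed, not taken from CTF).
enum Sym { NS = 0, SY = 1, AS = 2, SH = 3 };

// sym[i] relates index i to index i+1, so a fully symmetric order-N tensor
// uses SY for the first N-1 entries and NS for the last one.
std::vector<int> full_symmetry(int order) {
  assert(order >= 1);
  std::vector<int> sym(static_cast<std::size_t>(order), static_cast<int>(SY));
  sym[order - 1] = NS;
  return sym;
}

// The last index has no "next" index, so its entry must be NS.
bool valid_sym(const std::vector<int>& sym) {
  return !sym.empty() && sym.back() == NS;
}

// Distinct entries of a fully symmetric tensor with N modes of length n:
// multisets of size N drawn from n values, C(n + N - 1, N).
std::uint64_t unique_entries(std::uint64_t n, std::uint64_t order) {
  std::uint64_t result = 1;
  for (std::uint64_t k = 1; k <= order; ++k) {
    // After this step result == C(n + k - 1, k); the division is exact.
    result = result * (n + k - 1) / k;
  }
  return result;
}
```

For the 400^4 tensor in the original question, full_symmetry(4) gives {SY, SY, SY, NS}, and unique_entries(400, 4) is 1,082,740,100, compared with 25,600,000,000 entries without symmetry.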

shrshi commented 3 years ago

Thank you for your quick response earlier! I have a few follow-up questions:

  1. Is it possible to perform multiple contractions on symmetric tensors in a single step? For example, for the fourth-order tensor Y: Y[iY] = X[iX] * U1[iU1] * U2[iU2] * U3[iU3]
  2. For a sparse symmetric input tensor, is it possible to perform a tensor-matrix contraction such that the output tensor is also sparse? For example, Y[iY] = X[iX] * U[iU] with a sparse semi-symmetric Y and a sparse symmetric X.
  3. For an order-N sparse symmetric tensor, I performed N-1 tensor contractions with rank-2 dense matrices using 8 MPI processes, on a machine where each node has two 2.7 GHz 12-core Intel Xeon Gold 6226 processors with a 19 MB L3 cache. The runtimes for the following sparse symmetric tensors were:
     Order | Length of each mode | Number of nonzeros | Runtime
     6     | 50                  | 1000               | 3.344377 s
     6     | 100                 | 10000              | 88.492327 s
     7     | 100                 | 100000             | > 1 hr

Do these runtimes seem reasonable? Thank you for your time!

solomonik commented 3 years ago

Will try to answer as best I can.

  1. We have routines for all-at-once sparse MTTKRP and TTTP, but haven't adapted them to handle symmetry. Otherwise, a sequence of contractions is done pairwise, which also works with symmetric tensors.
  2. I believe this should work with either sparse or dense U by defining Y to be partially symmetric, e.g., {SY, NS, NS}. I guess that is what you mean by semi-symmetric.
  3. I can't really say what good running times would be, but I expect there could be significant overheads for some contractions, due to our use of transposition/addition to handle symmetries and the relatively high cost of these operations in the sparse case, while for others we might perform quite well. If you are interested, you can get more information on which internal functions and MPI/MKL calls CTF spends its time in by building CTF with -DPMPI -DPROFILE -DAUTO_PROFILE, which should produce logs at the end of execution.
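As a dense, single-process illustration of points 1 and 2 (plain C++, not CTF code), the sketch below computes Y[a,b,c] = sum_{i,j,k} X[i,j,k] U1[i,a] U2[j,b] U3[k,c] for a small fully symmetric order-3 X, once as a sequence of pairwise tensor-times-matrix contractions and once all at once, and checks that they agree. It also checks that the first intermediate T1[j,k,a] = sum_i X[i,j,k] U1[i,a] keeps the symmetry of X in (j,k), which is the kind of partial symmetry described by {SY, NS, NS}. The types, names, sizes, and row-major layout are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dense order-3 tensor (d0 x d1 x d2), row-major; an illustrative layout, not CTF's.
struct T3 {
  int d0, d1, d2;
  std::vector<double> v;
  T3(int a, int b, int c) : d0(a), d1(b), d2(c), v(std::size_t(a) * b * c, 0.0) {}
  double& operator()(int i, int j, int k) { return v[(std::size_t(i) * d1 + j) * d2 + k]; }
  double operator()(int i, int j, int k) const { return v[(std::size_t(i) * d1 + j) * d2 + k]; }
};

// Dense n x r matrix, row-major.
struct Mat {
  int n, r;
  std::vector<double> v;
  Mat(int n_, int r_) : n(n_), r(r_), v(std::size_t(n_) * r_, 0.0) {}
  double& operator()(int i, int a) { return v[std::size_t(i) * r + a]; }
  double operator()(int i, int a) const { return v[std::size_t(i) * r + a]; }
};

// One pairwise step: contract mode 0 of X with U and append the new index last,
// T[j,k,a] = sum_i X[i,j,k] * U[i,a].
T3 ttm_front(const T3& X, const Mat& U) {
  assert(U.n == X.d0);
  T3 T(X.d1, X.d2, U.r);
  for (int i = 0; i < X.d0; ++i)
    for (int j = 0; j < X.d1; ++j)
      for (int k = 0; k < X.d2; ++k)
        for (int a = 0; a < U.r; ++a) T(j, k, a) += X(i, j, k) * U(i, a);
  return T;
}

// Sequence of pairwise contractions: X -> T1[j,k,a] -> T2[k,a,b] -> Y[a,b,c].
T3 pairwise(const T3& X, const Mat& U1, const Mat& U2, const Mat& U3) {
  return ttm_front(ttm_front(ttm_front(X, U1), U2), U3);
}

// All at once: Y[a,b,c] = sum_{i,j,k} X[i,j,k] U1[i,a] U2[j,b] U3[k,c].
T3 all_at_once(const T3& X, const Mat& U1, const Mat& U2, const Mat& U3) {
  T3 Y(U1.r, U2.r, U3.r);
  for (int i = 0; i < X.d0; ++i)
    for (int j = 0; j < X.d1; ++j)
      for (int k = 0; k < X.d2; ++k)
        for (int a = 0; a < U1.r; ++a)
          for (int b = 0; b < U2.r; ++b)
            for (int c = 0; c < U3.r; ++c)
              Y(a, b, c) += X(i, j, k) * U1(i, a) * U2(j, b) * U3(k, c);
  return Y;
}

// Fully symmetric test tensor: a symmetric polynomial in (i, j, k).
T3 make_symmetric(int n) {
  T3 X(n, n, n);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      for (int k = 0; k < n; ++k) X(i, j, k) = 1.0 + i + j + k + 0.5 * (i * j + j * k + i * k);
  return X;
}

// Deterministic dense factor matrix with entries sin(seed * (i + 1) + a).
Mat make_mat(int n, int r, double seed) {
  Mat U(n, r);
  for (int i = 0; i < n; ++i)
    for (int a = 0; a < r; ++a) U(i, a) = std::sin(seed * (i + 1) + a);
  return U;
}

// True if both evaluation orders give the same Y (up to rounding).
bool pairwise_matches(int n, int r) {
  T3 X = make_symmetric(n);
  Mat U1 = make_mat(n, r, 0.3), U2 = make_mat(n, r, 0.7), U3 = make_mat(n, r, 1.1);
  T3 P = pairwise(X, U1, U2, U3), A = all_at_once(X, U1, U2, U3);
  for (std::size_t t = 0; t < P.v.size(); ++t)
    if (std::fabs(P.v[t] - A.v[t]) > 1e-9) return false;
  return true;
}

// True if T1[j,k,a] = sum_i X[i,j,k] U1[i,a] is symmetric in (j,k).
bool intermediate_symmetric(int n, int r) {
  T3 T = ttm_front(make_symmetric(n), make_mat(n, r, 0.3));
  for (int j = 0; j < n; ++j)
    for (int k = 0; k < n; ++k)
      for (int a = 0; a < r; ++a)
        if (std::fabs(T(j, k, a) - T(k, j, a)) > 1e-12) return false;
  return true;
}
```

In these dense loops the pairwise steps cost about n^3 r, n^2 r^2, and n r^3 multiply-adds, versus n^3 r^3 for the naive all-at-once loop; this is a generic observation about the sketch, not a statement about CTF's internal algorithms.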