flatironinstitute / sparse_dot

Python wrapper for Intel Math Kernel Library (MKL) matrix multiplication
MIT License

dot_product_mkl: ValueError: Input matrices to dot_product_mkl must be CSR, CSC, or BSR; COO is not supported #23

Closed yCobanoglu closed 11 months ago

yCobanoglu commented 12 months ago

Hello, thanks for the great library, first of all. I get this error only on certain machines. The same code works locally and on Machines 1) and 2), but not on Machine 3). This has to be related to the architecture or Intel MKL; I'm not sure what to make of it.

Traceback (most recent call last):
  File "/home/yunus.ebabylon/graph-neural-networks/.venv1/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 463, in _process_worker
    r = call_item()
        ^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/.venv1/lib/python3.11/site-packages/joblib/externals/loky/process_executor.py", line 291, in __call__
    return self.fn(*self.args, **self.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/.venv1/lib/python3.11/site-packages/joblib/parallel.py", line 589, in __call__
    return [func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/.venv1/lib/python3.11/site-packages/joblib/parallel.py", line 589, in <listcomp>
    return [func(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/gnn/infinite_width/gat/utils.py", line 20, in f
    return dot_product_mkl(cov, cov_vec).toarray()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/.venv1/lib/python3.11/site-packages/sparse_dot_mkl/sparse_dot.py", line 58, in dot_product_mkl
    return _sds(matrix_a, matrix_b, cast=cast, reorder_output=reorder_output, dense=dense)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/.venv1/lib/python3.11/site-packages/sparse_dot_mkl/_sparse_sparse.py", line 114, in _sparse_dot_sparse
    raise ValueError("Input matrices to dot_product_mkl must be CSR, CSC, or BSR; COO is not supported")
ValueError: Input matrices to dot_product_mkl must be CSR, CSC, or BSR; COO is not supported
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yunus.ebabylon/graph-neural-networks/gnn/infinite_width/run.py", line 263, in <module>
    gat_gp, gat_ntk_, time1 = with_time(lambda: gat_ntk(adj, x, layer, SIGMA_B, SIGMA_W, NONLINEAR))
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/gnn/infinite_width/run.py", line 161, in with_time
    res = f()
          ^^^
  File "/home/yunus.ebabylon/graph-neural-networks/gnn/infinite_width/run.py", line 263, in <lambda>
    gat_gp, gat_ntk_, time1 = with_time(lambda: gat_ntk(adj, x, layer, SIGMA_B, SIGMA_W, NONLINEAR))
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/gnn/infinite_width/gat/gat_ntk.py", line 32, in gat_ntk
    return forward(kernel_init, kernel_init, 1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/gnn/infinite_width/gat/gat_ntk.py", line 24, in forward
    nngp = batch_mul(sigma, cov_att)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/gnn/infinite_width/gat/utils.py", line 50, in batch_mul
    results = Parallel(n_jobs=-1)(delayed(f)(cov[n * window : (n + 1) * window]) for n in tqdm(range(window)))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/.venv1/lib/python3.11/site-packages/joblib/parallel.py", line 1952, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/.venv1/lib/python3.11/site-packages/joblib/parallel.py", line 1595, in _get_outputs
    yield from self._retrieve()
  File "/home/yunus.ebabylon/graph-neural-networks/.venv1/lib/python3.11/site-packages/joblib/parallel.py", line 1699, in _retrieve
    self._raise_error_fast()
  File "/home/yunus.ebabylon/graph-neural-networks/.venv1/lib/python3.11/site-packages/joblib/parallel.py", line 1734, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "/home/yunus.ebabylon/graph-neural-networks/.venv1/lib/python3.11/site-packages/joblib/parallel.py", line 736, in get_result
    return self._return_or_raise()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/.venv1/lib/python3.11/site-packages/joblib/parallel.py", line 754, in _return_or_raise
    raise self._result
ValueError: Input matrices to dot_product_mkl must be CSR, CSC, or BSR; COO is not supported

The error is not caused by the matrices being non-sparse. gram_matrix_mkl works fine.

asistradition commented 12 months ago

This is an error that's thrown when scipy.sparse.issparse() returns True on your input, but is_csr, is_csc, and is_bsr all return False. It's thrown before anything is passed into MKL structs, so it's unlikely to be an issue with MKL.

This is a pretty simple logical check and I don't see how it could be failing - I would need a reproducible example to check.
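To see how scipy classifies a given input before it reaches such a check, a small diagnostic can help. This is only a sketch, not sparse_dot_mkl's actual implementation: `describe_sparse_input` and `has_supported_format` are hypothetical helpers, and the format test relies on scipy's `.format` attribute rather than on the object's class.

```python
import numpy as np
import scipy.sparse as sp

# Formats that the error message names as accepted.
SUPPORTED_FORMATS = ("csr", "csc", "bsr")


def describe_sparse_input(x):
    """Report how scipy classifies `x` (hypothetical diagnostic helper)."""
    return {
        "type": type(x).__name__,
        "issparse": sp.issparse(x),
        "format": getattr(x, "format", None),
        # For csr_array this result may differ between scipy releases,
        # so treat it as information only.
        "is_csr_matrix": isinstance(x, sp.csr_matrix),
    }


def has_supported_format(x):
    """Check the storage format via `.format` instead of the class."""
    return sp.issparse(x) and x.format in SUPPORTED_FORMATS


dense = np.eye(3)
for candidate in (sp.csr_matrix(dense), sp.csr_array(dense), sp.coo_matrix(dense)):
    print(describe_sparse_input(candidate), has_supported_format(candidate))
```

Running this on both the working and the failing machine, together with `scipy.__version__`, would show whether the two environments classify the same input differently.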

yCobanoglu commented 12 months ago

OK, I can check, but if I call dot_product_mkl(scipy.sparse.csr_array(a), scipy.sparse.csr_array(b)) this error should not happen, right? I will post an error log tomorrow with that setup. It will be hard to reproduce, but I had the issue on the Compute Optimized VMs from Google Cloud.

yCobanoglu commented 11 months ago

@asistradition Here the code throws the error even though I first check that the matrices are sparse and then convert them to CSR again right before passing them to the function (you can see in the traceback that both matrices are converted immediately before the sparse dot product): return dot_product_mkl(scipy.sparse.csr_array(cov), scipy.sparse.csr_array(cov_vec)).toarray()

The code snippet that throws the error:

import scipy.sparse
from sparse_dot_mkl import dot_product_mkl


def batch_elem_mul_mkl(cov):
    cov = scipy.sparse.csr_array(cov)

    def f(cov_adj):
        cov_vec = cov_adj.reshape((cov.shape[1], -1), order="F")
        cov_vec = scipy.sparse.csr_array(cov_vec)
        # The sparse dot product runs in parallel, but on certain machines it raises
        # "ValueError: Input matrices to dot_product_mkl must be CSR, CSC, or BSR; COO is not supported"
        # even though both arrays are CSR.
        if not scipy.sparse.issparse(cov) or not scipy.sparse.issparse(cov_vec):
            raise ValueError("My Error: Input matrices to dot_product_mkl must be sparse")
        return dot_product_mkl(scipy.sparse.csr_array(cov), scipy.sparse.csr_array(cov_vec)).toarray()

    return f

The error

Traceback (most recent call last):
  File "/home/yunus.ebabylon/graph-neural-networks/gnn/infinite_width/run.py", line 263, in <module>
    gat_gp, gat_ntk_, time1 = with_time(lambda: gat_ntk(adj, x, layer, SIGMA_B, SIGMA_W, NONLINEAR))
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/gnn/infinite_width/run.py", line 161, in with_time
    res = f()
          ^^^
  File "/home/yunus.ebabylon/graph-neural-networks/gnn/infinite_width/run.py", line 263, in <lambda>
    gat_gp, gat_ntk_, time1 = with_time(lambda: gat_ntk(adj, x, layer, SIGMA_B, SIGMA_W, NONLINEAR))
                                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/gnn/infinite_width/gat/gat_ntk.py", line 32, in gat_ntk
    return forward(kernel_init, kernel_init, 1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/gnn/infinite_width/gat/gat_ntk.py", line 24, in forward
    nngp = batch_mul(sigma, cov_att)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/gnn/infinite_width/gat/utils.py", line 58, in batch_mul
    results = np.vstack(list(results))
                        ^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/gnn/infinite_width/gat/utils.py", line 22, in f
    return dot_product_mkl(scipy.sparse.csr_array(cov), scipy.sparse.csr_array(cov_vec)).toarray()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/.venv/lib/python3.11/site-packages/sparse_dot_mkl/sparse_dot.py", line 58, in dot_product_mkl
    return _sds(matrix_a, matrix_b, cast=cast, reorder_output=reorder_output, dense=dense)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yunus.ebabylon/graph-neural-networks/.venv/lib/python3.11/site-packages/sparse_dot_mkl/_sparse_sparse.py", line 114, in _sparse_dot_sparse
    raise ValueError("Input matrices to dot_product_mkl must be CSR, CSC, or BSR; COO is not supported")
ValueError: Input matrices to dot_product_mkl must be CSR, CSC, or BSR; COO is not supported

And as I mentioned, I can only reproduce the error on cloud machines, not locally.

asistradition commented 11 months ago

Looks like scipy is refactoring *_matrix into *_array objects and that is probably the reason for the error. If you can use csr_matrix for now that should work and I will find some time to make the _array objects work soon.

It's probably also very dependent on the scipy version, since someone is actively developing that module now.
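The suggested workaround can be sketched as follows. `as_csr_matrix` is a hypothetical helper name; it only wraps the `scipy.sparse.csr_matrix` constructor, which accepts dense arrays as well as other sparse objects. The call into dot_product_mkl is left as a comment so the sketch runs without MKL installed.

```python
import numpy as np
import scipy.sparse as sp


def as_csr_matrix(x):
    """Convert dense or sparse input to a legacy scipy.sparse.csr_matrix."""
    return sp.csr_matrix(x)


# A csr_array and a dense ndarray both end up as csr_matrix objects.
a = as_csr_matrix(sp.csr_array(np.eye(3)))
b = as_csr_matrix(np.arange(9.0).reshape(3, 3))
print(type(a).__name__, type(b).__name__)

# With sparse_dot_mkl installed, the call from the thread would then become:
# result = dot_product_mkl(a, b).toarray()
```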

yCobanoglu commented 11 months ago

Yeah, you were right: using scipy.sparse.csr_matrix fixed it. Thanks!

asistradition commented 11 months ago

v0.9.0 should support *_array in place of *_matrix
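One way to decide at runtime whether the csr_matrix workaround is still needed is to compare the installed version against 0.9.0. The sketch below is an assumption-laden example: `supports_sparse_arrays` is a hypothetical helper, the distribution is assumed to be registered under the name sparse_dot_mkl, and only the leading numeric part of each version component is compared.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def supports_sparse_arrays(version_string, minimum=(0, 9, 0)):
    """True if `version_string` is at least `minimum` (hypothetical helper)."""
    parts = []
    for piece in version_string.split(".")[:3]:
        match = re.match(r"\d+", piece)  # leading digits only, e.g. "1rc2" -> 1
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as "1.0"
    return tuple(parts) >= minimum


try:
    # Assumed distribution name; adjust if the package is registered differently.
    installed = version("sparse_dot_mkl")
except PackageNotFoundError:
    installed = None

print(installed, installed is not None and supports_sparse_arrays(installed))
```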