BIMSBbioinfo / ikarus

Identifying tumor cells at the single-cell level using machine learning
MIT License

Sparse dot product #21

Closed epaaso closed 3 weeks ago

epaaso commented 10 months ago

With big datasets, computing the dot product with numpy requires allocating a large amount of virtual memory for the dense connectivity matrix. Most computers won't be able to allocate that much RAM.
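To give a rough sense of scale, here is a back-of-the-envelope estimate. The cell count and neighbor count are hypothetical numbers chosen for illustration, not measurements from ikarus, and the sparse figure assumes CSR storage with float64 values and 32-bit indices.

```python
# Hypothetical dataset size, for illustration only.
n_cells = 100_000
n_neighbors = 15  # assumed number of nonzero entries per row

# Dense float64 matrix: one 8-byte value per (cell, cell) pair.
dense_bytes = n_cells * n_cells * 8

# CSR layout (assumed): 8-byte value + 4-byte column index per nonzero,
# plus one 4-byte row pointer per row (and one extra).
nnz = n_cells * n_neighbors
sparse_bytes = nnz * 8 + nnz * 4 + (n_cells + 1) * 4

print(f"dense:  {dense_bytes / 1e9:.1f} GB")   # dense:  80.0 GB
print(f"sparse: {sparse_bytes / 1e6:.1f} MB")  # sparse: 18.4 MB
```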

Luckily, scipy has a sparse matrix dot operation that is both much faster and much lighter on RAM. It is a method of the sparse matrix object.

Also, we don't have to check whether the matrix is sparse, because numpy arrays have their own dot method for dense matrices that is called in the same way.
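A minimal sketch of the idea, not the actual ikarus code: `propagate` and the toy matrices are hypothetical names made up for this example. It relies only on the fact that both `numpy.ndarray` and scipy sparse matrices expose a `.dot` method.

```python
import numpy as np
from scipy import sparse


def propagate(connectivities, values):
    """Hypothetical helper: multiply a connectivity matrix by a value matrix.

    Works for a dense ndarray and for a scipy sparse matrix alike, since
    both provide a .dot method, so no sparsity check is needed.
    """
    return connectivities.dot(values)


rng = np.random.default_rng(0)

# Toy connectivity matrix with most entries zeroed out.
dense_conn = rng.random((6, 6))
dense_conn[dense_conn < 0.7] = 0.0
sparse_conn = sparse.csr_matrix(dense_conn)

values = rng.random((6, 2))

out_dense = propagate(dense_conn, values)
out_sparse = propagate(sparse_conn, values)

# Both paths give the same numbers; only the storage differs.
print(np.allclose(out_dense, out_sparse))
```

Calling the method on the matrix object, rather than passing a sparse matrix to `np.dot`, keeps the sparse path inside scipy, so the connectivity matrix never has to be densified.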

dohmjan commented 9 months ago

Hi! Thank you for the hint and your PR. I left a comment regarding the axis in the summation. Apart from that it looks good to me!