caokai1073 / uniPort

a unified single-cell data integration framework by optimal transport
MIT License

Outputting the OT matrix failed #4

Open cindyway opened 1 year ago

cindyway commented 1 year ago

adata, OT = up.Run(adatas=[spot,rna], adata_cm=adata_cm, save_OT=True)

(1) On the CPU, the call fails with: IndexError: index 37385 is out of bounds for dimension 0 with size 1

Index 37385 is being applied to a dimension that holds only one element, so the lookup is out of range. I have not yet found what causes this.

(2) On the GPU (Colab, Colab Pro, a Linux server, and an RTX 3090 on Windows), every run reported the following error: RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasLtMatmul( ltHandle, computeDesc.descriptor(), &alpha_val, mat1_ptr, Adesc.descriptor(), mat2_ptr, Bdesc.descriptor(), &beta_val, result_ptr, Cdesc.descriptor(), result_ptr, Cdesc.descriptor(), &heuristicResult.algo, workspace.data_ptr(), workspaceSize, at::cuda::getCurrentCUDAStream())

This error suggests a problem on the CUDA side, possibly a mismatch between the CUDA version and the hardware or software environment.

(3) I first suspected memory usage, but reducing the batch size to 100 or even 10 did not help, so memory limits are probably not the cause.

Is a specific CUDA version required?

caokai1073 commented 1 year ago

Hi, thanks for reporting this. I've looked into it, and the issue seems to occur when the sizes of the two modalities differ significantly. As a workaround, I recommend trying a larger batch size, such as 512 or 1024. I am working on a fix for this bug and plan to release an updated version soon. Please let me know if you have any further questions.

zk-P commented 1 year ago

I changed line 435 of vae.py from

tran_batch[j] = torch.from_numpy(tran[j]).to(device)[idx_query[j]][idx_ref]

to

tran_batch[j] = torch.from_numpy(tran[j]).to(device)[idx_query[j]][:,idx_ref]

and it works.
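
For anyone wondering why the extra `:,` matters, here is a minimal sketch on a toy tensor. The names `tran`, `idx_query`, and `idx_ref` are borrowed from the line above, but the shapes and values are made up for illustration and this is not the uniPort code itself. Chaining `[idx_query][idx_ref]` applies both index sets to rows, so the column indices are looked up along dimension 0 and can run out of bounds; `[idx_query][:, idx_ref]` selects rows first and then columns. This would be consistent with the IndexError reported on the CPU; on the GPU, out-of-range indexing is reported asynchronously and can show up at a later, unrelated call, which may explain the cuBLAS message, though that is an assumption.

```python
import torch

# Toy stand-in for one transport block: 5 query cells x 8 reference cells (assumed shapes).
tran = torch.arange(40, dtype=torch.float32).reshape(5, 8)
idx_query = torch.tensor([0, 2, 4])     # rows sampled for this batch
idx_ref = torch.tensor([1, 3, 5, 7])    # columns sampled for this batch

# Original form: the second index is applied to rows again.
# tran[idx_query] has only 3 rows, so row index 3 is already out of bounds.
wrong_error = None
try:
    tran[idx_query][idx_ref]
except IndexError as err:
    wrong_error = str(err)

# Fixed form: pick rows, then pick columns from the result.
right = tran[idx_query][:, idx_ref]

# Equivalent single-step form using broadcasting of the two index tensors.
right_one_step = tran[idx_query[:, None], idx_ref]

print(wrong_error)
print(right.shape)
```

The fixed form yields a block with one row per sampled query cell and one column per sampled reference cell, which is the shape a per-batch slice of the transport matrix would need.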