chanwkimlab / MarcoPolo

MarcoPolo is a clustering-free approach to the exploration of bimodally expressed genes along with group information in single-cell RNA-seq data
https://chanwkimlab.github.io/MarcoPolo/HumanLiver/index.html

Run MarcoPolo in local machine with Jupyter Notebook #5

Open SHADJIA opened 2 years ago

SHADJIA commented 2 years ago

Hello @chanwkimlab ,

I'm new to Python. I would like to test your tool on my own dataset, from my local machine with Jupyter Notebook. Could you help me with your code? When I try your vignette, I get this error:

AssertionError                            Traceback (most recent call last)
C:\Users\AppData\Local\Temp/ipykernel_2416/4231211302.py in <module>
      5     adata.obs["size_factor"] = norm_factor/norm_factor.mean()
      6     print("size factor was calculated")
----> 7 regression_result = MarcoPolo.run_regression(adata=adata, size_factor_key="size_factor",
      8                          num_threads=8)
      9 # If you use a local machine, you can set `num_threads` to higher than 1 (maybe upto 4), which will speed up the regression a lot. For some reason, num_threads>1 does not seem to work well on colab (maybe due to the the limited RAM).
.
.
.
AssertionError: Torch not compiled with CUDA enabled

Do you have any idea how to resolve this issue?

Thanks a lot in advance.

Regards, Sha

chanwkimlab commented 2 years ago

Hi @SHADJIA,

Thank you for using our software. The error occurred because the run_regression function uses the GPU by default, but the PyTorch build installed on your local machine does not support GPUs. There are two possible solutions:

  1. If you have a GPU on your local machine and intend to use it to accelerate the MarcoPolo algorithm, you should install a PyTorch version with CUDA support. I believe the PyTorch currently installed on your machine only supports CPU. There are many instructions online for installing CUDA and CUDA-enabled PyTorch, such as this link: https://www.youtube.com/watch?v=GMSjDTU8Zlc.
  2. If you don't have a GPU, you can resolve the issue by changing the device parameter of the run_regression function from "cuda:0" to "cpu". However, MarcoPolo will run much more slowly.
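
As a minimal sketch of option 2, the device string could be chosen at runtime instead of hard-coded. The helper name pick_device is hypothetical (it is not part of MarcoPolo); it relies only on PyTorch's torch.cuda.is_available() and falls back to "cpu" if PyTorch cannot be imported at all.

```python
def pick_device(preferred="cuda:0"):
    """Return `preferred` if PyTorch can see a CUDA GPU, otherwise "cpu".

    Hypothetical helper for illustration; not part of MarcoPolo.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU support to detect.
        return "cpu"
    return preferred if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")

# The result can then be passed to the call from the vignette, e.g.:
# regression_result = MarcoPolo.run_regression(
#     adata=adata, size_factor_key="size_factor",
#     num_threads=8, device=device)
```

With a CPU-only PyTorch build this selects "cpu", which avoids the "Torch not compiled with CUDA enabled" assertion.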

Please let me know if you have any other questions.

Best, Chanwoo

SHADJIA commented 2 years ago

Hello @chanwkimlab ,

Thanks for your reply.

I don't have a GPU in my computer, so I ran the code with the CPU as the device. It has been running for 2 hours now. As in colab, I can't see the progress bar in Jupyter. Also, although the regression is running, "size factor was calculated" was not printed. The only output I have is:

The numbers of clusters to test: [1, 2] Y: (22748, 16656) X: (22748, 1) s: (22748,)

Is this normal, or is there a problem with the execution?

Thanks once again. Regards, Sha

chanwkimlab commented 2 years ago

Hi @SHADJIA,

Since you are using the CPU instead of a GPU and your input data is very large, it is normal for the regression to take longer than 2 hours. Also, you may not see "size factor was calculated" if your input data already contains a "size_factor" column. However, it is interesting that you don't see a progress bar. When I changed the device parameter from "cuda:0" to "cpu" in the colab environment, I was still able to see the progress bar.

For debugging purposes, you can manually edit the regression/trainer.py file. You can find the path where MarcoPolo is installed by running MarcoPolo.__file__ after executing import MarcoPolo. Add the lines marked with + to the fit_multiple_genes function:

 for iter_idx, exp_data_idx in enumerate(pbar):
+    if iter_idx % 10 == 0:
+        print(iter_idx)
     cell_dataset = CellDataset(Y_select[:, iter_idx:iter_idx + 1], X, s)
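
If it helps, here is a small sketch for locating the file to edit using only the standard library. The helper name package_dir is hypothetical, and the regression/trainer.py sub-path is simply the one mentioned above; the actual layout on your machine may differ.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory of an installed package, or None if it is not found.

    Hypothetical helper for illustration; it does not import the package.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, spec.origin points at its __init__.py file.
    return Path(spec.origin).parent


root = package_dir("MarcoPolo")
if root is None:
    print("MarcoPolo does not appear to be installed in this environment")
else:
    # Sub-path assumed from the comment above.
    print(root / "regression" / "trainer.py")
```

After editing the file, restart the Jupyter kernel so that the modified module is reloaded.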

Best, Chanwoo