JonathanShor / DoubletDetection

Doublet detection in single-cell RNA-seq data.
https://doubletdetection.readthedocs.io/en/stable/
MIT License

Memory Error when running. #141

Closed Mike0117 closed 3 years ago

Mike0117 commented 3 years ago

Hi,

I'm Michael, a graduate student at UNL, and I am learning how to use your doublet detection package to clean up my RNA-seq data before running it through Seurat and Monocle3. I have a test matrix that I'm using while following along with the Jupyter notebook example.

The code is as follows:

import sys
import numpy as np
import scanpy as sc
import doubletdetection
import numba
import tarfile
import matplotlib.pyplot as plt

##################################################

# Raw string, so the backslashes in the Windows path are not read as escapes (\t, \f).
Test1_matrix_path = r'E:\Cellranger_outs\test_runs\test\outs\filtered_feature_bc_matrix\matrix.mtx.gz'
Test1_raw_counts = sc.read(Test1_matrix_path)
Test1_raw_counts.var_names_make_unique()

# Drop genes that are not detected in any cell.
sc.pp.filter_genes(Test1_raw_counts, min_cells=1)
# Sum over the count matrix (.X); this works for both sparse and dense storage.
Test1_zero_genes = np.asarray(Test1_raw_counts.X.sum(axis=0)).ravel() == 0
Test1_raw_counts = Test1_raw_counts[:, ~Test1_zero_genes]

Test1_clf_old = doubletdetection.BoostClassifier(n_iters=25, use_phenograph=True, standard_scaling=False, verbose=True)
Test1_old_doublets = Test1_clf_old.fit(Test1_raw_counts).predict(p_thresh=1e-7, voter_thresh=0.8)

Test1_db_old = Test1_old_doublets.nonzero()

# Write the 1-based indices of the predicted doublets, one per line.
with open("Test1_ddold_prepathms.txt", "w") as f:
    for d in Test1_db_old[0]:
        f.write(str(d + 1) + "\n")
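The path literal in the code above deserves a check. In an ordinary Python string, `\t` is read as a tab and `\f` as a form feed, so a Windows path containing `\test` or `\filtered` only survives intact when it is written as a raw string (or with forward slashes). The sketch below uses a shortened, hypothetical path, not the real one, to show the difference.

```python
# Hypothetical, shortened path used only for illustration.

# Ordinary literal: "\t" becomes a tab character and "\f" a form feed.
plain = 'E:\test_runs\filtered'

# Raw literal: every backslash is kept exactly as typed.
raw = r'E:\test_runs\filtered'

print(repr(plain))
print(repr(raw))

# The ordinary literal no longer contains any backslash at all.
print("backslashes in plain:", plain.count("\\"))
print("backslashes in raw:  ", raw.count("\\"))
```

Printing `repr()` of a path before passing it to a reader is a cheap way to confirm that the string holds what was intended.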

My system has a 16-core processor and 256 GB of RAM; however, after about an hour of running, memory usage always maxes out. Any insight you can provide into what I'm doing wrong would be greatly appreciated.

Thank you.
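As a rough aid for sizing the problem, the sketch below estimates how much memory a count matrix would need if it were held as a dense array. The helper name `dense_gib` and the cell and gene counts are hypothetical placeholders. The real dimensions would come from `Test1_raw_counts.shape`. Whether any step of the pipeline actually densifies the matrix is an assumption here, not something established in this thread.

```python
def dense_gib(n_rows, n_cols, bytes_per_value=8):
    """Return the size in GiB of a dense n_rows x n_cols array.

    bytes_per_value defaults to 8, the size of a float64 entry.
    """
    return n_rows * n_cols * bytes_per_value / 2**30


# Hypothetical dimensions; replace with the values from Test1_raw_counts.shape.
n_cells, n_genes = 10_000, 30_000

print(f"float64: {dense_gib(n_cells, n_genes):.2f} GiB")
print(f"float32: {dense_gib(n_cells, n_genes, bytes_per_value=4):.2f} GiB")
```

Comparing this figure with the available RAM, and checking that the first dimension of the matrix really is the number of cells, helps narrow down whether the input itself or a later step is responsible for the memory growth.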