CostaLab / scopen

scOpen: single-cell open chromatin analysis via NMF modelling
GNU General Public License v3.0

Working with large matrixes #19

Status: Closed (closed by 1010shane 1 year ago)

1010shane commented 2 years ago

Hello,

Great software! I am having trouble working with the resulting imputed matrix. My output matrix is ~100 GB in size, which is somewhat expected, but loading it into R and manipulating it is quite challenging, even in an HPC environment. Are there any potential solutions? For reference, my input matrix features (regions) are 500 bp chunks of the entire hg38 genome, as generated by ArchR.

Thanks!

lzj1769 commented 1 year ago

Hi,

One suggestion is to select the top n peaks by coverage. For example, ArchR lets you get the top 20K peaks, and the resulting matrix can then be used as input for downstream analysis.

I am not sure what your tasks are, but in my experience, using only the top peaks does not greatly change the results for visualization or clustering.
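
For a matrix that is already on disk and too large to load, the same top-n idea can be sketched in Python as a two-pass stream over the file. This is only a rough illustration, not part of scOpen or ArchR. It assumes a tab-delimited text matrix with one header row of cell names and one row per region, with the region name in the first column, and it uses the row sum as a stand-in for "coverage". The function names (`row_coverage`, `top_n_regions`, `filter_matrix`) are hypothetical.

```python
import heapq
import io


def row_coverage(line):
    """Return (coverage, region_name) for one tab-delimited matrix row.

    Coverage is taken to be the row sum (an assumption for this sketch).
    """
    name, *values = line.rstrip("\n").split("\t")
    return sum(float(v) for v in values), name


def top_n_regions(rows, n):
    """First pass: names of the n rows with the highest coverage.

    `rows` yields data rows only (header already consumed). heapq.nlargest
    consumes the rows one by one, so the full matrix is never loaded at once.
    """
    best = heapq.nlargest(n, (row_coverage(row) for row in rows))
    return {name for _, name in best}


def filter_matrix(src, dst, keep):
    """Second pass: copy the header and the selected rows from src to dst."""
    dst.write(next(src))  # header row with cell names
    for line in src:
        if line.split("\t", 1)[0] in keep:
            dst.write(line)


# Tiny in-memory stand-in for the on-disk matrix (made-up values).
demo = (
    "region\tcell1\tcell2\tcell3\n"
    "chr1:0-500\t0\t1\t0\n"
    "chr1:500-1000\t1\t1\t1\n"
    "chr1:1000-1500\t0\t0\t0\n"
    "chr1:1500-2000\t1\t0\t1\n"
)

first_pass = io.StringIO(demo)
next(first_pass)  # skip the header before ranking rows
keep = top_n_regions(first_pass, n=2)

out = io.StringIO()
filter_matrix(io.StringIO(demo), out, keep)
result = out.getvalue()
```

With real data, the `io.StringIO` objects would be replaced by file handles opened on the matrix, and the much smaller filtered file could then be loaded into R. If the imputed matrix is dense and real-valued, ranking regions by coverage in the raw input matrix may be more meaningful than ranking by imputed row sums.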

Best, Zhijian