Closed by 1010shane 1 year ago
Hi,
One suggestion is to select the top n peaks by coverage. For example, ArchR lets you keep the top 20K peaks, and the resulting matrix can then be used as input for downstream analysis.
I am not sure what your tasks are, but in my experience, using only the top peaks does not greatly change the results for visualization or clustering.
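As a rough illustration of the idea, here is a minimal NumPy sketch of selecting the top n peaks by coverage. It is not ArchR's API. The function name `select_top_peaks`, the peaks-by-cells orientation, and the use of row sums as "coverage" are all assumptions made for this example.

```python
import numpy as np


def select_top_peaks(matrix, n_top=20_000):
    """Keep the n_top rows (peaks) with the highest total coverage.

    Assumption: rows are peaks and columns are cells. Works for a dense
    NumPy array and should also work for a SciPy sparse matrix that
    supports row indexing (e.g. CSR).
    """
    if n_top <= 0:
        raise ValueError("n_top must be positive")
    # Coverage per peak, defined here (hypothetically) as the row sum.
    coverage = np.asarray(matrix.sum(axis=1)).ravel()
    n_top = min(n_top, coverage.shape[0])
    # argpartition finds the n_top largest values without a full sort.
    top = np.argpartition(coverage, -n_top)[-n_top:]
    # Sort the indices so the selected peaks keep their original order.
    top = np.sort(top)
    return top, matrix[top, :]


# Toy example: 5 peaks x 3 cells, keep the 2 peaks with the most coverage.
counts = np.array([
    [0, 1, 0],
    [5, 4, 6],
    [1, 0, 1],
    [3, 3, 2],
    [0, 0, 1],
])
kept_idx, kept = select_top_peaks(counts, n_top=2)
print(kept_idx)    # indices of the retained peaks
print(kept.shape)  # (peaks kept, cells)
```

The returned indices can be used to subset the peak annotation in the same way, so the rows of the reduced matrix stay matched to their genomic regions.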
Best, Zhijian
Hello,
Great software! I am running into a bit of trouble working with the resulting imputed matrix. Specifically, my output matrix is ~100 GB in size, which is somewhat expected. However, loading this output into R and manipulating it can get quite challenging, even in an HPC environment. Are there any potential solutions to this? For reference, my input matrix features (regions) are 500 bp tiles covering the entire hg38 genome, as generated by ArchR.
Thanks!