Hoohm / dropSeqPipe

A single-cell RNA-seq pre-processing Snakemake workflow
Creative Commons Attribution Share Alike 4.0 International

The merge_single_counts.R step takes 60 GB of memory each and has now been running for more than 30 h #52

Closed: grst closed this issue 5 years ago

grst commented 5 years ago

Is that normal behaviour?

Hoohm commented 5 years ago

Hey! I've noticed this happening sometimes. A major problem is R's merge function: it is basically a full outer join, but a poorly implemented one. It works well at low dimensions, but badly at high ones and with a large number of samples.
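A back-of-the-envelope estimate shows why a dense full outer join blows up. Joining per-sample tables on gene names yields a matrix of (union of genes) × (all cells), and every zero is stored. The sketch below assumes 8-byte doubles for the dense case and, for a sparse coordinate (COO) layout, one 8-byte value plus two 4-byte indices per non-zero entry. The gene count, cell count, and 5% density are hypothetical figures, not measurements from this issue.

```python
def dense_bytes(n_genes: int, n_cells: int, bytes_per_value: int = 8) -> int:
    """Memory for a dense genes x cells matrix: every entry is stored, zeros included."""
    return n_genes * n_cells * bytes_per_value


def coo_bytes(nnz: int, bytes_per_value: int = 8, bytes_per_index: int = 4) -> int:
    """Memory for a COO sparse matrix: one value plus a (row, col) index pair per non-zero."""
    return nnz * (bytes_per_value + 2 * bytes_per_index)


# Hypothetical merged dataset: 30,000 genes (union across samples) x 100,000 cells,
# with an assumed 5% of entries non-zero.
n_genes, n_cells = 30_000, 100_000
nnz = n_genes * n_cells // 20  # 5% density

print(f"dense: {dense_bytes(n_genes, n_cells) / 1e9:.1f} GB")  # dense: 24.0 GB
print(f"COO:   {coo_bytes(nnz) / 1e9:.1f} GB")                  # COO:   2.4 GB
```

The dense size grows with every gene and cell added, whereas the sparse size only grows with the number of observed counts. On top of that, R's merge on data frames typically creates intermediate copies, so peak memory can exceed the final matrix size.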

One major update needed for dropSeqPipe is to convert all full (dense) matrices to sparse ones and write them to disk in Matrix Market (.mtx) format.
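The Matrix Market coordinate format is simple: a header line, a "rows cols non-zeros" size line, then one "row col value" line per non-zero entry, with 1-based indices. In R, `Matrix::writeMM` writes this format from a sparse matrix. The stdlib-only Python sketch below just shows what ends up on disk. It assumes integer counts with genes as rows and cells as columns; the helper names `dense_to_triplets` and `write_mtx` are hypothetical.

```python
import io
from typing import List, TextIO, Tuple

Triplet = Tuple[int, int, int]  # (row, col, count) with 0-based indices


def dense_to_triplets(matrix: List[List[int]]) -> List[Triplet]:
    """Keep only the non-zero entries of a dense genes x cells matrix."""
    return [
        (i, j, v)
        for i, row in enumerate(matrix)
        for j, v in enumerate(row)
        if v != 0
    ]


def write_mtx(triplets: List[Triplet], n_rows: int, n_cols: int, fh: TextIO) -> None:
    """Write integer counts in Matrix Market coordinate format (1-based indices)."""
    fh.write("%%MatrixMarket matrix coordinate integer general\n")
    fh.write(f"{n_rows} {n_cols} {len(triplets)}\n")
    for i, j, v in triplets:
        fh.write(f"{i + 1} {j + 1} {v}\n")


# Toy example: 3 genes x 4 cells, mostly zeros.
dense = [
    [0, 2, 0, 0],
    [1, 0, 0, 3],
    [0, 0, 0, 0],
]
buf = io.StringIO()
write_mtx(dense_to_triplets(dense), n_rows=3, n_cols=4, fh=buf)
print(buf.getvalue())
```

Only 3 of the 12 entries are written. Gene names and cell barcodes are not part of the .mtx file itself, so they would have to be stored alongside it (for example, as one name per line in separate text files).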

This will decrease space usage, and there are probably a few handy functions for merging samples together.
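Merging sparse samples can be done as a full outer join on genes plus a concatenation of cells, while touching only the non-zero counts. In R, one possible route is to build a sparse matrix per sample with the Matrix package, pad each to the union of gene names, and `cbind` them. This is not necessarily what the develop-branch fix does. The Python sketch below assumes a per-sample layout of `{gene: {barcode: count}}` holding non-zero counts only, and prefixes barcodes with the sample name so identical barcodes from different samples stay distinct. That prefix convention and the `merge_samples` name are hypothetical.

```python
from typing import Dict, List, Tuple

# One sample's counts: {gene: {cell_barcode: count}}, non-zero entries only (assumed layout).
SampleCounts = Dict[str, Dict[str, int]]


def merge_samples(
    samples: Dict[str, SampleCounts],
) -> Tuple[List[str], List[str], List[Tuple[int, int, int]]]:
    """Full outer join on genes, concatenation on cells, without materializing zeros.

    Returns (genes, barcodes, triplets); triplets are 0-based (gene_idx, cell_idx, count).
    """
    # Union of gene names across all samples (the "full outer join" axis).
    genes = sorted({g for counts in samples.values() for g in counts})
    gene_idx = {g: i for i, g in enumerate(genes)}

    barcodes: List[str] = []
    cell_idx: Dict[str, int] = {}
    triplets: List[Tuple[int, int, int]] = []
    for sample, counts in samples.items():
        for gene, cells in counts.items():
            for barcode, count in cells.items():
                key = f"{sample}_{barcode}"  # hypothetical sample-prefix convention
                if key not in cell_idx:
                    cell_idx[key] = len(barcodes)
                    barcodes.append(key)
                triplets.append((gene_idx[gene], cell_idx[key], count))
    return genes, barcodes, triplets


samples = {
    "s1": {"ACTB": {"AAAC": 5, "GGTT": 1}, "MALAT1": {"AAAC": 2}},
    "s2": {"ACTB": {"AAAC": 3}, "CD3E": {"TTGA": 4}},
}
genes, barcodes, triplets = merge_samples(samples)
print(genes)     # ['ACTB', 'CD3E', 'MALAT1']
print(barcodes)  # ['s1_AAAC', 's1_GGTT', 's2_AAAC', 's2_TTGA']
print(triplets)  # [(0, 0, 5), (0, 1, 1), (2, 0, 2), (0, 2, 3), (1, 3, 4)]
```

The work is proportional to the total number of non-zero counts plus the number of genes, rather than to genes × cells. The resulting triplets can be written straight to an .mtx file.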

Hoohm commented 5 years ago

Fixed on the develop branch. It will be integrated into the next release.