AllenInstitute / transcriptomic_clustering

improved clustering pipeline for transcriptomic data

chunked normalization prototype wip #40

Closed sgratiy closed 3 years ago

sgratiy commented 3 years ago

Prototype performance on a synthetic dataset (20k cells by 30k genes):

In memory: syn_log_cpm_in_memory

Built-in, saving to a backing file (terrible time and memory performance): syn_log_cpm_chunked_backed_builtin

My implementation (fast, with a tight memory footprint): syn_log_cpm_chunked_backed_mine

Testing on a small dataset (300k), fully in memory: small_log_cpm_in_memory

In backed mode, saving chunks to a different file (using just 10% of memory):

small_log_cpm_chunked10k
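
For readers unfamiliar with the idea, here is a minimal sketch of chunked log-CPM normalization. It is not the code from this PR: the function names `log_cpm_in_memory` and `log_cpm_chunked`, the natural-log `log1p` transform, and the 1e6 scale factor are all assumptions made for illustration. Because CPM only needs per-cell totals, each block of rows can be normalized independently and written to a preallocated output (which could be a memory-mapped or file-backed array), so peak memory is bounded by the chunk size.

```python
import numpy as np


def log_cpm_in_memory(counts):
    """Reference version: normalize the whole cells-by-genes matrix at once."""
    totals = counts.sum(axis=1, keepdims=True).astype(np.float64)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / totals * 1e6)


def log_cpm_chunked(counts, out, chunk_size=10_000):
    """Hypothetical chunked version: process `chunk_size` cells at a time.

    `counts` and `out` only need to support row slicing, so either could be
    a file-backed array (an assumption; plain ndarrays are used below).
    """
    n_cells = counts.shape[0]
    for start in range(0, n_cells, chunk_size):
        stop = min(start + chunk_size, n_cells)
        # Only this block of rows is held in memory as floats.
        chunk = np.asarray(counts[start:stop], dtype=np.float64)
        totals = chunk.sum(axis=1, keepdims=True)
        totals[totals == 0] = 1.0
        out[start:stop] = np.log1p(chunk / totals * 1e6)
    return out


# Small usage example on synthetic counts (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(25, 8))
counts[3] = 0  # an empty cell, to exercise the zero-total guard
out = np.empty(counts.shape, dtype=np.float64)
log_cpm_chunked(counts, out, chunk_size=10)
```

The chunked result should match the in-memory one exactly, since no step depends on rows outside the current chunk; the trade-off is between chunk size (memory) and the number of read/write round trips (time).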