epigen / scrnaseq_processing_seurat

A Snakemake workflow and MrBiomics module, powered by the R package Seurat, for processing and visualizing (multimodal) sc/snRNA-seq data generated with 10X Genomics Kits or provided in the MTX matrix file format.
https://epigen.github.io/scrnaseq_processing_seurat/
MIT License

speed up rule save_counts (bottleneck) #15

Closed sreichl closed 9 months ago

sreichl commented 11 months ago

fwrite

fread

general

https://rdrr.io/cran/data.table/man/fwrite.html

library(data.table)
fwrite(as.data.frame(GetAssayData(object = seurat_object, slot = "scale.data", assay = "SCT")), file = file.path(result_dir, paste0(step, 'scaled_', 'RNA', '.csv')), row.names=TRUE)

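# more general
# fast writing
fwrite(as.data.frame(df), file=file.path("path/to/file.csv"), row.names=TRUE)

# fast reading
# note: data.frame() sanitizes column names by default (e.g., "-" becomes "."); add check.names=FALSE to keep them unchanged
df <- data.frame(fread(file.path("path/to/file.csv"), header=TRUE), row.names=1)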
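
A minimal round-trip sketch of the write/read pattern, assuming a small dense toy matrix in place of the Seurat assay data (the matrix `mat`, the barcodes, and `tmp_file` are made up for illustration and are not from the workflow). It is meant to show how one could check that row names and column names survive `fwrite` followed by `fread`:

```r
library(data.table)

# Toy stand-in for an expression matrix: genes in rows, cells in columns.
# (Hypothetical data, not taken from the workflow.)
mat <- matrix(c(0, 1.5, 2, 0, 0, 3), nrow = 2,
              dimnames = list(c("GeneA", "GeneB"),
                              c("S_AAAC-1", "S_AAAG-1", "S_AAAT-1")))

tmp_file <- tempfile(fileext = ".csv")

# Fast writing: the row names are written as the first column.
fwrite(as.data.frame(mat), file = tmp_file, row.names = TRUE)

# Fast reading: column 1 becomes the row names again;
# check.names = FALSE keeps "-" in the cell barcodes.
df <- data.frame(fread(tmp_file, header = TRUE),
                 row.names = 1, check.names = FALSE)

# Compare names and values of the round trip.
names_ok  <- identical(rownames(df), rownames(mat)) &&
  identical(colnames(df), colnames(mat))
values_ok <- isTRUE(all.equal(unname(as.matrix(df)), unname(mat)))
```

Whether the same holds for the object returned by `GetAssayData` depends on its class (for example, a sparse matrix may convert differently), so this is only a sanity check on dense input.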

sreichl commented 11 months ago

The new output is not the same as the original! Differences:

new comparison
diff --brief <(tail -n +2 NORMALIZED_RNA.csv | cut -d, -f2-) <(tail -n +2 NORMALIZED_RNA_original.csv | cut -d, -f2-)
Files /dev/fd/63 and /dev/fd/62 differ

compare with brief diff
diff --brief NORMALIZED_RNA_original.csv NORMALIZED_RNA.csv

head -n 1 NORMALIZED_RNA.csv | cut -c 1-100
"",EMICROP_A_AAACCTGAGAATAGGG-1,EMICROP_A_AAACCTGAGACAATAC-1,EMICROP_A_AAACCTGAGACGACGT-1,EMICROP_A_

head -n 1 NORMALIZED_RNA_original.csv | cut -c 1-100
"","EMICROP_A_AAACCTGAGAATAGGG.1","EMICROP_A_AAACCTGAGACAATAC.1","EMICROP_A_AAACCTGAGACGACGT.1","EMI

tail -n 1 NORMALIZED_RNA.csv | cut -c 1-100
"28441",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,

tail -n 1 NORMALIZED_RNA_original.csv | cut -c 1-100
"AC007325.4",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
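
The shell output shows that the two files differ at least in header quoting, in `-` versus `.` in the barcodes, and in the row-name column (`"28441"` versus `"AC007325.4"`). To see which of these aspects actually differ, one option is to compare them separately in R. The following is a rough sketch, not part of the workflow: `compare_csv` is a hypothetical helper, and it assumes the `.` in one header comes from R's name sanitizing (`make.names`). The toy files are written with `write.csv` and `fwrite` only to mimic the two header styles.

```r
library(data.table)

# Hypothetical helper: compare two CSV matrices that store row names in column 1.
compare_csv <- function(path_a, path_b) {
  a <- fread(path_a, header = TRUE)
  b <- fread(path_b, header = TRUE)
  list(
    same_dim      = identical(dim(a), dim(b)),
    same_rownames = identical(as.character(a[[1]]), as.character(b[[1]])),
    # Assumption: "." in one header came from make.names() sanitizing "-".
    same_colnames = identical(make.names(names(a)[-1]),
                              make.names(names(b)[-1])),
    # Values only, ignoring row and column names.
    same_values   = isTRUE(all.equal(as.matrix(a[, -1]), as.matrix(b[, -1]),
                                     check.attributes = FALSE))
  )
}

# Toy data (made up) written in both header styles.
mat <- matrix(c(0, 1, 2, 0), nrow = 2,
              dimnames = list(c("GeneA", "GeneB"), c("S_AAAC-1", "S_AAAG-1")))
f_old <- tempfile(fileext = ".csv")
f_new <- tempfile(fileext = ".csv")
write.csv(data.frame(mat), f_old)                    # quoted header, "-" becomes "."
fwrite(as.data.frame(mat), f_new, row.names = TRUE)  # unquoted header, "-" kept

res <- compare_csv(f_new, f_old)
```

On the real files, a `FALSE` in `same_rownames` with `TRUE` in `same_values` would suggest that only the row names were lost when writing, while a `FALSE` in `same_values` would point to a difference in the numbers themselves.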