campbio / scruff

Single Cell RNA-seq UMI Filtering Facilitator
http://bioconductor.org/packages/scruff/

Demultiplexing: unable to run in parallel #179

Closed BijnBau closed 3 years ago

BijnBau commented 3 years ago

Dear scruff team,

I have been trying to use your tool to analyze a 10X library, with the option to parallelize the demultiplexing enabled. However, parallelization does not seem to work: when I check which cores the demultiplexing runs on, only 1 is in use. It seems to me that parallelization was included in previous versions but not in the current one. Is this correct?

In addition, can you help me improve the current speed? I am planning to process 135,698,545 reads, which will take almost a week at the current rate, and I would love to reach the speeds mentioned in your publication.

Thank you for your help and response! I have appended my code below.

```r
library(scruff)
library(parallel)

Read1 <- "10X/Thesis_data/TCells_Splice/sample/sample_L001_R1_001.fastq"
Read2 <- "10X/Thesis_data/TCells_Splice/sample/sample_L001_R1_001.fastq"
Fasta <- "References/refdata-gex-GRCh38-2020-A/fasta/genome.fa"
Barcodes <- scan("10X/Thesis_data/TCells_Splice/sample/barcodes.tsv", what="list")
Reference <- "/References/refdata-gex-GRCh38-2020-A/genes/genes.gtf"
indexBase <- "GRCh38"

sample <- scruff(
    project=paste0("sample", Sys.Date()),
    experiment=c("sample"),
    lane=c("L001"),
    read1Path=c(Read1),
    read2Path=c(Read2),
    Barcodes,
    index=indexBase,
    Reference,
    bcStart=1,
    bcStop=16,
    bcEdit=5,
    umiStart=17,
    umiStop=26,
    keep=75,
    celPerWell="4489",
    nBestLocations=1,
    minQual=10,
    yieldReads=1e+06,
    alignmentFileFormat="BAM",
    demultiplexOutDir="./Demultiplex",
    alignmentOutDir="./Aligment",
    countUmiOutDir="./Count",
    demultiplexSummaryPrefix="Demultiplex_MS1390",
    alignmentSummaryPrefix="AlignmentMS1390",
    countPrefix="countUMI",
    logfilePrefix= format(Sys.time(), "%Y%m%d%H%M%S"),
    overwrite=FALSE,
    verbose=TRUE,
    cores=max(1,parallel::detectCores()-2),
    threads=1)

saveRDS(sample, "10X/Thesis_data/TCells_Splice/sample/sample.rds")
```

zhewa commented 3 years ago

Hi @BijnBau,

Thank you for using scruff. No, parallelization is enabled in current release versions on Bioconductor. We have not made any changes to parallelization. In order to help with this issue, can you provide the following information?

In addition, there might be some issues with the command you provided.

I think I should mention that scruff is for preprocessing scRNA-seq reads generated from plate-based, FACS-sorted protocols (such as CEL-Seq) with a predefined list of cell barcodes. This means we know beforehand which cell barcodes carry true cell-associated reads and which cell barcodes (if any) should not contain reads from cells. This is different from the 10X Genomics protocol, where the cell barcodes associated with cell-containing droplets are inferred by a cell-calling algorithm.
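To illustrate what "predefined list of cell barcodes" means in practice, here is a minimal base-R sketch that assigns observed barcodes to a known whitelist by Hamming distance. This is not scruff's actual implementation: `hammingDistance`, `assignBarcode`, `maxEdit`, and the barcodes themselves are hypothetical names and made-up values, and the sketch assumes all barcodes have the same length.

```r
# Hypothetical helpers (not part of scruff): match observed barcodes against
# a predefined whitelist, tolerating up to maxEdit mismatches.

# Number of mismatching positions between two equal-length strings
hammingDistance <- function(a, b) {
    sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

assignBarcode <- function(observed, whitelist, maxEdit = 1) {
    vapply(observed, function(bc) {
        # Distance from this observed barcode to every whitelist entry
        d <- vapply(whitelist, hammingDistance, integer(1), b = bc)
        best <- which(d == min(d))
        # Keep only unambiguous matches within the allowed distance
        if (length(best) == 1 && d[best] <= maxEdit) {
            whitelist[best]
        } else {
            NA_character_
        }
    }, character(1), USE.NAMES = FALSE)
}

whitelist <- c("AAAAAA", "CCCCCC", "GGGGGG")  # made-up plate barcodes
observed <- c("AAAAAA", "CCCACC", "TTTTTT")   # made-up read barcodes
assigned <- assignBarcode(observed, whitelist, maxEdit = 1)
print(assigned)
```

In this toy example, the exact match and the one-mismatch barcode are assigned to whitelist entries, while the barcode that is equally far from every entry is left unassigned. With a droplet-based protocol there is no such fixed whitelist of cell-containing barcodes, which is why a separate cell-calling step is needed there.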

Thank you again for using scruff. Hope this is helpful to you.

BijnBau commented 3 years ago

Dear @zhewa

I have used your suggestions to adapt the code. As I have no definite errors to work around and am still working to extend scruff to our dataset, I will close this thread.

Thank you very much for your kind suggestions!