Demultiplexing: unable to run in parallel

BijnBau commented 3 years ago

Dear scruff team,

I have been trying to use your tool to analyze a 10X library, with the option to parallelize demultiplexing enabled. However, parallelization does not seem to work: when I check which cores the demultiplexing step runs on, it only uses one. It seems to me that parallelization was included in previous versions but not in the current one. Is this correct?

In addition, can you help me improve the current speed? I am planning to process 135,698,545 reads, which at the current rate will take almost a week, and I would love to reach the speeds mentioned in your publication.

Thank you for your help and response! I have appended my code below.

```r
library(scruff)
library(parallel)

Read1 <- "10X/Thesis_data/TCells_Splice/sample/sample_L001_R1_001.fastq"
Read2 <- "10X/Thesis_data/TCells_Splice/sample/sample_L001_R1_001.fastq"
Fasta <- "References/refdata-gex-GRCh38-2020-A/fasta/genome.fa"
Barcodes <- scan("10X/Thesis_data/TCells_Splice/sample/barcodes.tsv", what="list")
Reference <- "/References/refdata-gex-GRCh38-2020-A/genes/genes.gtf"
indexBase <- "GRCh38"

sample <- scruff(
    project=paste0("sample", Sys.Date()),
    experiment=c("sample"),
    lane=c("L001"),
    read1Path=c(Read1),
    read2Path=c(Read2),
    Barcodes,
    index=indexBase,
    Reference,
    bcStart=1,
    bcStop=16,
    bcEdit=5,
    umiStart=17,
    umiStop=26,
    keep=75,
    celPerWell="4489",
    nBestLocations=1,
    minQual=10,
    yieldReads=1e+06,
    alignmentFileFormat="BAM",
    demultiplexOutDir="./Demultiplex",
    alignmentOutDir="./Aligment",
    countUmiOutDir="./Count",
    demultiplexSummaryPrefix="Demultiplex_MS1390",
    alignmentSummaryPrefix="AlignmentMS1390",
    countPrefix="countUMI",
    logfilePrefix=format(Sys.time(), "%Y%m%d%H%M%S"),
    overwrite=FALSE,
    verbose=TRUE,
    cores=max(1, parallel::detectCores()-2),
    threads=1)

saveRDS(sample, "10X/Thesis_data/TCells_Splice/sample/sample.rds")
```

zhewa commented 3 years ago

Hi @BijnBau,

Thank you for using scruff. No, parallelization is enabled in current release versions on Bioconductor. We have not made any changes to parallelization. In order to help with this issue, can you provide the following information?

1. Can you paste the output of `sessionInfo()` after loading `library(scruff)`?
2. Can you check the number of cores available on your operating system? You can do so by running `parallel::detectCores()` in R.
3. Can you provide any output messages and errors associated with the scruff function call you provided above? This will greatly help with debugging.
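
For reference, the first two checks could be collected in one short script. This is only a sketch using base R and the `parallel` package; the variable names `nCores` and `coresRequested` are illustrative, and scruff is loaded only if it is installed.

```r
# Diagnostic sketch: gather the information requested above.
library(parallel)

# Number of cores visible to R; detectCores() may return NA on some platforms.
nCores <- parallel::detectCores()
message("Cores detected: ", nCores)

# The value the original call passes as `cores` (all but two, at least one).
coresRequested <- max(1, nCores - 2, na.rm = TRUE)
message("Cores requested: ", coresRequested)

# Load scruff if it is installed so its version shows up in the session info.
if (requireNamespace("scruff", quietly = TRUE)) {
    library(scruff)
}
print(sessionInfo())
```

If `coresRequested` comes out as 1, the call runs serially regardless of how scruff handles parallelization.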

In addition, there might be some issues with the command you provided.

1. The `Read1` and `Read2` paths you provided above are identical. This might be a typo. To run scruff successfully, `read1Path` should be the path to the read file containing the cell barcode and UMI sequences, and `read2Path` should be the path to the read file containing the transcript sequences.
2. The setting `bcEdit=5` is quite high and significantly increases computational complexity. Personally, I would not allow that many mismatches when correcting cell barcodes. Reducing this number to 1, for example, might reduce the time needed for this part of the computation.
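
A small pre-flight check along these lines could catch the duplicated path before a long run starts. This is a minimal sketch: `checkReadPaths` is a hypothetical helper, not part of scruff, and the R2 file name is an assumption based on the usual Illumina naming pattern rather than something confirmed in this thread.

```r
# Sketch of a pre-flight check; checkReadPaths is a hypothetical helper,
# not a scruff function.
checkReadPaths <- function(read1Path, read2Path) {
    if (length(read1Path) != length(read2Path)) {
        stop("read1Path and read2Path must have the same length")
    }
    # Normalize so that different spellings of an existing file compare equal.
    p1 <- normalizePath(read1Path, mustWork = FALSE)
    p2 <- normalizePath(read2Path, mustWork = FALSE)
    same <- p1 == p2
    if (any(same)) {
        stop("read1Path and read2Path are identical: ",
             paste(read1Path[same], collapse = ", "))
    }
    invisible(TRUE)
}

Read1 <- "10X/Thesis_data/TCells_Splice/sample/sample_L001_R1_001.fastq"
# Assumed R2 file name; replace with the actual transcript read file.
Read2 <- "10X/Thesis_data/TCells_Splice/sample/sample_L001_R2_001.fastq"
checkReadPaths(Read1, Read2)

# A lower barcode edit distance, as suggested above, to pass as bcEdit.
bcEdit <- 1
```

Called with the two identical paths from the original post, the helper stops with an error instead of letting the run proceed.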

I should also mention that scruff is designed for preprocessing scRNA-seq reads generated by plate-based, FACS-sorted protocols (such as CEL-Seq) with a predefined list of cell barcodes. This means we know beforehand which cell barcodes correspond to true cells and which barcodes (if any) should not contain reads from cells. This differs from the 10X Genomics protocol, where the cell barcodes associated with cell-containing droplets are inferred by a cell-calling algorithm.

Thank you again for using scruff. Hope this is helpful to you.

BijnBau commented 3 years ago

Dear @zhewa

I have used your suggestions to adapt the code. As I have no definite errors to work around, although I am still working to extend scruff to our dataset, I will close this thread.

Thank you very much for your kind suggestions!

campbio / scruff

Demultiplexing: unable to run in parallel #179