Closed: cnk113 closed this issue 3 years ago.
Hi @cnk113, can you share the bash script you used to run the program? Also, did you create the ATACDoublet_output output directory beforehand?
Best, Alper
So I was using the bash script supplied in the repo, but when I run the overlap step and then the doublet step manually, it works.
Now there's a different issue. I have a run loaded with 20,000 cells, but I only end up detecting 1,300 doublets (~6%). All the inputs come straight from the Cell Ranger output (BAM, singlecell.csv). This figure is too low for that loading; any ideas on what's going wrong?
The shell script doesn't know where it is located on the file system, so it requires full paths to execute properly. Unfortunately, I haven't found a good solution for this yet. I'll see what I can do in the next release so that relative paths are handled appropriately.
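Until relative paths are handled, one workaround is to convert every path to an absolute one before calling the script. Below is a minimal Python sketch, assuming every positional argument is a file or directory path. The helper names, the script name `run_pipeline.sh`, and the argument order are hypothetical placeholders, not the tool's documented interface.

```python
import os
import shlex


def to_absolute(path):
    """Convert one path to an absolute path (hypothetical helper)."""
    absolute = os.path.abspath(os.path.expanduser(path))
    # abspath drops a trailing separator; keep it in case the script
    # concatenates the output directory and a file name directly.
    if path.endswith(os.sep) and not absolute.endswith(os.sep):
        absolute += os.sep
    return absolute


def build_command(script, paths):
    """Build a bash command line that passes only absolute paths to the script.

    Assumes all entries in `paths` are paths; append flags separately.
    """
    return ["bash", to_absolute(script)] + [to_absolute(p) for p in paths]


if __name__ == "__main__":
    # Placeholder script name and inputs; substitute your own paths.
    cmd = build_command(
        "run_pipeline.sh",
        ["outs/possorted_bam.bam", "outs/singlecell.csv", "ATACDoublet_output/"],
    )
    # Inspect the command first; run it with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Resolving paths in the caller keeps the workaround independent of how the script itself locates its helper files.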
A low number of detected doublets could result from having few valid read pairs per nucleus/cell. We estimate that 20k-25k valid read pairs per nucleus/cell will achieve high recall for detecting doublets. Taking the average of the second column of OverlapSummary.txt will give you the average number of valid read pairs per cell. You could also try increasing the FDR threshold, either by manually selecting cell_ids/barcodes in the DoubletProbabilities.txt output or by using the --q option of the Python script and rerunning. The current default is a q-value (FDR-adjusted p-value) threshold of 0.01.
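To check the read-pair depth described above, here is a small sketch that averages the second column of OverlapSummary.txt. It assumes the file is tab-delimited and that column 2 holds the per-cell valid read-pair count; any row where that field is not numeric (such as a header) is skipped. The function name is hypothetical.

```python
import csv
import statistics


def mean_valid_read_pairs(path, column=1, delimiter="\t"):
    """Average one column (default: the second) of a delimited summary file.

    Assumes OverlapSummary.txt is tab-delimited; non-numeric fields in the
    chosen column, such as a header row, are skipped.
    """
    values = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter=delimiter):
            if len(row) <= column:
                continue  # short or blank line
            try:
                values.append(float(row[column]))
            except ValueError:
                continue  # header or other non-numeric field
    if not values:
        raise ValueError(f"no numeric values in column {column} of {path}")
    return statistics.mean(values)
```

For example, `mean_valid_read_pairs("ATACDoublet_output/OverlapSummary.txt")` (adjust the path to wherever your output directory is) gives a number you can compare against the 20k-25k range; a value well below it would be consistent with the low recall discussed here.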
Ah, so I'm going to try different FDR values, but my cells were sequenced deeply, at almost 100,000 reads per cell (I haven't checked read pairs). Would the increased sequencing depth affect the underlying algorithm's assumptions?
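Before rerunning with several --q values, the manual route suggested above (selecting barcodes from DoubletProbabilities.txt) can give a quick preview of how the doublet count changes with the threshold. This sketch assumes the file is tab-delimited with a header row; the column names are passed in because the real headers are not shown here, so check the file first. The function names and the example thresholds are hypothetical.

```python
import csv


def read_q_values(path, barcode_column, q_column, delimiter="\t"):
    """Yield (barcode, q-value) pairs from a delimited file with a header row."""
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter=delimiter):
            yield row[barcode_column], float(row[q_column])


def select_barcodes(path, barcode_column, q_column, threshold, delimiter="\t"):
    """Return barcodes whose q-value is at or below `threshold`."""
    return [
        barcode
        for barcode, q in read_q_values(path, barcode_column, q_column, delimiter)
        if q <= threshold
    ]


def count_by_threshold(path, barcode_column, q_column,
                       thresholds=(0.01, 0.05, 0.1), delimiter="\t"):
    """Count barcodes called at each q-value threshold."""
    pairs = list(read_q_values(path, barcode_column, q_column, delimiter))
    return {t: sum(1 for _, q in pairs if q <= t) for t in thresholds}
```

Whether the tool itself uses `<=` or `<` at the threshold is not stated in the thread, so small differences from a `--q` rerun are possible.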
In our analyses, increased read depth per nucleus should help improve the sensitivity/recall of the method. Basically, the more reads a single nucleus has, the greater the chance of detecting reads from more than two copies of the same chromosome (more than a diploid nucleus carries) associated with one barcode.
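A toy illustration of why depth matters: for one barcode, sweep over its fragments and report regions covered by more than two of them, which a single diploid nucleus should not normally produce. More fragments per barcode means more chances for such regions to appear in a true doublet. This is a sketch of the principle only, not the tool's implementation; the `(chrom, start, end)` half-open fragment tuples and the function name are hypothetical.

```python
from collections import defaultdict


def regions_above_ploidy(fragments, ploidy=2):
    """Return regions covered by more than `ploidy` fragments of one barcode.

    fragments: iterable of (chrom, start, end) half-open intervals.
    Returns a list of (chrom, start, end) tuples, sorted by chrom and start.
    """
    # Net change in coverage at each position, per chromosome.
    deltas = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in fragments:
        deltas[chrom][start] += 1
        deltas[chrom][end] -= 1

    regions = []
    for chrom in sorted(deltas):
        depth = 0
        region_start = None
        for pos in sorted(deltas[chrom]):
            depth += deltas[chrom][pos]
            if depth > ploidy and region_start is None:
                region_start = pos  # coverage just rose above ploidy
            elif depth <= ploidy and region_start is not None:
                regions.append((chrom, region_start, pos))
                region_start = None
    return regions
```

With three fragments `("chr1", 0, 100)`, `("chr1", 50, 150)` and `("chr1", 80, 200)`, only `chr1:80-100` is covered three times, so the function reports that single region; two overlapping fragments alone report nothing.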
Hello,
I downloaded the binary release directly, and it seems to throw this error.
I have Java 8, and all my inputs come directly from the Cell Ranger outputs. Any ideas?