elkebir-group / doubletD


how to create input #1

Open baj12 opened 2 years ago

baj12 commented 2 years ago

Hi, I would like to test your tool. How can I create the two input files? I have the results from a cellranger run, i.e. BAM, FASTQ, and count matrix. Thx, Bernd

lweber21 commented 2 years ago

Hi, thanks for inquiring about our tool. I'm not too familiar with cellranger, but from my understanding it is a single-cell RNA expression tool? Are you working with single-cell RNA data or single-cell DNA data? Please note that our tool is designed for high-coverage single-cell DNA sequencing data, although it might be possible to try it on RNA data.

To create the input, you will need to identify a list of single-nucleotide variants. If you do not have a list of suspected variants, you would need to run a variant caller on the pooled single cells. Then you can use your BAM file to perform a pileup at these positions. The two input matrices have dimension (# cells/droplets) x (# variants). In the DP matrix, entry (i, j) is the total number of reads covering variant j in cell (droplet) i; in the AD matrix, entry (i, j) is the number of alternate (variant) reads for cell (droplet) i at variant j. For the specific file formatting, please see the example files.
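A minimal Python sketch of the AD/DP construction described above, assuming the per-read base calls at the candidate variant sites have already been extracted (for example by a pileup over the cellranger BAM, grouping reads by the `CB` cell-barcode tag). The function name `build_ad_dp`, the tuple layouts, and the barcodes in the example are hypothetical illustrations, not part of doubletD, and the on-disk layout of the two matrices should follow the repository's example files.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical layouts for this sketch:
# a variant is (chromosome, 0-based position, ref base, alt base), one alt per site;
# an observation is one read's base at a site: (cell barcode, chromosome, 0-based position, base).
Variant = Tuple[str, int, str, str]
Observation = Tuple[str, str, int, str]


def build_ad_dp(
    observations: Iterable[Observation],
    variants: List[Variant],
    cells: Optional[List[str]] = None,
):
    """Return (cells, AD, DP) with AD[i][j] = alt reads and DP[i][j] = total reads
    for cell i at variant j (rows = cells/droplets, columns = variants)."""
    # Map each candidate site to its column index.
    index = {(chrom, pos): j for j, (chrom, pos, _ref, _alt) in enumerate(variants)}
    n = len(variants)
    alt_counts: Dict[str, List[int]] = defaultdict(lambda: [0] * n)
    total_counts: Dict[str, List[int]] = defaultdict(lambda: [0] * n)

    for cell, chrom, pos, base in observations:
        j = index.get((chrom, pos))
        if j is None:
            continue  # base is not at a candidate variant site
        total_counts[cell][j] += 1  # every read at the site counts toward DP
        if base.upper() == variants[j][3].upper():
            alt_counts[cell][j] += 1  # only alternate-allele reads count toward AD

    # Default row order: every barcode seen at least once, sorted.
    # Pass `cells` (e.g. a filtered barcode list) to fix the rows explicitly;
    # barcodes with no coverage then get all-zero rows.
    if cells is None:
        cells = sorted(total_counts)
    ad = [list(alt_counts[c]) for c in cells]
    dp = [list(total_counts[c]) for c in cells]
    return cells, ad, dp


if __name__ == "__main__":
    variants = [("chr1", 100, "A", "G"), ("chr2", 50, "C", "T")]
    obs = [
        ("AAAC-1", "chr1", 100, "A"),
        ("AAAC-1", "chr1", 100, "G"),
        ("AAAC-1", "chr2", 50, "C"),
        ("TTTG-1", "chr1", 100, "G"),
        ("TTTG-1", "chr3", 7, "A"),  # not a candidate site, ignored
    ]
    cells, ad, dp = build_ad_dp(obs, variants)
    print(cells, ad, dp)  # prints ['AAAC-1', 'TTTG-1'] [[1, 0], [1, 0]] [[2, 1], [1, 0]]
```

The sketch counts every read base at a site toward DP, including bases that match neither the reference nor the alternate allele, which matches "total number of reads" above. Whether to exclude such bases, deduplicate reads by UMI, or skip spliced (intron-spanning) RNA alignments is left as an assumption for the user to decide.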

baj12 commented 2 years ago

Thanks for the response. Indeed, I am working with RNA-seq data. When you talk about high-coverage single-cell DNA data, do you mean "full" coverage of the genome, sequencing depth per position, or large groups of single cells (how many cells?)? Since I would expect only one DNA molecule per cell, I assume it is the first one. I am interested in testing this on mRNA data, but I am not too familiar with the variant-calling process. Do you have any pointers, such as which programs to use and which parameters to look out for? As a starter, the commands you used to generate the input data would be perfect. Thanks so much for your kind support.