YiPeng-Gao / scDaPars

Dynamic Analysis of Alternative Polyadenylation from single-cell RNA-seq (scDaPars)

PDUI was successfully estimated for fewer than 1000 genes #18

Open ysbioinfo opened 2 years ago

ysbioinfo commented 2 years ago

Hi, first, thanks for developing such an awesome tool. I applied it to my 10X data but found that only ~1000 genes were included in the final output file. This seems strange to me, because many more genes appear in the figures from your Genome Research paper.

The quality of my data is quite good (median: ~3000 UMIs and >1500 genes per cell). All my cells were generated from a 10X 3'UTR library. I used cellranger to map reads to GRCh38 and then split the BAM into per-cell files with sinto (a Python package). Then I used umi_tools for deduplication. The BAM file size for each cell ranges from 5 MB to 20 MB. Next, I used bedtools genomecov to produce the bedgraph files and fed them to dapars2. When running dapars2, I set Coverage_threshold to zero in order to include as many genes as possible.
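For readers trying to reproduce the steps above, here is a minimal sketch of the per-cell part of the pipeline, written as Python functions that only build the command lines (nothing is executed). The exact options used in this issue were not stated, so every flag shown here is an assumption: `sinto filterbarcodes -b/-c`, `umi_tools dedup` reading the UMI from the `UB` tag, and `bedtools genomecov -ibam ... -bg -split`. The function names and output file naming are hypothetical.

```python
def split_command(possorted_bam, cells_file):
    """Command to split a cellranger BAM into per-cell BAMs with sinto.

    cells_file is assumed to be a tab-separated file: barcode<TAB>group name.
    """
    return ["sinto", "filterbarcodes", "-b", possorted_bam, "-c", cells_file]


def per_cell_commands(cell_bam, out_prefix):
    """Commands for one cell: UMI deduplication, then coverage as bedGraph.

    Assumptions (not stated in the issue): UMIs are stored in the UB tag,
    and spliced reads are handled with -split. umi_tools needs an indexed BAM.
    """
    dedup_bam = f"{out_prefix}.dedup.bam"
    dedup = [
        "umi_tools", "dedup",
        "--extract-umi-method=tag", "--umi-tag=UB",
        "-I", cell_bam, "-S", dedup_bam,
    ]
    # bedtools writes the bedGraph to stdout, so the caller must redirect it,
    # e.g. subprocess.run(coverage, stdout=open(out_prefix + ".bedgraph", "w"))
    coverage = ["bedtools", "genomecov", "-ibam", dedup_bam, "-bg", "-split"]
    return {"dedup": dedup, "coverage": coverage}
```

Each returned list can be passed to `subprocess.run` as is. Building the commands separately from running them makes it easy to print and inspect what would be executed for a single barcode before launching thousands of cells.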

From your experience, is my result unexpected? If so, which step is most likely wrong? Could you please give me some advice?

Best, Yang

ysbioinfo commented 2 years ago

Here are some intermediate files that might be helpful. Bedgraph and read count files for one cell: TTTCATGAGAAGTCAT-1.bedgraph.txt, TTTCATGAGAAGTCAT-1.readcount.txt. Output files for one sample: config.txt, dapars2_result.txt, readdepth.txt, scDaPars_imputed_results.txt. The reference UTR file I generated for GRCh38: hg38.extracted.3UTR.bed.txt.
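One quick sanity check on files like these is to count how many annotated 3' UTRs have any coverage at all in a single cell's bedgraph. The sketch below is only a diagnostic under stated assumptions, not part of scDaPars or DaPars2: it assumes a four-column bedGraph (chrom, start, end, depth) and a BED-style UTR file whose fourth column is the region name. The function names `load_bedgraph` and `covered_utrs` are hypothetical.

```python
from collections import defaultdict


def load_bedgraph(lines):
    """Parse bedGraph lines into {chrom: [(start, end, depth), ...]}."""
    cov = defaultdict(list)
    for line in lines:
        # Skip blank lines and optional track/browser/comment headers.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, depth = line.split()[:4]
        cov[chrom].append((int(start), int(end), float(depth)))
    return cov


def covered_utrs(utr_lines, cov, min_depth=1.0):
    """Names of UTRs overlapping at least one interval with depth >= min_depth.

    Uses a simple linear scan per UTR: fine for a spot check on one cell,
    but slow for many cells.
    """
    hits = []
    for line in utr_lines:
        if not line.strip():
            continue
        fields = line.split()
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        # Half-open intervals overlap when each one starts before the other ends.
        if any(s < end and e > start and d >= min_depth
               for s, e, d in cov.get(chrom, [])):
            hits.append(name)
    return hits
```

To use it, open the per-cell bedgraph and the UTR BED file, pass the file objects to `load_bedgraph` and `covered_utrs`, and compare `len(covered_utrs(...))` with the total number of UTR lines. If only a small fraction of UTRs is covered in a typical cell, a low gene count downstream would be less surprising.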