cougarlj / COMPSRA

COMPSRA: a COMprehensive Platform for Small RNA-Seq data Analysis
https://regepi.bwh.harvard.edu/circurna/
GNU General Public License v3.0
16 stars 6 forks source link

Inconsistent count of miRNA between bam file and count table using COMPASS #5

Open RJWANGbioinfo opened 4 years ago

RJWANGbioinfo commented 4 years ago

Hi, I'm using the COMPASS to run some testing analysis. One issue makes me confused is the inconsistent count between the bam file and the miRNA count table. The command I used is:

java -jar COMPSRA.jar -ref hg38 -ann -ac 1 --ann_bam -in ./example_out/sample01/sample01_17to50_FitRead_STAR_Aligned.out.bam -out ./example_out/sample01/sample01_17to50_FitRead_STAR_Aligned

When I compare the count of miRNA from "sample01_17to50_FitRead_STAR_Aligned_miRNA.bam" versus the one in "sample01_17to50_FitRead_STAR_Aligned_miRNA.txt", it seems some of them not matched well, for instance, there are 34 reads mapped to hsa-let-7a-2 (chr11: 122146522-122146593) in bam file, while the count in the .txt file is only 1.
Could you help to figure out how to explain this? Thank you!

cougarlj commented 4 years ago

Dear RJWANGbioinfo,

Thank you for using COMPSRA and writing response to me. The miRNA hsa-let-7a-2 is a pre-miRNA and it can form hsa-let-7a-5p and hsa-let-7a-2-3p. For pre-miRNAs, COMPSRA will only show the unique read count, which means that for all the reads mapped to the pre-miRNA region, they will remove the part of reads that can map to the mature miRNAs. In brief, you can think that N(pre-miRNA)=N(total)-N(mature miRNA).

Best Wishes, Jiang

RJWANGbioinfo commented 4 years ago

Hi @cougarlj ,

Thanks for your prompt response! I just tried to extract the unique mapped reads from the bam using:

samtools view -b -q 255 sample01_17to50_FitRead_STAR_Aligned_miRNA.bam > unique.bam

And there are 0 reads found in the miRNA hsa-let-7a-2. Could you help me to figure out how to extract the consistent unique count of this miRNA from the bam output of COMPSRA?

And if possible, could you help me to understand the setting of '-aol'? If my understanding is correct, it sets the overlap percentage between reads and miRNA, 1.0 mean 100% overlapped. While it seems COMPSRA will still generate a result even if I set this value >1.0 (e.g., 1.5), why is that?

Thank you!

cougarlj commented 4 years ago

Dear @RJWANGbioinfo ,

Sorry to reply late. The bam file from COMPSRA doesn't store annotation information, so you don't know which read belongs to a certain miRNA. COMPSRA takes STAR as the aligner and I remember that you can provide an annotation file when you align reads to the genome. If that so, maybe you can extract the target reads and count it. Moreover, I remember COMPSRA can directly annotate pre-miRNA when you use -ac 11 after -ann. Also, COMPSRA has a toolkit that contains some easy functions. One of them is to extract reads with certain annotations. You can try the example: $COMPSRA -tk -sa chr21|16539828|16539911|+|hsa-let-7c|miRBase;chr21|16539838|16539859|+|hsa-let-7c-5p|miRBase -sam /your/output/source_test.sam -fq /your/target/output/target_test.fastq .

Your understanding about is '-aol' is right and when the value is greater than 1, COMPSRA will set it as 1.

Hope this can help you.

Best Wishes, Jiang Li

RJWANGbioinfo commented 4 years ago

Hi @cougarlj Thanks for your response. I tried the example command you mentioned as: $ java -jar COMPSRA.jar -tk -sa chr21|16539828|16539911|+|hsa-let-7c|miRBase;chr21|16539838|16539859|+|hsa-let-7c-5p|miRBase -sam example_out/sample01/sample01_17to50_FitRead_STAR_Aligned.out.bam -fq example_out/sample01/sample01_17to50_FitRead.fastq.gz .

It gives me the below errors: bash: +: command not found

bash: hsa-let-7c: command not found

bash: miRBase: command not found

bash: 16539911: command not found

bash: 16539828: command not found

bash: +: command not found

bash: hsa-let-7c-5p: command not found

bash: miRBase: command not found

bash: 16539859: command not found

bash: 16539838: command not found

bash: chr21: command not found