Read count cut-off for circRNA differential expression

bioinfo-biols / CIRIquant

circular RNA quantification tools

https://sourceforge.net/projects/ciri/files/CIRIquant

MIT License


Read count cut-off for circRNA differential expression #21

Closed prisca399 closed 3 years ago

prisca399 commented 3 years ago

Hi @Kevinzjy,

Thanks for creating this awesome tool. I have been able to run the CIRIquant command successfully using circRNA prediction results from other tools. I am now interested in differential expression. Would you advise filtering the circRNAs first (i.e. between the quantification and differential expression steps) to get rid of those with low read counts as quantified by CIRIquant? If not, can you explain why? Thanks!

Prisca

Kevinzjy commented 3 years ago

Hi @prisca399, we normally filter out circRNAs with only 1 supporting read before differential expression analysis. There is no particular cut-off for this step. If you want to focus on highly expressed circRNAs, I would suggest using a threshold of 2/5/10 BSJ reads in at least 1 sample, or tuning your own threshold according to the number of circRNAs left after filtering (e.g. top 500/1000).
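A minimal sketch of the two filtering strategies mentioned above, written in plain Python. The function names, the toy circRNA IDs, and the in-memory layout (circRNA ID mapped to a list of per-sample BSJ read counts) are assumptions for illustration and are not part of CIRIquant. Ranking by total BSJ reads for the "top N" variant is also an assumption; the reply does not say how circRNAs should be ranked.

```python
def filter_min_reads(counts, min_reads=2):
    """Keep circRNAs with at least `min_reads` BSJ reads in at least one sample."""
    return {
        circ_id: row
        for circ_id, row in counts.items()
        if any(c >= min_reads for c in row)
    }


def top_n(counts, n=500):
    """Keep the `n` circRNAs with the highest total BSJ reads (assumed ranking)."""
    ranked = sorted(counts, key=lambda circ_id: sum(counts[circ_id]), reverse=True)
    return {circ_id: counts[circ_id] for circ_id in ranked[:n]}


# Hypothetical toy data: circRNA ID -> BSJ reads per sample.
counts = {
    "chr1:100|200": [0, 1, 0],
    "chr2:300|450": [6, 0, 2],
    "chr3:500|900": [1, 1, 1],
    "chr4:50|75": [12, 9, 15],
}

kept = filter_min_reads(counts, min_reads=2)
best = top_n(counts, n=2)
```

Raising `min_reads` to 5 or 10, or lowering `n`, narrows the set towards highly expressed circRNAs, which is the tuning the reply describes.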

prisca399 commented 3 years ago

Hi @Kevinzjy,

I am interested in filtering the main gtf output such that only circRNAs with a read count >5 will be used for differential expression. I want to ask whether it is also necessary to alter the metadata at the top of the gtf files, which lists the number of mapped reads, bsj reads, etc. If so, can you clarify which aspects should be changed and in what way?

Kevinzjy commented 3 years ago

It depends.

If you have biological replicates, you can use the prep_CIRIquant command to generate the circRNA expression matrix, and filter out circRNAs with fewer than 5 supporting reads before running CIRI_DE_replicate. There is no need to alter the metadata of the GTF output.
If you are comparing one sample to another using CIRI_DE, I don't think it's a good idea to filter the circRNAs before differential analysis. I would recommend running CIRI_DE with all circRNAs generated in the previous step and filtering the DE results instead. (The calculation of DE_score and DS_score relies largely on the number of mapped reads and BSJ reads, and changing these numbers could alter the differential expression results in unexpected ways.)

prisca399 commented 3 years ago

Thank you for the clarification. I have multiple biological replicates consisting of tumor and normal samples like the ones you demonstrated in your paper. To confirm: you are suggesting that I perform the filter on the circRNA_bsj.csv file and not on the sample.gtf file? And I would filter out circRNAs that have fewer than five supporting reads in total across all samples? I have 24 samples total that I am comparing, 15 tumor and 9 control.

prisca399 commented 3 years ago

I was able to easily subset the bsj.csv (after running prep_CIRIquant) as you mentioned and as I interpreted above. I found no need to edit the metadata of the gtf file. Thanks!
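For readers who want to reproduce this subsetting step, here is a rough sketch using only the Python standard library. It follows the interpretation described in this thread (drop circRNAs whose BSJ reads sum to fewer than five across all samples). The file layout is an assumption: a header row, the circRNA ID in the first column, and one BSJ count column per sample. The function names, sample names, and output path are hypothetical.

```python
import csv
import io


def filter_bsj_rows(rows, min_total=5):
    """Yield the header, then only rows whose summed BSJ reads reach `min_total`."""
    rows = iter(rows)
    yield next(rows)  # header row is passed through unchanged
    for row in rows:
        # Assumed layout: row[0] is the circRNA ID, row[1:] are per-sample counts.
        if sum(float(x) for x in row[1:]) >= min_total:
            yield row


def filter_bsj_file(in_path, out_path, min_total=5):
    """Apply filter_bsj_rows to a CSV file and write the result to `out_path`."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        csv.writer(dst).writerows(filter_bsj_rows(csv.reader(src), min_total))


# Small in-memory demonstration with made-up counts.
demo = io.StringIO(
    "circ_id,tumor_1,tumor_2,normal_1\n"
    "chr1:100|200,1,0,2\n"
    "chr2:300|450,3,2,0\n"
    "chr4:50|75,12,9,15\n"
)
filtered = list(filter_bsj_rows(csv.reader(demo), min_total=5))
```

On real data, something like `filter_bsj_file("circRNA_bsj.csv", "circRNA_bsj.filtered.csv")` would write the subset to a new file (the output name is hypothetical), leaving the original matrix and the GTF metadata untouched.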

Kevinzjy commented 3 years ago

Thanks for letting me know.