bioinfo-biols / CIRIquant

circular RNA quantification tools
https://sourceforge.net/projects/ciri/files/CIRIquant
MIT License

Limitation on Number of Samples Used/DE Filtering Issue? #51

Open mlvanhorn opened 1 year ago

mlvanhorn commented 1 year ago

Hello,

I have a question about whether there is a limit to the number of samples that can be used with the CIRI_DE_replicate command in the differential expression portion of the CIRIquant pipeline. I'm currently trying to analyze a data set of about 10500 samples (including biological replicates) and have noticed some strange values in my circRNA_de.tsv output file. There were only six DE circRNAs, which I initially assumed was because of the larger sample size. However, when scrolling through the .tsv, I saw that over half (646,400 or 61%) of the circRNAs listed had logFC = 0, PValue = 1, and FDR = 1. I have run multiple other data sets with CIRIquant and have never had this issue before.

Is it possible that there is either a limitation on the number of samples that can be parsed, or an issue with the filtering step in the DE pathway that let these non-significant circRNAs through?

Thank you for your time & any suggestions!

Kevinzjy commented 1 year ago

Hi @mlvanhorn, CIRIquant uses the edgeR package for DE analysis, and there is no limit on the number of samples.

It's likely that many circRNAs are expressed in only a few samples, so their average expression is zero in both groups. At the same time, the large number of circRNAs being tested makes it hard for DE circRNAs to pass FDR correction. I therefore suggest filtering the DE results by occurrence (the number of samples in which a circRNA is detected) and expression level to reduce the total number of circRNAs, then re-applying multiple-testing correction to recalculate the FDR values.
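
The suggestion above could be sketched roughly as follows. This is only an illustration, not part of CIRIquant: the `circRNA_ID` key, the `counts` mapping (circRNA ID to per-sample back-splice junction counts), and the `min_samples` / `min_count` thresholds are all assumptions made for the example, and the FDR is recomputed with a plain Benjamini-Hochberg procedure. The filter looks only at how often a circRNA is detected across all samples, not at which group the samples belong to.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_and_readjust(de_rows, counts, min_samples=3, min_count=2):
    """Keep circRNAs seen with >= min_count reads in >= min_samples samples,
    then recompute FDR over the survivors only.

    de_rows: list of dicts with a hypothetical "circRNA_ID" key and "PValue".
    counts:  hypothetical dict mapping circRNA ID -> list of per-sample counts.
    """
    kept = [
        row for row in de_rows
        if sum(c >= min_count for c in counts.get(row["circRNA_ID"], [])) >= min_samples
    ]
    fdr = bh_adjust([float(row["PValue"]) for row in kept])
    # Return copies so the original rows are left untouched.
    return [dict(row, FDR=q) for row, q in zip(kept, fdr)]


if __name__ == "__main__":
    # Toy data: "b" is detected in a single sample and gets filtered out.
    rows = [
        {"circRNA_ID": "a", "PValue": 0.01},
        {"circRNA_ID": "b", "PValue": 1.0},
        {"circRNA_ID": "c", "PValue": 0.04},
    ]
    toy_counts = {"a": [5, 3, 0, 4], "b": [0, 0, 1, 0], "c": [2, 2, 2, 0]}
    for row in filter_and_readjust(rows, toy_counts):
        print(row["circRNA_ID"], row["PValue"], row["FDR"])
```

Because fewer hypotheses remain after filtering, the recomputed FDR values can only stay the same or shrink relative to adjusting over the full list. Suitable thresholds depend on the data set and group sizes, so the defaults here are placeholders.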