TF-Chan-Lab / miRDeep-P2_pipeline

GNU General Public License v3.0

miRDP2_mature.fa #7

Open aayoungfish opened 8 months ago

aayoungfish commented 8 months ago

Thank you so much for developing this software! After running steps 1 and 2, I have the annotation files (miRDP2_mature_known.txt and miRDP2_mature_variant.txt) and miRDP2_mature.fa. miRDP2_mature.fa contains more than 10,000 sequences, but only a little over 100 of them are known miRNAs. Is this normal? Is miRDP2_mature.fa the file of predicted miRNAs (that is, all known and novel predictions together, already covering most of the overlap with the miRNAs in miRBase)? I also found a filterP prediction file in the folder. Is that the final file of predicted novel miRNAs? Looking forward to your reply!

alanlamsiu commented 8 months ago

Hi @aayoungfish,

I would like to point out that, according to the miRDeep-P2 manuscript written by its developers, the _filter_P_ prediction file is the final output intended for downstream analyses. In this pipeline, we take a step back and use the _predictions output instead, which is a deliberately greedy choice to obtain a larger list of sequences to work with.

As you have seen in your data, we also observe that this pipeline can yield thousands of sequences, of which only a few hundred are known miRNAs. We think it is useful to keep the miRNA variants and novel miRNAs up to this step. In later steps, once miRNA expression has been quantified, it is always possible to filter out those with low expression levels, which greatly reduces the number.
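
As a rough illustration of that later filtering step, here is a minimal Python sketch. It is not part of this pipeline or of miRDeep-P2. The function name `filter_by_expression`, the counts-per-million (CPM) normalisation, the thresholds, and the example IDs and counts are all assumptions made for the example. It takes raw read counts per miRNA per sample and keeps only the miRNAs that reach a minimum CPM in enough samples.

```python
def filter_by_expression(counts, min_cpm=1.0, min_samples=1):
    """Keep miRNAs whose CPM is >= min_cpm in at least min_samples samples.

    counts: dict mapping a miRNA ID to a list of raw read counts,
            one count per sample (hypothetical input layout).
    """
    if not counts:
        return {}

    n_samples = len(next(iter(counts.values())))
    # Library size of each sample: the sum of all counts in that column.
    totals = [sum(row[i] for row in counts.values()) for i in range(n_samples)]

    kept = {}
    for mirna_id, row in counts.items():
        # Counts per million; a sample with zero total reads contributes 0.
        cpm = [1e6 * c / t if t else 0.0 for c, t in zip(row, totals)]
        if sum(value >= min_cpm for value in cpm) >= min_samples:
            kept[mirna_id] = row
    return kept


# Made-up example: three sequences quantified in two samples.
example_counts = {
    "known_1": [900000, 800000],
    "variant_1": [99999, 199999],
    "novel_1": [1, 1],
}
filtered = filter_by_expression(example_counts, min_cpm=5.0, min_samples=1)
print(sorted(filtered))
```

With these made-up numbers, `novel_1` has a CPM of about 1 in both samples, so a threshold of 5 removes it while the other two sequences are kept. The appropriate threshold depends on sequencing depth and study design, so treat the values here as placeholders.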