GoekeLab / m6anet

Detection of m6A from direct RNA-Seq data
https://m6anet.readthedocs.io/
MIT License

PAR_Y or not #45

Closed MiphaZ closed 2 years ago

MiphaZ commented 2 years ago

Hi Developer!

In your latest version, some transcript IDs appear in both a PAR_Y and a normal form. In the old version, that extra information was cut off. So what does the old ID mean?

Old version:
ENST00000381192 166 29 0.08692147

Latest version:
ENST00000381192.10|ENSG00000002586.20|OTTHUMG00000021073.12|OTTHUMT00000055624.3|CD99-205|CD99|1129|protein_coding| 220 20 0.103975390 TGACT 0.05000000
ENST00000381192.10_PAR_Y|ENSG00000002586.20_PAR_Y|OTTHUMG00000021073.12|OTTHUMT00000055624.3|CD99-205|CD99|1129|protein_coding| 226 20 0.279130700 TGACT 0.10000000

chrishendra93 commented 2 years ago

Hi @MiphaZ, what does a typical entry in the contig column of your nanopolish eventalign.txt look like? This might be related to an incorrect string split on the annotation results.

MiphaZ commented 2 years ago

Sure.

ENST00000455464.7|ENSG00000237094.12|OTTHUMG00000002857.7|OTTHUMT00000346907.3|ENST00000455464|ENSG00000237094|902|processed_transcript|

MiphaZ commented 2 years ago

Also, is it reasonable to use an old version of m6anet-dataprep with the new m6anet-run_inference?

chrishendra93 commented 2 years ago

Hi @MiphaZ, sorry for the late reply; I was travelling until recently.

I think the PAR_Y suffix comes from your annotation files. Older versions of m6Anet split the contig column on ".", so a transcript such as ENSTXXX.Y was parsed as ENSTXXX. We have removed this behavior in the newer version so that the IDs stay consistent with the annotation used.
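To illustrate the difference, here is a minimal Python sketch. It is not the actual m6Anet code: the function names `legacy_transcript_id` and `current_transcript_id` are hypothetical, and the exact split logic is an assumption based on the description above. The two contig strings are the ones quoted earlier in this thread.

```python
def legacy_transcript_id(contig: str) -> str:
    """Assumed old behavior: keep only the text before the first '.'."""
    return contig.split(".")[0]


def current_transcript_id(contig: str) -> str:
    """Assumed new behavior: keep the contig string exactly as given."""
    return contig


# Contig entries copied from the output quoted in this issue.
normal = (
    "ENST00000381192.10|ENSG00000002586.20|OTTHUMG00000021073.12|"
    "OTTHUMT00000055624.3|CD99-205|CD99|1129|protein_coding|"
)
par_y = (
    "ENST00000381192.10_PAR_Y|ENSG00000002586.20_PAR_Y|OTTHUMG00000021073.12|"
    "OTTHUMT00000055624.3|CD99-205|CD99|1129|protein_coding|"
)

legacy_ids = {legacy_transcript_id(c) for c in (normal, par_y)}
current_ids = {current_transcript_id(c) for c in (normal, par_y)}

print(legacy_ids)        # {'ENST00000381192'}
print(len(current_ids))  # 2
```

Under this assumption, the version number, the `_PAR_Y` suffix, and the remaining pipe-delimited fields all sit after the first "." and are dropped, so both entries map to the same short ID, whereas keeping the full string leaves them distinct.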

Also, it is reasonable to use the old version of m6anet-dataprep with m6anet-run_inference, but be aware that the older version had a minor bug that excludes a tiny portion of the candidate sites compared to the new version.

MiphaZ commented 2 years ago

That helps a lot, thank you.