GoekeLab / m6anet

Detection of m6A from direct RNA-Seq data
https://m6anet.readthedocs.io/
MIT License
108 stars 19 forks

Question about --read_proba_threshold in m6anet inference #177

Open cxy-26 opened 1 month ago

cxy-26 commented 1 month ago

Hi,

I am working with Nanopore direct RNA-seq data from the RNA004 chemistry. I have a question about the --read_proba_threshold option in the m6anet inference step. I tried the default threshold (0.033379376), 0.05, and 0.5, but the output data.site_proba.csv files show no difference.

m6anet inference --input_dir ./ --out_dir ./ --pretrained_model HEK293T_RNA004 --n_processes 16 --num_iterations 1000
m6anet inference --input_dir ./ --out_dir ./threshold_0.05/ --pretrained_model HEK293T_RNA004 --n_processes 16 --num_iterations 1000 --read_proba_threshold 0.05
m6anet inference --input_dir ./ --out_dir ./threshold_0.5/ --pretrained_model HEK293T_RNA004 --n_processes 16 --num_iterations 1000 --read_proba_threshold 0.5

I expected mod_ratio to decrease as the threshold increases, since the mod_ratio column is calculated by thresholding probability_modified from data.indiv_proba.csv against the --read_proba_threshold parameter during the m6anet inference call.
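As a sanity check on that expectation, the thresholding step can be reproduced by hand from the read-level output. The following is only a minimal sketch, not m6anet's actual implementation: it assumes data.indiv_proba.csv has the columns transcript_id, transcript_position, read_index and probability_modified, and it assumes the comparison is "greater than or equal to" (m6anet may use a strict comparison). The function name mod_ratio_by_site and the toy values are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def mod_ratio_by_site(rows, threshold):
    """Per site, the fraction of reads with probability_modified >= threshold.

    Assumption: ">=" is used here; the real tool may compare strictly.
    """
    total = defaultdict(int)
    modified = defaultdict(int)
    for row in rows:
        site = (row["transcript_id"], int(row["transcript_position"]))
        total[site] += 1
        if float(row["probability_modified"]) >= threshold:
            modified[site] += 1
    return {site: modified[site] / total[site] for site in total}


# Toy stand-in for data.indiv_proba.csv (values invented for illustration).
toy_csv = """transcript_id,transcript_position,read_index,probability_modified
tx1,100,0,0.01
tx1,100,1,0.04
tx1,100,2,0.20
tx1,100,3,0.90
"""
rows = list(csv.DictReader(io.StringIO(toy_csv)))

# The three thresholds tried in this issue.
for threshold in (0.033379376, 0.05, 0.5):
    print(threshold, mod_ratio_by_site(rows, threshold))
```

On real data, replacing the toy string with the opened data.indiv_proba.csv file and comparing the result against the mod_ratio column of each run's data.site_proba.csv should show whether the threshold was actually applied.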

Summary of mod_ratio with default threshold: image

Summary of mod_ratio with threshold=0.05: image

Summary of mod_ratio with threshold=0.5: image

I am wondering whether --read_proba_threshold worked or not, or whether I misunderstood the way mod_ratio is calculated.

yuukiiwa commented 6 days ago

Hi @cxy-26,

mod_ratio is the proportion of reads at a site whose read-level probability is over the --read_proba_threshold. In our experience, changing the --read_proba_threshold changes the mod_ratio.

Thanks!

Best wishes, Yuk Kei

baibhav-bioinfo commented 6 days ago

Hi @cxy-26, I have a question, and it would help me a lot if you could clarify it.

I also have DRS reads sequenced with the RNA004 chemistry for a plant species (~15 million reads/sample). Since the new RNA004-based m6Anet inference model is trained only on a human cell line, can I still use the RNA002-based Arabidopsis model for m6A prediction in my case (plant species)?

I did use the plant model (RNA002), but the results show a very low number of m6A sites per sample (~3000). I ran it with the default threshold (0.0032978046219796 for the plant model).

cxy-26 commented 6 days ago

Hi @baibhav-bioinfo,

I am sorry that I can't answer your question, since all my samples are human cell lines. Have you tried using the RNA004 human cell line model on your plant sample, or training your own model?

Also, I think you may have @-mentioned the wrong person?

Best

cxy-26 commented 6 days ago

Thanks for your explanation. I'll double-check my code and results.

baibhav-bioinfo commented 6 days ago

Thanks for the response, and apologies for bothering you. I did ask the developers, but they may have missed my issue or not fully understood it. So I am asking you, since you also work with DRS data first-hand.

What would you suggest is best in my case: the RNA004 human model or the RNA002-based plant model? Is there an option to train my own model? How does that work?