data.result & biological replicates

Hi @chrishendra93

I ran the latest version of m6anet on all my samples, and the results are very useful for designing further experiments and for comparing against other software.

However, I have some questions:

minimum read count threshold (in m6anet-run_inference):

We sequenced 6 Nanopore DRS libraries and obtained 2 to 2.5 million aligned reads per library. The median number of aligned reads per gene is about 25 in our samples. So, in our case, about half of the expressed genes would be excluded from the final results under the default criteria, simply because they have fewer than 20 aligned reads. This could bias the interpretation of transcriptome-wide m6A sites, since only sites in abundant genes can pass the threshold. (The problem might be solved by improved throughput in the future.)

Further, the number of aligned reads depends heavily on the throughput of each library. Suppose gene_A has 21 reads in Replicate.1, 18 reads in Replicate.2, and 19 reads in Replicate.3. Only the sites in Replicate.1 would pass the threshold, while all the m6A sites in Replicate.2 and Replicate.3 would be lost from the final results. (We have encountered this issue for some critical genes.)

I have read issue #13 and understand that it is hard to change this setting because the model was trained with a minimum read count threshold of 20. So, is it possible (and is it appropriate?) to take all the biological replicates into account at the same time? (e.g., all reads in gene_A = 21 + 18 + 19, then use these 58 reads for analysis)
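To make the arithmetic concrete, here is a minimal Python sketch of the gene_A example. It only compares read counts against the threshold of 20 and does not call m6anet; the function name `passes_threshold` is hypothetical, and whether pooling reads across replicates is statistically appropriate is exactly the open question.

```python
# Illustration only: this does not use m6anet. It just checks read counts
# against the default minimum of 20 described above.
MIN_READS = 20

# Read counts for gene_A in each replicate, taken from the example above.
reads_per_replicate = {
    "Replicate.1": 21,
    "Replicate.2": 18,
    "Replicate.3": 19,
}


def passes_threshold(n_reads, min_reads=MIN_READS):
    """Return True if the read count is not below the minimum."""
    return n_reads >= min_reads


# Replicates that pass when each one is analysed separately.
passing = [name for name, n in reads_per_replicate.items() if passes_threshold(n)]

# Total read count if the three replicates were pooled before analysis.
pooled_reads = sum(reads_per_replicate.values())

print("Pass individually:", passing)
print("Pooled reads:", pooled_reads, "pass:", passes_threshold(pooled_reads))
```

Analysed separately, only Replicate.1 reaches the threshold; pooled, the 58 reads pass comfortably.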

DRACH motif

In mammals, the DRACH motif is the most conserved consensus sequence for m6A sites; in plants, however, the "RRACH" motif is reported to be the most common.

So, it would be great for plant biologists (like me) if data.result.csv had a column recording the motif type (GGACA, AAACT, etc.).
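As a rough illustration of what such a column could support, the sketch below labels a 5-mer as DRACH and/or RRACH using the standard IUPAC codes (D = A/G/T, R = A/G, H = A/C/T). It assumes the 5-mer at each site is available as a plain string; the function name `matching_motifs` is hypothetical and nothing here reflects the actual layout of data.result.csv.

```python
import re

# IUPAC codes: D = A, G or T; R = A or G; H = A, C or T.
MOTIF_PATTERNS = {
    "DRACH": re.compile(r"[AGT][AG]AC[ACT]"),
    "RRACH": re.compile(r"[AG][AG]AC[ACT]"),
}


def matching_motifs(kmer):
    """Return the names of the motifs that a 5-mer matches (hypothetical helper)."""
    # Accept RNA or DNA alphabets and lower-case input.
    kmer = kmer.upper().replace("U", "T")
    return [name for name, pattern in MOTIF_PATTERNS.items() if pattern.fullmatch(kmer)]


for kmer in ["GGACA", "AAACT", "TGACT"]:
    print(kmer, matching_motifs(kmer))
```

Because RRACH is a subset of DRACH, every RRACH 5-mer also matches DRACH; a 5-mer such as TGACT matches DRACH only.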

Feel free to let me know if the questions above are not reasonable.

Many thanks

YCCHEN

GoekeLab / m6anet

data.result & biological replicates #25