differential m6A methylation between sample groups

eltonjrv commented 1 month ago

Dear m6anet developers,

I wonder if you can advise on the best statistical method for a differential m6A abundance analysis between two sample groups. I have already run m6anet v2.1 successfully on my ONT dRNA-seq samples, and have worked out my own way to extract m6A-containing read counts per transcript per sample from the data.site_proba.csv and data.indiv_proba.csv output files. I'm now stuck on which statistical method to rely on for calculating differential m6A abundance between my sample groups.

Thanks in advance for any light shed, Best, Elton

yuukiiwa commented 1 month ago

Hi Elton (tagging you here @eltonjrv),

Our lab has another tool called xpore, which detects differentially modified sites across two or more samples. You can run xpore on your two samples. It should output a diffmod.table, which reports which sites differ significantly between the samples, based on z-score and p-value.

One thing I find helpful is to combine the m6anet data.site_proba.csv for the two samples with the diffmod.table, which gives more information for each site.
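A minimal sketch of that combination step, assuming xpore was run in transcript mode so that the `id` and `position` columns of diffmod.table refer to the same transcript coordinates as `transcript_id` and `transcript_position` in data.site_proba.csv. The column names are assumptions based on the usual output headers of both tools, so check them against your own files. The function name `annotate_diffmod` and the sample labels are hypothetical.

```python
import csv


def load_rows(path):
    """Read a comma-separated file with a header row into a list of dicts."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))


def annotate_diffmod(diffmod_rows, m6anet_by_sample):
    """Left-join m6anet site calls onto xpore diffmod rows.

    diffmod_rows: dict rows with (assumed) 'id' and 'position' columns.
    m6anet_by_sample: {label: rows} where rows have (assumed)
    'transcript_id', 'transcript_position', 'probability_modified'
    and 'mod_ratio' columns.
    Sites that m6anet did not report for a sample get empty strings.
    """
    # Index each sample's m6anet sites by (transcript, position) as strings.
    indexed = {
        label: {
            (r["transcript_id"], r["transcript_position"].strip()): r
            for r in rows
        }
        for label, rows in m6anet_by_sample.items()
    }
    annotated = []
    for row in diffmod_rows:
        key = (row["id"], row["position"].strip())
        merged = dict(row)
        for label, sites in indexed.items():
            hit = sites.get(key, {})
            merged[f"{label}_probability_modified"] = hit.get("probability_modified", "")
            merged[f"{label}_mod_ratio"] = hit.get("mod_ratio", "")
        annotated.append(merged)
    return annotated
```

For example, `annotate_diffmod(load_rows("diffmod.table"), {"groupA": load_rows("A/data.site_proba.csv"), "groupB": load_rows("B/data.site_proba.csv")})` would return one row per xpore site, carrying both m6anet probabilities where available. The paths here are placeholders.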

Thanks!

Best wishes, Yuk Kei

eltonjrv commented 1 month ago

Hi Yuk, Thanks for your reply. However, I reckon that xpore is meant for KO-vs-WT cases, like other differential methylation tools (diffErr, nanoPsiPy, nanoSPA). In my current case (de novo m6A identification), I don't have KO mutants for the m6A writer enzyme, which is why I chose m6anet, and it seems to be working nicely. Thanks again! My only doubt now is about the best analytical way to compare transcripts' differential methylation between two sample conditions, starting from their m6A sites identified de novo by m6anet. Does that make sense?

The following is how I am currently rescuing m6A-containing readCounts from transcripts showing m6A sites with >80% confidence by m6anet:

1) sed 's/,/\t/g' data.site_proba.csv | grep -v '^transcript' | awk '{if($4 >= 0.8) print $0}' | cut -f 1,2 | sed 's/\t/,/g' >x

2) perl -e 'open(FILE, "x"); open(FILE2, "data.indiv_proba.csv"); while(<FILE>) {chomp($_); $hash{$_} = 1;} while(<FILE2>) {chomp($_); @array = split(/,/, $_); $idPos = "$array[0],$array[1]"; if($hash{$idPos} == 1) {print("$array[0]\t$array[2]\n");} }' >y

3) sort -u y | cut -f 1 | uniq -c | sed -r 's/^ +//g' | sed 's/ /\t/g' >m6Acounts.tab

Then I combine all individual m6Acounts.tab files (3 replicates for sample group 1, and 3 replicates for sample group 2) in a single readCount matrix to be served as input for the differential stats tool. I've tested that on DESeq2 and the results are not too bad. However, I still have no clue on the reliability of this approach. My collaborators would need to validate that in the lab for some transcripts.
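For readers who prefer a single script, the three steps and the matrix-building step could be sketched in Python roughly as below. This is only an illustration of the same logic, not an m6anet utility. It assumes the m6anet CSV headers are `transcript_id`, `transcript_position`, `probability_modified` and `read_index`; the function names are hypothetical.

```python
import csv
from collections import defaultdict


def confident_sites(site_rows, min_prob=0.8):
    """Step 1: keep (transcript_id, transcript_position) pairs whose
    site-level probability_modified is at least min_prob."""
    return {
        (row["transcript_id"], row["transcript_position"])
        for row in site_rows
        if float(row["probability_modified"]) >= min_prob
    }


def reads_per_transcript(indiv_rows, sites):
    """Steps 2-3: count distinct read_index values per transcript among
    reads that cover at least one kept site."""
    reads = defaultdict(set)
    for row in indiv_rows:
        if (row["transcript_id"], row["transcript_position"]) in sites:
            reads[row["transcript_id"]].add(row["read_index"])
    return {tx: len(idx) for tx, idx in reads.items()}


def counts_for_sample(site_csv, indiv_csv, min_prob=0.8):
    """Run steps 1-3 for one sample, given its two m6anet output files."""
    with open(site_csv, newline="") as fh:
        sites = confident_sites(csv.DictReader(fh), min_prob)
    with open(indiv_csv, newline="") as fh:
        return reads_per_transcript(csv.DictReader(fh), sites)


def count_matrix(per_sample_counts):
    """Combine {sample: {transcript: count}} into a header plus rows,
    using 0 where a transcript is absent from a sample."""
    samples = list(per_sample_counts)
    transcripts = sorted(
        {tx for counts in per_sample_counts.values() for tx in counts}
    )
    header = ["transcript_id"] + samples
    rows = [
        [tx] + [per_sample_counts[s].get(tx, 0) for s in samples]
        for tx in transcripts
    ]
    return header, rows
```

As in the shell version, a read is counted once per transcript if it covers any site passing the site-level threshold. The per-read probability in data.indiv_proba.csv is not used as a filter.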

I wonder if you or anyone else from the m6anet team would be able to advise on that. Again, I'd like to stick with m6anet de novo identification results and perform diff methylation analysis.

Many thanks in advance for your attention, Best, Elton

yuukiiwa commented 1 month ago

Hi Elton (tagging you here @eltonjrv),

xpore doesn't require an unmodified control, as stated in xpore's paper. For demonstration purposes, the xpore paper's analyses were mainly on KO and KD of METTL3. If DESeq2 gives you good results, you can proceed with it, but we do suggest using xpore for differential modification detection.

Thanks!

Best wishes, Yuk Kei

GoekeLab / m6anet

differential m6A methylation between sample groups #173