Shians / NanoMethViz

Apache License 2.0
21 stars 2 forks source link

Suggestion of improvement: parse Mm/MM and Ml/ML tags and throw warning #35

Closed alanlorenzetti closed 9 months ago

alanlorenzetti commented 9 months ago

Hi. Thanks for implementing such a great software with so many capabilities.

While using NanoMethViz to parse my modBAM files, I noticed that my datasets have been generated using different versions of Megalodon. One of the versions, the oldest, reports Mm and Ml tags. The newest, reports the novel standard: MM and ML.

I know the SAM guidelines now enforce the uppercase version. Therefore, I adjusted my files to be able to use NanoMethViz since parsing of Mm/Ml was failing using the query_methy function. But it would be nice if NanoMethViz could recognize and maybe parse Ml and Mm tags as well, like in mbtools: https://github.com/jts/mbtools/blob/1cff0037c67bbc5213acdfda303d90cb7d85071d/src/main.rs#L122C5-L122C5

A warning should be thrown if any of these tags were encountered, telling the user to adjust the tags or update their software before generating modBAM files.

This could be implemented here: https://github.com/Shians/NanoMethViz/blob/8de93d76359d7b148225393504a6a00f700ff55a/R/modbam.R#L100

This is a minor suggestion, but it could save users some time on troubleshooting.

Again, thanks for the wonderful work you have done here.

Best, Alan.

Shians commented 9 months ago

Thanks for the suggestion. I think it's should be an easy fix that I'll implement soon, however I can't guarantee its functionality until I have some test data. Would you mind sending a small inconsequential part of your BAM file (100 or so reads) for me to test with?

alanlorenzetti commented 9 months ago

Sounds good! Unfortunately, I cannot send you my data until the study is published, but I will try to come up with a toy example using public data for your test. Thanks!

alanlorenzetti commented 9 months ago

I downloaded public data from SRA: https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR11047861&display=metadata GM12878; MinION, 10 gene Panel multiCut + BRCA1, fast5 tgz (SRR11047861)

This is the source study: https://www.nature.com/articles/s41587-020-0407-5

I generated modBAM files using Megalodon 2.4.1 and Guppy 4.5.2. Please, find the files below and let me know when you finish downloading them: https://drive.google.com/drive/folders/1DCNCTbfEYLtGXnpfa0YCce_wrKmlGe0m?usp=share_link

The modBAM file output by Megalodon contains Mm and Ml tags:

Therefore, I converted the tags to MM and ML by using regular expressions:

A potential locus of interest to visualize is within BRCA1 (chr17:43,051,446-43,115,590) (hg38).

I hope this can be helpful.

Best, Alan.

Shians commented 9 months ago

Thanks for that, I think it'd be possible for me to simply edit the tags in my own test BAM files to lower case to test. Assuming that's the only significant difference in the outputs from older versions of Megalodon.

Shians commented 9 months ago

I've implemented experimental support for lower case modbam tags in f366823.

Currently it will attempt to read both upper and lower case tags, discarding the lower case if it is not present. Let me know if it works for you.

alanlorenzetti commented 9 months ago

Thank you so much! I can confirm this commit fixed the issue for the two specific cases I tested: my data and public data.