ListerLab / HOME

DMR Identification Tool
33 stars 78 forks source link

All Comb1-n "NA" for most DMRs in CHH #34

Open mkpython3 opened 3 years ago

mkpython3 commented 3 years ago

Hey, i am using HOME-timeseries with 23 Samples (no replicates) and i just noticed that most of my DMRs that got reported by HOME have "NA" in every Comb1-n column, even though max_delta and confidence_scores are reported.

An example looks like this:

chr start   end numC    len max_delta   confidence_scores   HvBS001_VS_HvBS002  HvBS001_VS_HvBS004  HvBS001_VS_HvBS005  HvBS001_VS_HvBS006  HvBS001_VS_HvBS007  HvBS001_VS_HvBS008  HvBS001_VS_HvBS009  HvBS001_VS_HvBS011  HvBS001_VS_HvBS012  HvBS001_VS_HvBS013  HvBS001_VS_HvBS014  HvBS001_VS_HvBS015  HvBS001_VS_HvBS016  HvBS001_VS_HvBS017  HvBS001_VS_HvBS018  HvBS001_VS_HvBS019  HvBS001_VS_HvBS020  HvBS001_VS_HvBS021  HvBS001_VS_HvBS022  HvBS001_VS_HvBS023  HvBS001_VS_HvBS024  HvBS001_VS_HvBS048  HvBS002_VS_HvBS004  HvBS002_VS_HvBS005  HvBS002_VS_HvBS006  HvBS002_VS_HvBS007  HvBS002_VS_HvBS008  HvBS002_VS_HvBS009  HvBS002_VS_HvBS011  HvBS002_VS_HvBS012  HvBS002_VS_HvBS013  HvBS002_VS_HvBS014  HvBS002_VS_HvBS015  HvBS002_VS_HvBS016  HvBS002_VS_HvBS017  HvBS002_VS_HvBS018  HvBS002_VS_HvBS019  HvBS002_VS_HvBS020  HvBS002_VS_HvBS021  HvBS002_VS_HvBS022  HvBS002_VS_HvBS023  HvBS002_VS_HvBS024  HvBS002_VS_HvBS048  HvBS004_VS_HvBS005  HvBS004_VS_HvBS006  HvBS004_VS_HvBS007  HvBS004_VS_HvBS008  HvBS004_VS_HvBS009  HvBS004_VS_HvBS011  HvBS004_VS_HvBS012  HvBS004_VS_HvBS013  HvBS004_VS_HvBS014  HvBS004_VS_HvBS015  HvBS004_VS_HvBS016  HvBS004_VS_HvBS017  HvBS004_VS_HvBS018  HvBS004_VS_HvBS019  HvBS004_VS_HvBS020  HvBS004_VS_HvBS021  HvBS004_VS_HvBS022  HvBS004_VS_HvBS023  HvBS004_VS_HvBS024  HvBS004_VS_HvBS048  HvBS005_VS_HvBS006  HvBS005_VS_HvBS007  HvBS005_VS_HvBS008  HvBS005_VS_HvBS009  HvBS005_VS_HvBS011  HvBS005_VS_HvBS012  HvBS005_VS_HvBS013  HvBS005_VS_HvBS014  HvBS005_VS_HvBS015  HvBS005_VS_HvBS016  HvBS005_VS_HvBS017  HvBS005_VS_HvBS018  HvBS005_VS_HvBS019  HvBS005_VS_HvBS020  HvBS005_VS_HvBS021  HvBS005_VS_HvBS022  HvBS005_VS_HvBS023  HvBS005_VS_HvBS024  HvBS005_VS_HvBS048  HvBS006_VS_HvBS007  HvBS006_VS_HvBS008  HvBS006_VS_HvBS009  HvBS006_VS_HvBS011  HvBS006_VS_HvBS012  HvBS006_VS_HvBS013  HvBS006_VS_HvBS014  HvBS006_VS_HvBS015  HvBS006_VS_HvBS016  HvBS006_VS_HvBS017  HvBS006_VS_HvBS018  HvBS006_VS_HvBS019  HvBS006_VS_HvBS020  HvBS006_VS_HvBS021  HvBS006_VS_HvBS022  HvBS006_VS_HvBS023  HvBS006_VS_HvBS024  HvBS006_VS_HvBS048  HvBS007_VS_HvBS008  HvBS007_VS_HvBS009  HvBS007_VS_HvBS011  HvBS007_VS_HvBS012  HvBS007_VS_HvBS013  HvBS007_VS_HvBS014  HvBS007_VS_HvBS015  HvBS007_VS_HvBS016  HvBS007_VS_HvBS017  HvBS007_VS_HvBS018  HvBS007_VS_HvBS019  HvBS007_VS_HvBS020  HvBS007_VS_HvBS021  HvBS007_VS_HvBS022  HvBS007_VS_HvBS023  HvBS007_VS_HvBS024  HvBS007_VS_HvBS048  HvBS008_VS_HvBS009  HvBS008_VS_HvBS011  HvBS008_VS_HvBS012  HvBS008_VS_HvBS013  HvBS008_VS_HvBS014  HvBS008_VS_HvBS015  HvBS008_VS_HvBS016  HvBS008_VS_HvBS017  HvBS008_VS_HvBS018  HvBS008_VS_HvBS019  HvBS008_VS_HvBS020  HvBS008_VS_HvBS021  HvBS008_VS_HvBS022  HvBS008_VS_HvBS023  HvBS008_VS_HvBS024  HvBS008_VS_HvBS048  HvBS009_VS_HvBS011  HvBS009_VS_HvBS012  HvBS009_VS_HvBS013  HvBS009_VS_HvBS014  HvBS009_VS_HvBS015  HvBS009_VS_HvBS016  HvBS009_VS_HvBS017  HvBS009_VS_HvBS018  HvBS009_VS_HvBS019  HvBS009_VS_HvBS020  HvBS009_VS_HvBS021  HvBS009_VS_HvBS022  HvBS009_VS_HvBS023  HvBS009_VS_HvBS024  HvBS009_VS_HvBS048  HvBS011_VS_HvBS012  HvBS011_VS_HvBS013  HvBS011_VS_HvBS014  HvBS011_VS_HvBS015  HvBS011_VS_HvBS016  HvBS011_VS_HvBS017  HvBS011_VS_HvBS018  HvBS011_VS_HvBS019  HvBS011_VS_HvBS020  HvBS011_VS_HvBS021  HvBS011_VS_HvBS022  HvBS011_VS_HvBS023  HvBS011_VS_HvBS024  HvBS011_VS_HvBS048  HvBS012_VS_HvBS013  HvBS012_VS_HvBS014  HvBS012_VS_HvBS015  HvBS012_VS_HvBS016  HvBS012_VS_HvBS017  HvBS012_VS_HvBS018  HvBS012_VS_HvBS019  HvBS012_VS_HvBS020  HvBS012_VS_HvBS021  HvBS012_VS_HvBS022  HvBS012_VS_HvBS023  HvBS012_VS_HvBS024  HvBS012_VS_HvBS048  HvBS013_VS_HvBS014  HvBS013_VS_HvBS015  HvBS013_VS_HvBS016  HvBS013_VS_HvBS017  HvBS013_VS_HvBS018  HvBS013_VS_HvBS019  HvBS013_VS_HvBS020  HvBS013_VS_HvBS021  HvBS013_VS_HvBS022  HvBS013_VS_HvBS023  HvBS013_VS_HvBS024  HvBS013_VS_HvBS048  HvBS014_VS_HvBS015  HvBS014_VS_HvBS016  HvBS014_VS_HvBS017  HvBS014_VS_HvBS018  HvBS014_VS_HvBS019  HvBS014_VS_HvBS020  HvBS014_VS_HvBS021  HvBS014_VS_HvBS022  HvBS014_VS_HvBS023  HvBS014_VS_HvBS024  HvBS014_VS_HvBS048  HvBS015_VS_HvBS016  HvBS015_VS_HvBS017  HvBS015_VS_HvBS018  HvBS015_VS_HvBS019  HvBS015_VS_HvBS020  HvBS015_VS_HvBS021  HvBS015_VS_HvBS022  HvBS015_VS_HvBS023  HvBS015_VS_HvBS024  HvBS015_VS_HvBS048  HvBS016_VS_HvBS017  HvBS016_VS_HvBS018  HvBS016_VS_HvBS019  HvBS016_VS_HvBS020  HvBS016_VS_HvBS021  HvBS016_VS_HvBS022  HvBS016_VS_HvBS023  HvBS016_VS_HvBS024  HvBS016_VS_HvBS048  HvBS017_VS_HvBS018  HvBS017_VS_HvBS019  HvBS017_VS_HvBS020  HvBS017_VS_HvBS021  HvBS017_VS_HvBS022  HvBS017_VS_HvBS023  HvBS017_VS_HvBS024  HvBS017_VS_HvBS048  HvBS018_VS_HvBS019  HvBS018_VS_HvBS020  HvBS018_VS_HvBS021  HvBS018_VS_HvBS022  HvBS018_VS_HvBS023  HvBS018_VS_HvBS024  HvBS018_VS_HvBS048  HvBS019_VS_HvBS020  HvBS019_VS_HvBS021  HvBS019_VS_HvBS022  HvBS019_VS_HvBS023  HvBS019_VS_HvBS024  HvBS019_VS_HvBS048  HvBS020_VS_HvBS021  HvBS020_VS_HvBS022  HvBS020_VS_HvBS023  HvBS020_VS_HvBS024  HvBS020_VS_HvBS048  HvBS021_VS_HvBS022  HvBS021_VS_HvBS023  HvBS021_VS_HvBS024  HvBS021_VS_HvBS048  HvBS022_VS_HvBS023  HvBS022_VS_HvBS024  HvBS022_VS_HvBS048  HvBS023_VS_HvBS024  HvBS023_VS_HvBS048  HvBS024_VS_HvBS048
chr1H   26732   26836   7   104 0.16683316683316682 0.25653578517793324 NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA

Is this expected behavior? How can this happen?

I am happy to provide more information if necessary. In the meantime i will check if this happened also for other Contexts. The input files are about 273M (uncompressed), so let me know how to submit them if they are needed. Thanks alot in advance.

Akanksha2511 commented 2 years ago

Hi,

Sorry for the delay in reply. NA implies that the status (hyper/hypo) of the DMRs could not be interpreted. I would suggest visualisation of few DMRs. But, if all the status is "NA" I would say it does not have enough power to identify the DMRs with confidence.

Thanks, Akanksha

mkpython3 commented 2 years ago

Hey, thank you for your response.

So do I understand it correctly: HOME first discovers a potential DMR in that region. But then it does not find enough confidence to report it. The DMR will be written to the output file anyways but with all columns NA. I just think it is a little weird since it only happened in the CHH context and a confidence score is still reported. Maybe the DMR should not appear in the output at all then?

Best regards Marius

Akanksha2511 commented 2 years ago

Hi Marius,

Yes, so for time series data, HOME will try to find the difference between all the samples and will report the region of difference and its confidence (delta). It then checks for a consistent delta but is same direction (hyper/hypo) for possible pairwise comparisons. If its not able to find that it will report NA. You are right its a bit weird to get 'NAs' in all the samples. I guess its just telling you that the data does not have enough coverage for CHH methylation identification. Have to tried to visualise these DMRs on IGV or UCSC browser?

Thanks, Akanksha

mkpython3 commented 2 years ago

Hey Akanksha

I have visualized a few DMRs now using a Python script. For some of the "All NA" DMRs it is quite obvious why they are NA because there is rarely any data in the range of the DMR. But for others i think there should be enough data that coverage should not be an issue. I have attached a few plots below. Here the size of the dots represents the coverage and the color (blue to pink) the level of methylation.

These are the plots of DMRs that have >=50% of data in the Comb1-N columns:

good_84738_84835 good_34256_34621

These are DMRs where there is 100% NA in every column like in the first post I made, where I think it should be possible to call a DMR:

bad_151647_151877 bad_194931_196382

This is one of the 100% NA DMRs where I think its quite obvious why there is so much NA, but I just think it maybe should not be outputted at all:

bad_26732_26836

Thanks alot for your help! Marius

Akanksha2511 commented 2 years ago

Hi Marius,

Thanks for sharing the plots. Its hard to visualise it like this. I think it will better to have a IGV or a UCSC plot in the standard format which will also show the direction of methylation. Please have a look at the HOME paper for reference.

Thanks, Akanksha

mkpython3 commented 2 years ago

Hey Akanksha,

I managed to visualize the methylation levels in IGV. However, I don't know how to reproduce the plots with the methylation difference like in the HOME paper. And btw, what do you mean with "direction of methylation". Do you include an extra track with the DMR information? Because the methylation files are only 0 - 1 and not -1 - 1. Or is there an option in IGV to produce a methylation difference track from the others?

Best regards, Marius

Akanksha2511 commented 2 years ago

Hi Marius,

You can upload the wig files from BSseeker2 for the samples and visualise the DMRs on IGV. I think this should give us the idea about what's the issue.

Thanks, Akanksha

mkpython3 commented 2 years ago

Hey Akanksha,

thank you for your response. I am using Bismark for my analysis and merged the + and - strand for all chromosomes, therefore i have bedgraph files that go from 0 to 1 instead of wig files that go from -1 to 1. Anyways it should not make a huge difference. I converted those bedgraph files to .tdf files for igv and made a screenshot of a 100% NA DMR. I hope this helps? Screenshot from 2021-11-30 16-59-35

Best regards, Marius