Open elcortegano opened 1 year ago
Thanks @elcortegano , I just commented on the IGV ticket -- in summary the HiFi reads always contain methylation information in insertions if they've had primrose 5mCpG calling applied to them (this process occurs on unaligned reads). For the pileup tools, they do not provide an output for insertions today. In the default denovo
mode for aligned_bam_to_cpg_tools
, we could provide a summary for insertions in theory, but there are a few practical barriers, the first being how to communicate these results in a standardized file format. Neither bed nor bigwig output would support this. If you have an output format and corresponding viewer in mind, we could look into supporting it.
Thanks for all the clarifications @ctsa ! specially for the input on the 5mCpG calls in the BAMs.
Regarding the issue here with the pileup methods, I understand the difficulties in adding insertions data. The only standard format I could think of for adding that data would look like a VCF, with the different modification probabilities in the FORMAT field (?).
I think that could be useful. I'll admit that for my personal uses right now it would be fine having IGV visualization for 5mC sites. However, relying on IGV visualization is not scalable, so I think other users might benefit of a feature like this in the future.
Okay that sounds good. I'll be in touch with Jim if we can help with the IGV visualization at the read level. For the pileup we'll look out for a way to do this, perhaps VCF or another format will make this doable in future.
same requirement as @elcortegano, HiFi data is beneficial for insertion identification and the 5mC methylation status may be highly related with these insertions, the support for 5mC of insertion sequence is desired.
Following a recent issue open in the IGV site (https://github.com/igvteam/igv/issues/1303), we wonder if
aligned_bam_to_cpg_scores.py
would call 5mC methylated CpG sites at insertions (i.e. at sites that are not in the reference).I run primrose in my data on the CCS reads, and then 5mC sites were called from a pbmm2 aligned BAM similarly as: