AdamaJava / adamajava

Difficulty Extracting Telomere Length Data from qmotif XML Output #351

Open CHOWDHURY098 opened 7 months ago

CHOWDHURY098 commented 7 months ago

I am writing to seek guidance on quantifying telomeres with qmotif. I ran qmotif successfully on an HG38-aligned BAM file and obtained the expected output files (a telomere BAM file and an XML file), but I cannot work out how to extract telomere lengths in kilobases (kb) from the XML file. Could you clarify how to extract telomere length data from the XML file?

My XML file looks like this: "<?xml version="1.0" encoding="UTF-8" standalone="no"?>

.........................................................................................................."
holmeso commented 7 months ago

We don’t normalise directly to genome coverage. Rather, we simply scale to a nominal read count of 1B reads to allow for simple comparisons between BAMs with different numbers of reads. So if your BAM has 0.5B reads, all of the scaled scores will be double the raw counts, and if your BAM has 2B reads, the scaled scores will be half of the raw counts. We don’t take any account of unmapped reads, secondary alignments, etc. when scaling; we just count every read. We take this simple approach because, when you are talking about tumours, the correct approach is non-obvious. For example, if we have 3 chromosomes with whole-arm amplifications, how should we take account of that? Clever/correct scaling is left as an exercise for the user, as they know their data best. With all of those caveats, qMotif scaled scores correlate very well with wet-lab techniques, as we showed in the qMotif paper, so we think the simple scaling approach probably works well enough in the majority of cases.
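As a rough illustration of the scaling described above, here is a minimal Python sketch. It is not qmotif's own code: the function name `scale_to_nominal` and its parameters are hypothetical, and the only thing taken from the description is that raw counts are rescaled to a nominal 1B reads using the total number of reads in the BAM, with every read counted.

```python
# Hypothetical sketch of the scaling described above; not qmotif's implementation.
NOMINAL_READS = 1_000_000_000  # nominal read count of 1B reads


def scale_to_nominal(raw_count, total_reads, nominal=NOMINAL_READS):
    """Rescale a raw count to the nominal read count.

    total_reads is assumed to be every read in the BAM, with no filtering
    of unmapped reads or secondary alignments.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return raw_count * nominal / total_reads


# A BAM with 0.5B reads: the scaled score is double the raw count.
print(scale_to_nominal(1000, 500_000_000))    # 2000.0
# A BAM with 2B reads: the scaled score is half the raw count.
print(scale_to_nominal(1000, 2_000_000_000))  # 500.0
```

The same relationship can be inverted to recover a raw count from a scaled score if the total read count of the BAM is known. Note that this gives a read-count-normalised score, not a length in kb.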