Emi-sed opened this issue 3 years ago
If the repeat length is very close to or above the read length (e.g. ~50 or more CAG repeats in the case of 150 bp reads), then these reads may be misaligned to another chromosome. If you run EH only on chr12, those misaligned reads can't be used when calculating the repeat length. In that case, yes, you would need to run EH on the whole BAM file to get the best repeat length estimate.
Egor can correct me if I'm wrong!
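To make the read-length arithmetic concrete: each CAG unit is 3 bp, so 50 units already cover 150 bp, the entire length of a 150 bp read. The Python sketch below is a hypothetical helper, not part of ExpansionHunter. It reports the best kind of read that can overlap a repeat of a given size, loosely following the spanning / flanking / in-repeat terminology used to describe EH. The 10 bp flank required for a read to count as "spanning" is an illustrative assumption, and the rule is a simplification.

```python
# Hypothetical helper (not ExpansionHunter code): what is the best kind of
# read that can overlap a repeat of `units` copies of `motif`?
# `min_flank` is an illustrative assumption, not an EH parameter.

def repeat_length_bp(units: int, motif: str = "CAG") -> int:
    """Length of the repeat tract in base pairs."""
    return units * len(motif)


def read_category(units: int, read_len: int, motif: str = "CAG", min_flank: int = 10) -> str:
    """Return 'spanning' if a read can cover the whole repeat plus min_flank bp
    on each side, 'in-repeat' if the repeat is at least as long as the read,
    and 'flanking' otherwise (reads can only partially overlap the repeat)."""
    tract = repeat_length_bp(units, motif)
    if read_len >= tract + 2 * min_flank:
        return "spanning"
    if tract >= read_len:
        return "in-repeat"
    return "flanking"


for n in (40, 45, 50):
    print(n, read_category(n, 150))
# 40 spanning
# 45 flanking
# 50 in-repeat
```

With 150 bp reads, 40 repeats still fit inside a single read, while 50 repeats fill the read completely; these are the reads at risk of being misaligned elsewhere in the genome.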
Thank you very much for your reply. Do you mean that ExpansionHunter uses reads aligned to other chromosomes when it calculates the number of repeats? Could you point me to an article or other source that describes this? I would very much like to read it.
Sincerely, Emi
That's exactly right, Andreas!
Emi: EH extracts mates of reads aligned close to the repeat, even if those mates are located on other chromosomes. Here is a quick cartoon illustrating this.
When a BAM file is split by chromosome, EH can no longer recover such reads and hence may produce an incorrect size estimate (as Andreas pointed out).
Does this answer your question? Please let me know if you have any follow-up questions.
Best wishes, Egor
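The mate-recovery behavior Egor describes can be illustrated with a toy model in Python. Everything here is hypothetical: the `Read` record, the made-up coordinates, the 50 bp padding, and the helper functions are illustrative stand-ins, not ExpansionHunter's code or data structures. Reads aligned near the repeat act as anchors, and their mates are then looked up anywhere in the file. Keeping only chr12 alignments, as a per-chromosome BAM does, loses the mate that was misaligned to chr4.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal read record; real BAM records carry much more."""
    name: str   # fragment name shared by both mates of a pair
    chrom: str  # chromosome this mate is aligned to
    pos: int    # alignment start position


def reads_near(reads: List[Read], chrom: str, start: int, end: int, pad: int) -> List[Read]:
    """Reads aligned within `pad` bp of the repeat interval [start, end]."""
    return [r for r in reads if r.chrom == chrom and start - pad <= r.pos <= end + pad]


def recover_mates(reads: List[Read], anchors: List[Read]) -> List[Read]:
    """Mates of the anchor reads, wherever they are aligned in `reads`."""
    names = {a.name for a in anchors}
    anchor_set = set(anchors)
    return [r for r in reads if r.name in names and r not in anchor_set]


# Hypothetical locus and reads (coordinates are made up for illustration).
LOCUS = ("chr12", 1000, 1100)
all_reads = [
    Read("frag1", "chr12", 990),    # anchor near the repeat
    Read("frag1", "chr12", 1200),   # its mate, same chromosome
    Read("frag2", "chr12", 1050),   # anchor near the repeat
    Read("frag2", "chr4", 50_000),  # its mate, misaligned to chr4
]
# Simulate a BAM split by chromosome: keep only chr12 alignments.
chr12_only = [r for r in all_reads if r.chrom == "chr12"]

for label, reads in (("whole BAM", all_reads), ("chr12 BAM", chr12_only)):
    anchors = reads_near(reads, *LOCUS, pad=50)
    mates = recover_mates(reads, anchors)
    print(label, sorted((m.name, m.chrom) for m in mates))
```

In the whole-file run both mates are recovered; in the chr12-only run the chr4 mate is gone, which is the kind of information loss that can lead to an underestimated repeat size.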
Thank you very much! Your answers helped me understand this. EH is a unique tool. From now on, we will use a BAM file that contains all chromosomes.
Sincerely, Emi
Glad we could help! Please don't hesitate to reach out if you run into any other issues!
Hi,
When I ran ExpansionHunter on a BAM file split by chromosome, the estimated number of repeats was smaller than with a BAM file containing all chromosomes. For example, with chr12.sorted.bam the ATN1 repeat count was 40, whereas with all_chr.merged.sorted.bam it was 55. When we use ExpansionHunter, do we need BAM files that contain all chromosomes? I would appreciate your thoughts.
Sincerely, Emi