Illumina / ExpansionHunter

A tool for estimating repeat sizes

bam which is separated by chromosome #148

Open Emi-sed opened 2 years ago

Emi-sed commented 2 years ago

Hi,

When I ran ExpansionHunter on a BAM file that had been split by chromosome, the estimated repeat count was lower than with a BAM file containing all chromosomes. For example, with chr12.sorted.bam the ATN1 repeat count was 40, while with all_chr.merged.sorted.bam it was 55. Does ExpansionHunter need a BAM file that contains all chromosomes? Could you let me know your thoughts?

Sincerely, Emi

andreasssh commented 2 years ago

If the repeat length is close to or above the read length (e.g. ~50 or more CAG repeats with 150 bp reads), reads from the repeat may be misaligned to another chromosome. If you run EH only on the chr12 BAM, those misaligned reads can't be used when calculating the repeat length. In that case, yes, you would need to run EH on the whole BAM file to get the best repeat length estimate.

Egor can correct me if I'm wrong!
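The read-length threshold mentioned above comes down to simple arithmetic, sketched below. The function names are hypothetical and introduced only for illustration. The 150 bp read length, the CAG motif, and the repeat counts 40, 50 and 55 are taken from this thread.

```python
# Illustrative arithmetic only; this is not part of ExpansionHunter.
READ_LENGTH_BP = 150  # read length used in the example above
MOTIF = "CAG"         # repeat unit used in the example above


def repeat_span_bp(num_repeats, motif=MOTIF):
    """Length in base pairs of a tract of `num_repeats` copies of `motif`."""
    return num_repeats * len(motif)


def max_repeats_within_read(read_length_bp=READ_LENGTH_BP, motif=MOTIF):
    """Largest whole number of motif copies that fits in a single read."""
    return read_length_bp // len(motif)


if __name__ == "__main__":
    for n in (40, 50, 55):
        span = repeat_span_bp(n)
        reaches_read_length = span >= READ_LENGTH_BP
        print(f"{n} x {MOTIF} = {span} bp; reaches read length: {reaches_read_length}")
```

With these numbers, a tract of 50 CAG repeats is exactly as long as a 150 bp read, so a read falling inside a longer tract may consist entirely of repeat sequence.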

Emi-sed commented 2 years ago

Thank you very much for your reply. Do you mean that ExpansionHunter uses reads aligned to other chromosomes when it calculates the number of repeats? Could you point me to an article or other reference that explains this? I would very much like to read it.

Sincerely, Emi

egor-dolzhenko commented 2 years ago

That's exactly right, Andreas!

Emi: EH extracts mates of reads aligned close to the repeat, even if those mates are located on other chromosomes. Here is a quick cartoon illustrating this.

[Image: example]

When a BAM file is split by chromosome, EH can no longer recover such reads and hence can produce an incorrect size estimate (as Andreas pointed out).
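One rough way to see whether a dataset is affected is to look for reads near the repeat whose mates are placed on a different chromosome. The sketch below is a diagnostic under stated assumptions, not EH's own implementation. It assumes the alignments are available as SAM text lines, for example from `samtools view` on the merged BAM restricted to a region around the locus. The function name and the example records are hypothetical.

```python
def mates_on_other_chromosomes(sam_lines):
    """Return (QNAME, RNAME, RNEXT) for records whose mate maps to another reference.

    Expects tab-separated SAM records; header lines starting with '@' are skipped.
    """
    hits = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        qname, flag, rname, rnext = fields[0], int(fields[1]), fields[2], fields[6]
        is_paired = bool(flag & 0x1)      # SAM flag bit: read is paired
        mate_unmapped = bool(flag & 0x8)  # SAM flag bit: mate is unmapped
        if not is_paired or mate_unmapped:
            continue
        # RNEXT is '=' when the mate is on the same reference, '*' when unknown.
        if rnext not in ("=", "*") and rnext != rname:
            hits.append((qname, rname, rnext))
    return hits


# Hypothetical records for illustration only (positions and names are made up).
EXAMPLE_SAM = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t97\tchr12\t1000\t60\t150M\tchr3\t5000\t0\t*\t*",
    "read2\t99\tchr12\t1010\t60\t150M\t=\t1200\t340\t*\t*",
]

if __name__ == "__main__":
    for qname, rname, rnext in mates_on_other_chromosomes(EXAMPLE_SAM):
        print(f"{qname}: aligned to {rname}, mate on {rnext}")
```

In a BAM split by chromosome, the mate records on the other chromosomes are simply absent from the per-chromosome file, which is the situation described above.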

Does this answer your question? Please let me know if you have any follow up questions.

Best wishes, Egor

Emi-sed commented 2 years ago

Thank you very much!! I was able to figure it out because of your answers. EH is a unique tool. We will use a bam file that contains all chromosomes from now on.

Sincerely, Emi

egor-dolzhenko commented 2 years ago

Glad we could help! Please don't hesitate to reach out if you run into any other issues!