elzbth / jitterbug

Jitterbug is a bioinformatic software that predicts insertion sites of transposable elements in a sample sequenced by short paired-end reads with respect to an assembled reference.
17 stars 8 forks source link

zygosity calling #11

Open clemgoub opened 7 years ago

clemgoub commented 7 years ago

Hello,

It is not clear for me how the zygosity is called. I have now processed and filtered my samples following the documentation. I also performed a joint call to recovered previously discarded loci which turned out to work really well. However, I do not understand the zygosity calling process. All the detected insertions having a softclip support are called with a het_core_reads=0 (regardless the softclip support) and a zygosity of 1.000 and all the insertion with no softclip support (recovered insertions) have a het_core_reads=-1 and zygosity=-1.000

Can you give me more information about this calling process and the meaning of these metrics?

Thanks a lot,

Clément

clemgoub commented 7 years ago

Hi again! Does anyone have an idea about it? Thanks a lot!

mbosio85 commented 7 years ago

Hi,

from the paper:

Zygozity estimation using reference-like and clipped reads If the exact position for a TEI has been determined by clipped read signature, the original bam file is queried for all reads that overlap this position. Those that are properly mapped (bitwise flag 0x2 in SAM specification) and overlap the insertion site with five or more nucleotides on each side (termed core-reads) indicate the presence of a reference allele. For each given TEI, zygosity (or variant allele frequency) is calculated as . clippedreads/(clippedreads+corereads)

From the code if you want to dig into the details: in ClusterPair.py the function is called calc_zygosity(self,reads)

hope it helps

Mattia

clemgoub commented 7 years ago

Hi Mattia,

Thanks for your answer that makes a lot of sense. However I am confused now because it seems that anytime I have a call supported by clipped reads, I have always 0 core reads reported and thus a zygosity of 1.000. However, when I look in details at some of these insertions, I can clearly see some properly mapped paired-end reads overlapping the clipped reads (designated "core-reads" if I understand clearly), suggesting heterozygous inserts. Is it possible that this step is somehow slipped during the calls?

I included a snapshot of IGV for more clarity, and copied the corresponding line of the original call by jitterbug. Thanks for your help!

chr3 jitterbug TE_insertion 110694466 110694598 . . . supporting_fwd_reads=34; supporting_rev_reads=26; cluster_pair _ID=7976; lib=None; Inserted_TE_tags_fwd=AluSp, AluSz; Inserted_TE_tags_rev=AluSp, AluSx, AluSc8; fwd_cluster_span=294; rev_cluster_span=177; softclipped_pos= (110694543, 110694551); softclipped_support=14; het_core_reads=0; zygosity=1.000; predicted_superfam=AluSp

igv_capture