ekg / hhga

haplotypes genotypes and alleles example decision synthesizer
MIT License

indel representation is wrong #21

Closed by ekg 8 years ago

ekg commented 8 years ago

Found a "small"!! problem with the indel representation.

hhga -f ~/ref/hs37d5.fa -b 1:3675000-3676000.fn.vial1.bam -v lodz3.fn.clean.vcf.gz -r 1:3675803-3676000  -w 16 | sed s/\|/\\n\|/g | column  | cut -f 1,9-14 -d\  | sed 's/ / ... /' | column -t
|ref           ...  8T:1    9C:1      10M:1            11M:1      12M:1   13M:1
|hap1          ...  8M:1    9R:1      10M:1            11M:1      12M:1   13M:1
|hap2          ...  8M:1    9R:1      10AATTATTTAAA:1  11M:1      12M:1   13M:1
|geno1         ...  8M:1    9R:1      10M:1            11M:1      12M:1   13M:1
|geno2         ...  8M:1    9R:1      10AATTATTTAAA:1  11M:1      12M:1   13M:1
|aln0          ...  8R:32   9R:42     10A:42           11A:42     12T:42  13T:42
|aln1          ...  8R:37   9R:42     10A:42           11A:32     12T:32  13T:42
|aln2          ...  8R:42   9R:32     10A:32           11A:11     12T:32  13T:11
|aln3          ...  8R:42   9R:42     10A:42           11A:42     12T:42  13T:42
|aln4          ...  8R:42   9R:42     10A:42           11A:42     12T:37  13T:42
|aln5          ...  8R:42   9R:42     10A:42           11A:42     12T:42  13T:42
|aln6          ...  8R:42   9R:42     10A:42           11A:42     12T:37  13T:42
|aln7          ...  8R:42   9R:42     10A:42           11A:42     12T:42  13T:42
|aln8          ...  8R:42   9R:42     10A:42           11A:42     12T:42  13T:42
|aln9          ...  8R:32   9R:32     10M:1            11M:1      12M:1   13M:1
|aln10         ...  8R:37   9R:22     10M:1            11M:1      12M:1   13M:1
|aln11         ...  8R:37   9R:32     10M:1            11M:1      12M:1   13M:1
|aln12         ...  8R:27   9R:42     10M:1            11M:1      12M:1   13M:1
|aln13         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln14         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln15         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln16         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln17         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln18         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln19         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln20         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln21         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln22         ...  8R:37   9R:32     10M:1            11M:1      12M:1   13M:1
|aln23         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln24         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln25         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln26         ...  8R:37   9R:27     10M:1            11M:1      12M:1   13M:1
|aln27         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln28         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln29         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln30         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln31         ...  8R:32   9R:27     10M:1            11M:1      12M:1   13M:1
|aln32         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln33         ...  8R:42   9R:42     10M:1            11M:1      12M:1   13M:1
|aln34         ...  8R:42   9R:27     10M:1            11M:1      12M:1   13M:1
|aln35         ...  8M:1    9M:1      10M:1            11M:1      12M:1   13M:1
|aln36         ...  8M:1    9R:32     10M:1            11M:1      12M:1   13M:1
|aln37         ...  8M:1    9R:11     10M:1            11M:1      12M:1   13M:1
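
One way to spot the inconsistency in the output above is to look for features whose allele spans more than one base: hap2 and geno2 carry the whole insertion in a single column (`10AATTATTTAAA:1`), while the alignments spread it over columns 10 to 13. The following is a minimal sketch, assuming rows shaped like the table above; `multi_base_features` is a hypothetical helper for illustration and is not part of hhga.

```python
import re

# Matches tokens such as "10M:1" or "10AATTATTTAAA:1": position, allele, weight.
FEATURE = re.compile(r"^(\d+)([A-Za-z]+):(\d+(?:\.\d+)?)$")

def multi_base_features(row):
    """Return (row_name, [(position, allele), ...]) for alleles longer than one base."""
    tokens = row.lstrip("|").split()
    name, hits = tokens[0], []
    for tok in tokens[1:]:
        m = FEATURE.match(tok)  # the "..." filler token simply fails to match
        if m and len(m.group(2)) > 1:
            hits.append((int(m.group(1)), m.group(2)))
    return name, hits

# Three rows copied from the output above (whitespace condensed).
rows = [
    "|ref   ...  8T:1   9C:1   10M:1            11M:1   12M:1   13M:1",
    "|hap2  ...  8M:1   9R:1   10AATTATTTAAA:1  11M:1   12M:1   13M:1",
    "|aln0  ...  8R:32  9R:42  10A:42           11A:42  12T:42  13T:42",
]

for row in rows:
    name, hits = multi_base_features(row)
    if hits:
        print(name, hits)
```

On these three rows only hap2 is reported, at position 10.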

ekg commented 8 years ago

Before:

1_3675803_C_CAATTATTTAAA                                                          
reference   CACAAATTC                                                             
hap         .........                                                             
hap         .........                                                             
geno        .........                                                             
geno        .........                                                             
SodqFxYPZI  .........AATTATT 1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2202:9749:37753  
SodqFxYPZI  .........AATTATT 1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2216:7110:46578  
SodqfXYPZI  .........AATTATT 1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2224:2969:63842  
sOdqFxYPZI  .........AATTATT 1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2204:8348:36417  
sOdqFxYPZI  .........AATTATT 1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:1211:11891:17236 
SodqfXYPZI  .........AATTATT 1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2115:5943:15232  
SodqFxYPZI  .........AATTATT 1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2111:32475:3032  
SodqFxYPZI  .........AATTATT 1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2115:23553:44644 
SodqFxYPZI  .........AATTATT 1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:1204:18366:71700 
SodqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2110:10439:38315 
sOdqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2108:32211:52327 
SodqfXYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:1211:15839:43273 
SodqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2123:29237:25728 
SodqfXYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2221:21645:44644 
SodqfXYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2115:6531:41462  
sOdqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2209:10226:33692 
sOdqfXYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:1211:17757:2575  
sOdqfXYPZI  A........        0.9 0.9 60 ST-E00185:49:H5LVWCCXX:5:2109:3801:2346   
SOdqFxYPZi  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:1201:26466:51553 
SodqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2222:32231:48213 
SODqFxyPZi  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2201:26131:48934 
SoDqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2108:13413:29191 
SoDqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:1108:12215:30105 
SodqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:1108:12226:30088 
SodqfXYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:1205:3131:5651   
sOdqfXYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:1106:12205:20840 
SodqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:1219:3913:29156  
SodqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:1102:26963:17500 
SoDqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:1108:29521:38421 
sOdqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:1218:3009:65776  
SodqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2217:28851:9800  
SodqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2106:5334:14283  
sOdqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2113:11738:53645 
sODqFxYPZI  .........        1.0 1.0 60 ST-E00185:49:H5LVWCCXX:5:2113:11353:54032 
SodqfXYPZI   ........        0.9 0.9 60 ST-E00185:49:H5LVWCCXX:5:2122:18752:23038 
sOdqFxYPZI  ...              0.3 0.3 60 ST-E00185:49:H5LVWCCXX:5:1121:14601:36452 
sOdqFxYPZI          .        0.1 0.1 60 ST-E00185:49:H5LVWCCXX:5:2122:5719:41814  
sOdqfXYPZI          .        0.1 0.1 60 ST-E00185:49:H5LVWCCXX:5:2221:19442:38087 
QUAL:8672.0                                                                       

Fixed:

1_3675803_C_CAATTATTTAAA
reference   CACAAATTC-------
hap         .........-------
hap         .........AATTATT
geno        .........-------
geno        .........AATTATT
SodqfXYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:1211:15839:43273
SodqFxYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:2123:29237:25728
SodqfXYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:2221:21645:44644
SodqfXYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:2115:6531:41462
sOdqFxYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:2209:10226:33692
sOdqfXYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:1211:17757:2575
SodqFxYPZI  .........AATTATT 0.6 1.0 60 ST-E00185:49:H5LVWCCXX:5:2202:9749:37753
SodqFxYPZI  .........AATTATT 0.6 1.0 60 ST-E00185:49:H5LVWCCXX:5:2216:7110:46578
sOdqfXYPZI  A........------- 0.9 0.5 60 ST-E00185:49:H5LVWCCXX:5:2109:3801:2346
SOdqFxYPZi  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:1201:26466:51553
SodqFxYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:2222:32231:48213
SODqFxyPZi  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:2201:26131:48934
SoDqFxYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:2108:13413:29191
SoDqFxYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:1108:12215:30105
SodqFxYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:1108:12226:30088
SodqfXYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:1205:3131:5651
SodqfXYPZI  .........AATTATT 0.6 1.0 60 ST-E00185:49:H5LVWCCXX:5:2224:2969:63842
sOdqFxYPZI  .........AATTATT 0.6 1.0 60 ST-E00185:49:H5LVWCCXX:5:2204:8348:36417
sOdqfXYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:1106:12205:20840
SodqFxYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:1219:3913:29156
sOdqFxYPZI  .........AATTATT 0.6 1.0 60 ST-E00185:49:H5LVWCCXX:5:1211:11891:17236
SodqfXYPZI  .........AATTATT 0.6 1.0 60 ST-E00185:49:H5LVWCCXX:5:2115:5943:15232
SodqFxYPZI  .........AATTATT 0.6 1.0 60 ST-E00185:49:H5LVWCCXX:5:2111:32475:3032
SodqFxYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:1102:26963:17500
SoDqFxYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:1108:29521:38421
sOdqFxYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:1218:3009:65776
SodqFxYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:2106:5334:14283
SodqFxYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:2217:28851:9800
SodqFxYPZI  .........AATTATT 0.6 1.0 60 ST-E00185:49:H5LVWCCXX:5:2115:23553:44644
sOdqFxYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:2113:11738:53645
sODqFxYPZI  .........------- 1.0 0.6 60 ST-E00185:49:H5LVWCCXX:5:2113:11353:54032
SodqFxYPZI  .........AATTATT 0.6 1.0 60 ST-E00185:49:H5LVWCCXX:5:1204:18366:71700
SodqfXYPZI   ........------- 0.9 0.5 60 ST-E00185:49:H5LVWCCXX:5:2122:18752:23038
SodqFxYPZI  .........        0.6 0.6 60 ST-E00185:49:H5LVWCCXX:5:2110:10439:38315
sOdqFxYPZI  .........        0.6 0.6 60 ST-E00185:49:H5LVWCCXX:5:2108:32211:52327
sOdqFxYPZI          .------- 0.5 0.1 60 ST-E00185:49:H5LVWCCXX:5:2122:5719:41814
sOdqfXYPZI          .------- 0.5 0.1 60 ST-E00185:49:H5LVWCCXX:5:2221:19442:38087
sOdqFxYPZI  ...              0.2 0.2 60 ST-E00185:49:H5LVWCCXX:5:1121:14601:36452
sOdqFxYPZI                   0.0 0.0 60 ST-E00185:49:H5LVWCCXX:5:1113:26720:46859
sODqFxYPZI                   0.0 0.0 60 ST-E00185:49:H5LVWCCXX:5:1113:26568:46982
sODqFxYPZI                   0.0 0.0 60 ST-E00185:49:H5LVWCCXX:5:2218:19158:69802
sOdqFxYPZI                   0.0 0.0 60 ST-E00185:49:H5LVWCCXX:5:2218:19219:69802
sOdqFxYPZI                   0.0 0.0 60 ST-E00185:49:H5LVWCCXX:5:1111:6410:24444
QUAL:8672.0 

ekg commented 8 years ago

Updated further, so we don't have to do the --assume-ref trick to get the gap bases in the haps from the VCF.

1_3675803_C_CAATTATTTAAA
reference   CACAAATTC-------
hap                 .-------
hap                 .AATTATT
geno                .-------
geno                .AATTATT
SodqfXYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:1211:15839:43273
SodqFxYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:2123:29237:25728
SodqfXYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:2221:21645:44644
SodqfXYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:2115:6531:41462
sOdqFxYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:2209:10226:33692
sOdqfXYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:1211:17757:2575
SodqFxYPZI  .........AATTATT 0.1 1.0 60 ST-E00185:49:H5LVWCCXX:5:2202:9749:37753
SodqFxYPZI  .........AATTATT 0.1 1.0 60 ST-E00185:49:H5LVWCCXX:5:2216:7110:46578
sOdqfXYPZI  A........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:2109:3801:2346
SOdqFxYPZi  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:1201:26466:51553
SodqFxYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:2222:32231:48213
SODqFxyPZi  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:2201:26131:48934
SoDqFxYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:2108:13413:29191
SoDqFxYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:1108:12215:30105
SodqFxYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:1108:12226:30088
SodqfXYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:1205:3131:5651
SodqfXYPZI  .........AATTATT 0.1 1.0 60 ST-E00185:49:H5LVWCCXX:5:2224:2969:63842
sOdqFxYPZI  .........AATTATT 0.1 1.0 60 ST-E00185:49:H5LVWCCXX:5:2204:8348:36417
sOdqfXYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:1106:12205:20840
SodqFxYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:1219:3913:29156
sOdqFxYPZI  .........AATTATT 0.1 1.0 60 ST-E00185:49:H5LVWCCXX:5:1211:11891:17236
SodqfXYPZI  .........AATTATT 0.1 1.0 60 ST-E00185:49:H5LVWCCXX:5:2115:5943:15232
SodqFxYPZI  .........AATTATT 0.1 1.0 60 ST-E00185:49:H5LVWCCXX:5:2111:32475:3032
SodqFxYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:1102:26963:17500
SoDqFxYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:1108:29521:38421
sOdqFxYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:1218:3009:65776
SodqFxYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:2106:5334:14283
SodqFxYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:2217:28851:9800
SodqFxYPZI  .........AATTATT 0.1 1.0 60 ST-E00185:49:H5LVWCCXX:5:2115:23553:44644
sOdqFxYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:2113:11738:53645
sODqFxYPZI  .........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:2113:11353:54032
SodqFxYPZI  .........AATTATT 0.1 1.0 60 ST-E00185:49:H5LVWCCXX:5:1204:18366:71700
SodqfXYPZI   ........------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:2122:18752:23038
SodqFxYPZI  .........        0.1 0.1 60 ST-E00185:49:H5LVWCCXX:5:2110:10439:38315
sOdqFxYPZI  .........        0.1 0.1 60 ST-E00185:49:H5LVWCCXX:5:2108:32211:52327
sOdqFxYPZI          .------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:2122:5719:41814
sOdqfXYPZI          .------- 1.0 0.1 60 ST-E00185:49:H5LVWCCXX:5:2221:19442:38087
sOdqFxYPZI  ...              0.0 0.0 60 ST-E00185:49:H5LVWCCXX:5:1121:14601:36452
QUAL:8672.0
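
The "Fixed" layout gives the inserted bases their own columns: the reference and every row without the insertion get `-` there, and rows carrying the insertion show the inserted bases. A minimal sketch of that padding, assuming a single insertion per window; `pad_for_insertion` and its parameters are hypothetical names for illustration, not hhga's actual code. Only the first seven bases of the `AATTATTTAAA` insertion fit in the 16-column window shown, so the example uses `AATTATT`.

```python
def pad_for_insertion(ref_window, ins_offset, inserted, carries_insertion):
    """Return (reference_row, allele_row) with inserted bases in their own columns.

    ref_window        -- reference bases in the window, e.g. "CACAAATTC"
    ins_offset        -- 0-based index of the reference base the insertion follows
    inserted          -- inserted bases that fall inside the window
    carries_insertion -- True if this row (hap, geno, or read) has the insertion
    """
    gap = "-" * len(inserted)
    left, right = ref_window[:ins_offset + 1], ref_window[ins_offset + 1:]
    ref_row = left + gap + right
    # "." marks a match to the reference, as in the text output above.
    middle = inserted if carries_insertion else gap
    allele_row = "." * len(left) + middle + "." * len(right)
    return ref_row, allele_row

ref_row, hap_ref = pad_for_insertion("CACAAATTC", 8, "AATTATT", False)
_, hap_alt = pad_for_insertion("CACAAATTC", 8, "AATTATT", True)
print(ref_row)  # CACAAATTC-------
print(hap_ref)  # .........-------
print(hap_alt)  # .........AATTATT
```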