eldariont / svim

Structural Variant Identification Method using Long Reads
GNU General Public License v3.0

(Almost) no insertions #36

Closed LizzieMcDizzie closed 4 years ago

LizzieMcDizzie commented 4 years ago

I am using SVIM to call SVs in a mammalian diploid genome with PacBio reads at around 60X coverage. The genome was assembled from the same data. Because I have good depth, I am using a quality cut-off of 10 and calling SVs > 10 bp. With v1.3.1 this identifies 44K deletions and only 87 insertions.

However, if I use an older version (1.1) with the same data it detects 46K deletions and 65K insertions.

I inspected a few INS from version 1.1 and they look correct to me.

I was wondering if this is an issue with the changes made in v1.3.1?

v1.3.1 command used: svim alignment --min_sv_size 10 --max_sv_size 1000000 --insertion_sequences --sequence_alleles --interspersed_duplications_as_insertions --minimum_score 8 PB_to_BrahGenome20200102_N_1_D PB_to_BrahGenome20200102_N_1.default.bam BrahGenome20200102_N.fasta

v1.1 command used: svim alignment --min_sv_size 10 --max_sv_size 1000000 --minimum_score 8 PB_to_BrahGenome20200102_N_1_D_svim1.1 PB_to_BrahGenome20200102_N_1.default.bam

Thanks.

eldariont commented 4 years ago

Hi, thanks for reporting this issue. That sounds really strange indeed. There is apparently something going wrong with v1.3.1, but I have never experienced anything like this before.

Could you please post the content of the log files in your two working directories? You can skip the large number of progress lines in the middle but everything else would help me to investigate the issue.

Cheers David

LizzieMcDizzie commented 4 years ago

Apologies for the LOOOOONG post below. I am rerunning with 1.3.0, but it's late down here so I won't be able to post the results for around 12 hrs.
If it helps, I am happy to send the VCF files. Cheers.

v1.3.1: grep -v "Processed" PB_to_BrahGenome20200102_N_1_D/SVIM_200514_090210.log
2020-05-14 09:02:10,601 [INFO ] ** Start SVIM, version 1.3.1 **
2020-05-14 09:02:10,601 [INFO ] CMD: python3 /home/uqeross2/.conda/envs/conda-env-local-bin-install/bin/svim alignment --min_sv_size 10 --max_sv_size 1000000 --insertion_sequences --sequence_alleles --interspersed_duplications_as_insertions --minimum_score 8 PB_to_BrahGenome20200102_N_1_D PB_to_BrahGenome20200102_N_1.default.bam BrahGenome20200102_N.fasta
2020-05-14 09:02:10,601 [INFO ] WORKING DIR: /QRISdata/Q0275/ElizabethRoss/OtherGenomes/GraphGenome/PB_to_BrahGenome20200102_N_1_D
2020-05-14 09:02:10,601 [INFO ] PARAMETER: sub, VALUE: alignment
2020-05-14 09:02:10,601 [INFO ] PARAMETER: working_dir, VALUE: /QRISdata/Q0275/ElizabethRoss/OtherGenomes/GraphGenome/PB_to_BrahGenome20200102_N_1_D
2020-05-14 09:02:10,601 [INFO ] PARAMETER: bam_file, VALUE: PB_to_BrahGenome20200102_N_1.default.bam
2020-05-14 09:02:10,602 [INFO ] PARAMETER: genome, VALUE: BrahGenome20200102_N.fasta
2020-05-14 09:02:10,602 [INFO ] PARAMETER: min_mapq, VALUE: 20
2020-05-14 09:02:10,602 [INFO ] PARAMETER: min_sv_size, VALUE: 10
2020-05-14 09:02:10,602 [INFO ] PARAMETER: max_sv_size, VALUE: 1000000
2020-05-14 09:02:10,602 [INFO ] PARAMETER: segment_gap_tolerance, VALUE: 10
2020-05-14 09:02:10,602 [INFO ] PARAMETER: segment_overlap_tolerance, VALUE: 5
2020-05-14 09:02:10,602 [INFO ] PARAMETER: partition_max_distance, VALUE: 5000
2020-05-14 09:02:10,602 [INFO ] PARAMETER: distance_normalizer, VALUE: 900
2020-05-14 09:02:10,602 [INFO ] PARAMETER: cluster_max_distance, VALUE: 0.3
2020-05-14 09:02:10,602 [INFO ] PARAMETER: del_ins_dup_max_distance, VALUE: 1.0
2020-05-14 09:02:10,602 [INFO ] PARAMETER: trans_destination_partition_max_distance, VALUE: 1000
2020-05-14 09:02:10,602 [INFO ] PARAMETER: trans_partition_max_distance, VALUE: 200
2020-05-14 09:02:10,602 [INFO ] PARAMETER: trans_sv_max_distance, VALUE: 500
2020-05-14 09:02:10,602 [INFO ] PARAMETER: skip_genotyping, VALUE: False
2020-05-14 09:02:10,602 [INFO ] PARAMETER: minimum_score, VALUE: 8
2020-05-14 09:02:10,602 [INFO ] PARAMETER: homozygous_threshold, VALUE: 0.8
2020-05-14 09:02:10,603 [INFO ] PARAMETER: heterozygous_threshold, VALUE: 0.2
2020-05-14 09:02:10,603 [INFO ] PARAMETER: minimum_depth, VALUE: 4
2020-05-14 09:02:10,603 [INFO ] PARAMETER: sample, VALUE: Sample
2020-05-14 09:02:10,603 [INFO ] PARAMETER: types, VALUE: DEL,INS,INV,DUP:TANDEM,DUP:INT,BND
2020-05-14 09:02:10,603 [INFO ] PARAMETER: sequence_alleles, VALUE: True
2020-05-14 09:02:10,603 [INFO ] PARAMETER: insertion_sequences, VALUE: True
2020-05-14 09:02:10,603 [INFO ] PARAMETER: tandem_duplications_as_insertions, VALUE: False
2020-05-14 09:02:10,603 [INFO ] PARAMETER: interspersed_duplications_as_insertions, VALUE: True
2020-05-14 09:02:10,603 [INFO ] PARAMETER: read_names, VALUE: False
2020-05-14 09:02:10,603 [INFO ] PARAMETER: zmws, VALUE: False
2020-05-14 09:02:10,603 [INFO ] ** STEP 1: COLLECT **
2020-05-14 09:02:10,603 [INFO ] MODE: alignment
2020-05-14 09:02:10,603 [INFO ] INPUT: /QRISdata/Q0275/ElizabethRoss/OtherGenomes/GraphGenome/PB_to_BrahGenome20200102_N_1.default.bam
2020-05-14 09:56:19,840 [INFO ] Found 1413181 signatures for deleted regions.
2020-05-14 09:56:19,840 [INFO ] Found 12714631 signatures for inserted regions.
2020-05-14 09:56:19,840 [INFO ] Found 7373 signatures for inverted regions.
2020-05-14 09:56:19,840 [INFO ] Found 23886 signatures for tandem duplicated regions.
2020-05-14 09:56:19,840 [INFO ] Found 18421 signatures for translocation breakpoints.
2020-05-14 09:56:19,840 [INFO ] Found 228 signatures for inserted regions with detected region of origin.
2020-05-14 09:56:19,840 [INFO ] ** STEP 2: CLUSTER **
2020-05-14 09:57:42,444 [INFO ] Clustered deleted regions: 73284 partitions and 131130 clusters
2020-05-14 09:58:41,330 [INFO ] Clustered inserted regions: 698 partitions and 49203 clusters
2020-05-14 09:58:43,066 [INFO ] Clustered inverted regions: 472 partitions and 553 clusters
2020-05-14 09:58:43,874 [INFO ] Clustered tandem duplicated regions: 1730 partitions and 2601 clusters
2020-05-14 09:58:44,084 [INFO ] Clustered inserted regions with detected region of origin: 150 partitions and 152 clusters
2020-05-14 09:58:44,279 [INFO ] Finished clustering. Writing signature clusters..
2020-05-14 09:58:49,177 [INFO ] ** STEP 3: COMBINE **
2020-05-14 09:58:49,192 [INFO ] Cluster translocation breakpoints..
2020-05-14 09:58:50,383 [INFO ] Combine inserted regions with translocation breakpoints..
2020-05-14 09:58:50,504 [INFO ] Create interspersed duplication candidates and flag cut&paste insertions..
2020-05-14 09:59:46,963 [INFO ] Cluster interspersed duplication candidates one more time..
2020-05-14 09:59:46,966 [INFO ] Clustered interspersed duplication candidates: 146 partitions and 152 clusters
2020-05-14 09:59:46,972 [INFO ] ** STEP 4: GENOTYPE **
2020-05-14 09:59:46,972 [INFO ] Genotyping deletions..
2020-05-14 10:05:16,327 [INFO ] Genotyping inversions..
2020-05-14 10:05:16,888 [INFO ] Genotyping novel insertions..
2020-05-14 10:05:19,092 [INFO ] Genotyping interspersed duplications..
2020-05-14 10:05:19,097 [INFO ] Write SV candidates..
2020-05-14 10:05:19,097 [INFO ] Final deletion candidates: 131130
2020-05-14 10:05:19,097 [INFO ] Final inversion candidates: 553
2020-05-14 10:05:19,097 [INFO ] Final interspersed duplication candidates: 152
2020-05-14 10:05:19,097 [INFO ] Final tandem duplication candidates: 2601
2020-05-14 10:05:19,097 [INFO ] Final novel insertion candidates: 49066
2020-05-14 10:05:19,097 [INFO ] Final breakend candidates: 17429
2020-05-14 10:05:45,690 [INFO ] Draw plots..
2020-05-14 10:05:55,251 [INFO ] Done.

v1.1: grep -v "Processed" PB_to_BrahGenome20200102_N_1_D_svim1.1/SVIM_200527_110921.log
2020-05-27 11:09:21,738 [INFO ] ** Start SVIM, version 1.1.0 **
2020-05-27 11:09:21,739 [INFO ] CMD: python3 /30days/uqeross2/condaenv_python3.6/bin/svim alignment --min_sv_size 10 --max_sv_size 1000000 --minimum_score 8 PB_to_BrahGenome20200102_N_1_D_svim1.1 PB_to_BrahGenome20200102_N_1.default.bam
2020-05-27 11:09:21,739 [INFO ] WORKING DIR: /QRISdata/Q0275/ElizabethRoss/OtherGenomes/GraphGenome/PB_to_BrahGenome20200102_N_1_D_svim1.1
2020-05-27 11:09:21,739 [INFO ] PARAMETER: sub, VALUE: alignment
2020-05-27 11:09:21,739 [INFO ] PARAMETER: working_dir, VALUE: /QRISdata/Q0275/ElizabethRoss/OtherGenomes/GraphGenome/PB_to_BrahGenome20200102_N_1_D_svim1.1
2020-05-27 11:09:21,739 [INFO ] PARAMETER: bam_file, VALUE: PB_to_BrahGenome20200102_N_1.default.bam
2020-05-27 11:09:21,739 [INFO ] PARAMETER: min_mapq, VALUE: 20
2020-05-27 11:09:21,739 [INFO ] PARAMETER: min_sv_size, VALUE: 10
2020-05-27 11:09:21,739 [INFO ] PARAMETER: max_sv_size, VALUE: 1000000
2020-05-27 11:09:21,739 [INFO ] PARAMETER: segment_gap_tolerance, VALUE: 10
2020-05-27 11:09:21,739 [INFO ] PARAMETER: segment_overlap_tolerance, VALUE: 5
2020-05-27 11:09:21,739 [INFO ] PARAMETER: partition_max_distance, VALUE: 5000
2020-05-27 11:09:21,739 [INFO ] PARAMETER: distance_normalizer, VALUE: 900
2020-05-27 11:09:21,739 [INFO ] PARAMETER: cluster_max_distance, VALUE: 0.3
2020-05-27 11:09:21,740 [INFO ] PARAMETER: del_ins_dup_max_distance, VALUE: 1.0
2020-05-27 11:09:21,740 [INFO ] PARAMETER: trans_destination_partition_max_distance, VALUE: 1000
2020-05-27 11:09:21,740 [INFO ] PARAMETER: trans_partition_max_distance, VALUE: 200
2020-05-27 11:09:21,740 [INFO ] PARAMETER: trans_sv_max_distance, VALUE: 500
2020-05-27 11:09:21,740 [INFO ] PARAMETER: skip_genotyping, VALUE: False
2020-05-27 11:09:21,740 [INFO ] PARAMETER: minimum_score, VALUE: 8
2020-05-27 11:09:21,740 [INFO ] PARAMETER: homozygous_threshold, VALUE: 0.8
2020-05-27 11:09:21,740 [INFO ] PARAMETER: heterozygous_threshold, VALUE: 0.2
2020-05-27 11:09:21,740 [INFO ] PARAMETER: minimum_depth, VALUE: 4
2020-05-27 11:09:21,740 [INFO ] PARAMETER: sample, VALUE: Sample
2020-05-27 11:09:21,740 [INFO ] PARAMETER: types, VALUE: DEL,INS,INV,DUP_TAN,DUP_INT,BND
2020-05-27 11:09:21,740 [INFO ] PARAMETER: duplications_as_insertions, VALUE: False
2020-05-27 11:09:21,740 [INFO ] ** STEP 1: COLLECT **
2020-05-27 11:09:21,740 [INFO ] MODE: alignment
2020-05-27 11:09:21,740 [INFO ] INPUT: /QRISdata/Q0275/ElizabethRoss/OtherGenomes/GraphGenome/PB_to_BrahGenome20200102_N_1.default.bam
2020-05-27 12:00:04,842 [INFO ] Found 1412657 signatures for deleted regions.
2020-05-27 12:00:04,843 [INFO ] Found 12714555 signatures for inserted regions.
2020-05-27 12:00:04,843 [INFO ] Found 7298 signatures for inverted regions.
2020-05-27 12:00:04,843 [INFO ] Found 21797 signatures for tandem duplicated regions.
2020-05-27 12:00:04,843 [INFO ] Found 9315 signatures for translocation breakpoints.
2020-05-27 12:00:04,843 [INFO ] Found 213 signatures for inserted regions with detected region of origin.
2020-05-27 12:00:04,843 [INFO ] ** STEP 2: CLUSTER **
2020-05-27 12:01:52,098 [INFO ] Clustered deleted regions: 103155 partitions and 135324 clusters
2020-05-27 12:17:27,516 [INFO ] Clustered inserted regions: 3265827 partitions and 6652257 clusters
2020-05-27 12:25:26,336 [INFO ] Clustered inverted regions: 537 partitions and 547 clusters
2020-05-27 12:25:27,870 [INFO ] Clustered tandem duplicated regions: 2197 partitions and 2500 clusters
2020-05-27 12:25:28,143 [INFO ] Clustered inserted regions with detected region of origin: 139 partitions and 142 clusters
2020-05-27 12:25:28,360 [INFO ] Finished clustering. Writing signature clusters..
2020-05-27 12:27:07,935 [INFO ] ** STEP 3: COMBINE **
2020-05-27 12:27:07,956 [INFO ] Cluster translocation breakpoints..
2020-05-27 12:27:08,706 [INFO ] Combine inserted regions with translocation breakpoints..
2020-05-27 12:27:23,264 [INFO ] Create interspersed duplication candidates and flag cut&paste insertions..
2020-05-27 12:29:48,557 [INFO ] Cluster interspersed duplication candidates one more time..
2020-05-27 12:29:48,573 [INFO ] Clustered interspersed duplication candidates: 160 partitions and 166 clusters
2020-05-27 12:29:48,583 [INFO ] ** STEP 4: GENOTYPE **
2020-05-27 12:29:48,583 [INFO ] Genotyping deletions..
2020-05-27 12:38:29,663 [INFO ] Genotyping inversions..
2020-05-27 12:38:30,372 [INFO ] Genotyping novel insertions..
2020-05-27 12:48:49,531 [INFO ] Genotyping interspersed duplications..
2020-05-27 12:48:50,064 [INFO ] Write SV candidates..
2020-05-27 12:48:50,065 [INFO ] Final deletion candidates: 135324
2020-05-27 12:48:50,065 [INFO ] Final inversion candidates: 547
2020-05-27 12:48:50,065 [INFO ] Final interspersed duplication candidates: 166
2020-05-27 12:48:50,065 [INFO ] Final tandem duplication candidates: 2500
2020-05-27 12:48:50,065 [INFO ] Final novel insertion candidates: 6650246
2020-05-27 12:48:50,065 [INFO ] Final breakend candidates: 8723
2020-05-27 12:52:08,086 [INFO ] Draw plots..
2020-05-27 12:52:32,333 [INFO ] Done.

v1.2:
2020-05-27 14:41:25,625 [INFO ] ** Start SVIM, version 1.2.0 **
2020-05-27 14:41:25,626 [INFO ] CMD: python3 /30days/uqeross2/condaenv_python3.6/bin/svim alignment --min_sv_size 10 --max_sv_size 1000000 --minimum_score 8 PB_to_BrahGenome20200102_N_1_D_svim1.2 PB_to_BrahGenome20200102_N_1.default.bam BrahGenome20200102_N.fasta
2020-05-27 14:41:25,626 [INFO ] WORKING DIR: /QRISdata/Q0275/ElizabethRoss/OtherGenomes/GraphGenome/PB_to_BrahGenome20200102_N_1_D_svim1.2
2020-05-27 14:41:25,626 [INFO ] PARAMETER: sub, VALUE: alignment
2020-05-27 14:41:25,626 [INFO ] PARAMETER: working_dir, VALUE: /QRISdata/Q0275/ElizabethRoss/OtherGenomes/GraphGenome/PB_to_BrahGenome20200102_N_1_D_svim1.2
2020-05-27 14:41:25,626 [INFO ] PARAMETER: bam_file, VALUE: PB_to_BrahGenome20200102_N_1.default.bam
2020-05-27 14:41:25,626 [INFO ] PARAMETER: genome, VALUE: BrahGenome20200102_N.fasta
2020-05-27 14:41:25,626 [INFO ] PARAMETER: min_mapq, VALUE: 20
2020-05-27 14:41:25,626 [INFO ] PARAMETER: min_sv_size, VALUE: 10
2020-05-27 14:41:25,626 [INFO ] PARAMETER: max_sv_size, VALUE: 1000000
2020-05-27 14:41:25,626 [INFO ] PARAMETER: segment_gap_tolerance, VALUE: 10
2020-05-27 14:41:25,626 [INFO ] PARAMETER: segment_overlap_tolerance, VALUE: 5
2020-05-27 14:41:25,626 [INFO ] PARAMETER: partition_max_distance, VALUE: 5000
2020-05-27 14:41:25,626 [INFO ] PARAMETER: distance_normalizer, VALUE: 900
2020-05-27 14:41:25,627 [INFO ] PARAMETER: cluster_max_distance, VALUE: 0.3
2020-05-27 14:41:25,627 [INFO ] PARAMETER: del_ins_dup_max_distance, VALUE: 1.0
2020-05-27 14:41:25,627 [INFO ] PARAMETER: trans_destination_partition_max_distance, VALUE: 1000
2020-05-27 14:41:25,627 [INFO ] PARAMETER: trans_partition_max_distance, VALUE: 200
2020-05-27 14:41:25,627 [INFO ] PARAMETER: trans_sv_max_distance, VALUE: 500
2020-05-27 14:41:25,627 [INFO ] PARAMETER: skip_genotyping, VALUE: False
2020-05-27 14:41:25,627 [INFO ] PARAMETER: minimum_score, VALUE: 8
2020-05-27 14:41:25,627 [INFO ] PARAMETER: homozygous_threshold, VALUE: 0.8
2020-05-27 14:41:25,627 [INFO ] PARAMETER: heterozygous_threshold, VALUE: 0.2
2020-05-27 14:41:25,627 [INFO ] PARAMETER: minimum_depth, VALUE: 4
2020-05-27 14:41:25,627 [INFO ] PARAMETER: sample, VALUE: Sample
2020-05-27 14:41:25,627 [INFO ] PARAMETER: types, VALUE: DEL,INS,INV,DUP_TAN,DUP_INT,BND
2020-05-27 14:41:25,627 [INFO ] PARAMETER: sequence_alleles, VALUE: False
2020-05-27 14:41:25,627 [INFO ] PARAMETER: insertion_sequences, VALUE: False
2020-05-27 14:41:25,627 [INFO ] PARAMETER: duplications_as_insertions, VALUE: False
2020-05-27 14:41:25,627 [INFO ] PARAMETER: read_names, VALUE: False
2020-05-27 14:41:25,627 [INFO ] ** STEP 1: COLLECT **
2020-05-27 14:41:25,628 [INFO ] MODE: alignment
2020-05-27 14:41:25,628 [INFO ] INPUT: /QRISdata/Q0275/ElizabethRoss/OtherGenomes/GraphGenome/PB_to_BrahGenome20200102_N_1.default.bam
2020-05-27 15:42:52,842 [INFO ] Found 1412657 signatures for deleted regions.
2020-05-27 15:42:52,842 [INFO ] Found 12714555 signatures for inserted regions.
2020-05-27 15:42:52,842 [INFO ] Found 7298 signatures for inverted regions.
2020-05-27 15:42:52,842 [INFO ] Found 21797 signatures for tandem duplicated regions.
2020-05-27 15:42:52,842 [INFO ] Found 9315 signatures for translocation breakpoints.
2020-05-27 15:42:52,843 [INFO ] Found 213 signatures for inserted regions with detected region of origin.
2020-05-27 15:42:52,843 [INFO ] ** STEP 2: CLUSTER **
2020-05-27 15:44:38,931 [INFO ] Clustered deleted regions: 103155 partitions and 135389 clusters
2020-05-27 15:59:49,857 [INFO ] Clustered inserted regions: 3265827 partitions and 6652263 clusters
2020-05-27 16:07:51,101 [INFO ] Clustered inverted regions: 537 partitions and 547 clusters
2020-05-27 16:07:52,642 [INFO ] Clustered tandem duplicated regions: 2197 partitions and 2503 clusters
2020-05-27 16:07:52,924 [INFO ] Clustered inserted regions with detected region of origin: 139 partitions and 142 clusters
2020-05-27 16:07:53,137 [INFO ] Finished clustering. Writing signature clusters..
2020-05-27 16:09:30,273 [INFO ] ** STEP 3: COMBINE **
2020-05-27 16:09:30,295 [INFO ] Cluster translocation breakpoints..
2020-05-27 16:09:31,015 [INFO ] Combine inserted regions with translocation breakpoints..
2020-05-27 16:09:46,098 [INFO ] Create interspersed duplication candidates and flag cut&paste insertions..
2020-05-27 16:12:11,015 [INFO ] Cluster interspersed duplication candidates one more time..
2020-05-27 16:12:11,031 [INFO ] Clustered interspersed duplication candidates: 160 partitions and 166 clusters
2020-05-27 16:12:11,040 [INFO ] ** STEP 4: GENOTYPE **
2020-05-27 16:12:11,040 [INFO ] Genotyping deletions..
2020-05-27 16:20:25,661 [INFO ] Genotyping inversions..
2020-05-27 16:20:26,319 [INFO ] Genotyping novel insertions..
2020-05-27 16:30:34,023 [INFO ] Genotyping interspersed duplications..
2020-05-27 16:30:34,627 [INFO ] Write SV candidates..
2020-05-27 16:30:34,627 [INFO ] Final deletion candidates: 135389
2020-05-27 16:30:34,627 [INFO ] Final inversion candidates: 547
2020-05-27 16:30:34,627 [INFO ] Final interspersed duplication candidates: 166
2020-05-27 16:30:34,627 [INFO ] Final tandem duplication candidates: 2503
2020-05-27 16:30:34,627 [INFO ] Final novel insertion candidates: 6650255
2020-05-27 16:30:34,627 [INFO ] Final breakend candidates: 8723
2020-05-27 16:33:47,421 [INFO ] Draw plots..
2020-05-27 16:34:13,800 [INFO ] Done.

---Then some filtering on QUAL---
v1.1: gawk '$6>9 && /INS/' PB_to_BrahGenome20200102_N_1_D_svim1.1/final_results.vcf | wc -l
64968
v1.2: gawk '$6>9 && /INS/' PB_to_BrahGenome20200102_N_1_D_svim1.2/final_results.vcf | wc -l
64987
v1.3.1: gawk '$6>9 && /INS/' PB_to_BrahGenome20200102_N_1_D/variants.vcf | wc -l
87
---even with a low qual cut off of 3:
gawk '$6>3 && /INS/' PB_to_BrahGenome20200102_N_1_D/variants.vcf | wc -l
494
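For anyone who wants to repeat this kind of count without relying on a whole-line pattern match, here is a minimal Python sketch. It assumes a tab-separated VCF whose INFO column carries an SVTYPE key, as the VCF specification describes for structural variants. The function name `count_svs` and the three example records are made up for illustration and are not taken from the real output files.

```python
import io
from collections import Counter

def count_svs(vcf_lines, min_qual=10.0):
    """Count records per SVTYPE whose QUAL is at least min_qual."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8 or fields[5] == ".":
            continue  # malformed line or missing QUAL
        if float(fields[5]) < min_qual:
            continue
        # INFO is a ';'-separated list of KEY=VALUE pairs and bare flags
        info = dict(
            item.split("=", 1) for item in fields[7].split(";") if "=" in item
        )
        counts[info.get("SVTYPE", "UNKNOWN")] += 1
    return counts

# Tiny in-memory example with hypothetical records
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t100\tsv1\tN\t<INS>\t12\tPASS\tSVTYPE=INS;END=100;SVLEN=55\n"
    "chr1\t500\tsv2\tN\t<DEL>\t30\tPASS\tSVTYPE=DEL;END=620;SVLEN=-120\n"
    "chr1\t900\tsv3\tN\t<INS>\t4\tPASS\tSVTYPE=INS;END=900;SVLEN=20\n"
)
counts = count_svs(example)
print(dict(counts))
```

For integer QUAL values, `min_qual=10` selects the same records as `$6>9`. To run it on a real file, pass an open file handle, for example `count_svs(open("variants.vcf"))`.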

eldariont commented 4 years ago

Thanks for sending the logs. I looked into them and found that I had actually seen this issue before. It was indeed introduced by changes to the clustering method in version 1.3.1.
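The difference shows up directly in the STEP 2 lines of the posted logs: from roughly 12.7 million insertion signatures, v1.3.1 reports 698 partitions and 49203 clusters, while v1.1.0 reports 3265827 partitions and 6652257 clusters. A small Python sketch for pulling these numbers out of a log, so that runs can be compared side by side, might look like the following. The regular expression is written against the log lines quoted in this thread, and the helper name `cluster_stats` is hypothetical.

```python
import re

# Matches SVIM log lines such as
# "Clustered inserted regions: 698 partitions and 49203 clusters".
CLUSTER_RE = re.compile(
    r"Clustered (?P<kind>[^:\n]+): (?P<partitions>\d+) partitions "
    r"and (?P<clusters>\d+) clusters"
)

def cluster_stats(log_text):
    """Return {signature kind: (partitions, clusters)} found in log_text."""
    return {
        m.group("kind"): (int(m.group("partitions")), int(m.group("clusters")))
        for m in CLUSTER_RE.finditer(log_text)
    }

# Lines copied from the two logs posted in this thread
log_v131 = (
    "2020-05-14 09:58:41,330 [INFO ] Clustered inserted regions: "
    "698 partitions and 49203 clusters"
)
log_v110 = (
    "2020-05-27 12:17:27,516 [INFO ] Clustered inserted regions: "
    "3265827 partitions and 6652257 clusters"
)

stats_v131 = cluster_stats(log_v131)
stats_v110 = cluster_stats(log_v110)
for kind in stats_v131:
    print(kind, "v1.3.1:", stats_v131[kind], "v1.1.0:", stats_v110.get(kind))
```

Passing the full text of a log file works the same way, since the pattern only picks up the "Clustered ..." summary lines.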

Now comes the very stupid part: I already fixed the problem on branch clustering_improvements a few weeks ago but did not merge the changes into the master branch 🤦 I now finally did that and created a new release v1.4.0 that includes the fixes and other improvements.

The new release should already be available via pip install svim. There were problems with the conda release but maybe you can already try out the new release using pip.

Thanks a lot for making me aware of the issue. I hope it is fixed now. David

LizzieMcDizzie commented 4 years ago

Looks much better with v1.4: INS and DEL are much more comparable. I installed it by activating the conda environment that I had 1.3.1 installed in, then running pip install svim==1.4. That method might help some users with the dependencies.
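After upgrading this way, it can be worth confirming which SVIM version the environment actually resolves to. The sketch below uses Python's standard `importlib.metadata` (Python 3.8 or later) and assumes, based on this thread, that 1.4.0 is the first release containing the fix. The helper names `parse_version` and `has_clustering_fix` are made up for illustration, and the simple parser only handles plain dotted numeric versions.

```python
from importlib.metadata import PackageNotFoundError, version

def parse_version(text, width=3):
    """Turn '1.4' or '1.3.1' into a zero-padded tuple such as (1, 4, 0)."""
    parts = [int(p) for p in text.split(".") if p.isdigit()]
    return tuple(parts + [0] * (width - len(parts)))

def has_clustering_fix(installed, fixed_in="1.4.0"):
    """True if the installed version is at least the release with the fix."""
    return parse_version(installed) >= parse_version(fixed_in)

try:
    installed = version("svim")  # version of the svim package in this environment
except PackageNotFoundError:
    installed = None

if installed is None:
    print("svim is not installed in this environment")
else:
    print("svim", installed, "includes the fix:", has_clustering_fix(installed))
```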

gawk '$6>9 && /INS/' variants.vcf | wc -l
54197
gawk '$6>9 && /DEL/' variants.vcf | wc -l
45375

Thanks for the help. Cheers.

eldariont commented 4 years ago

That's very good to hear. Sorry for the inconvenience this caused. I will close this issue now, but feel free to reopen it or create another issue if you experience any problems.

Cheers David