Illumina / paragraph

Graph realignment tools for structural variants

Format error in vcf line: #81

Open mrwangyz opened 1 year ago

mrwangyz commented 1 year ago

Thank you for developing this software; it is very helpful to me. However, I ran into a problem while using it. There seems to be something wrong with my file format, but from reading your source code I found that this file was generated by grmpy, which confused me. After checking the source code, I still can't find the problem. The following is my error message:

[E::idx_find_and_load] Could not retrieve index file for 'paragraph_inv/variants.vcf.gz'
2023-08-30 20:36:48,691 ERROR Traceback (most recent call last):
2023-08-30 20:36:48,691 ERROR   File "/public2/wangyz/bin/paragraph-v2.4a-binary/lib/python3/grm/vcfgraph/vcfupdate.py", line 161, in update_vcf_from_grmpy
    record = header.new_record(contig=raw_record.chrom, start=raw_record.start, stop=raw_record.stop,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-08-30 20:36:48,692 ERROR   File "pysam/libcbcf.pyx", line 2101, in pysam.libcbcf.VariantHeader.new_record
2023-08-30 20:36:48,692 ERROR   File "pysam/libcbcf.pyx", line 3247, in pysam.libcbcf.VariantRecord.alleles.set
2023-08-30 20:36:48,692 ERROR ValueError: must set at least 2 alleles
2023-08-30 20:36:48,692 ERROR During handling of the above exception, another exception occurred:
2023-08-30 20:36:48,692 ERROR Traceback (most recent call last):
2023-08-30 20:36:48,699 ERROR   File "/public2/wangyz/bin/paragraph-v2.4a-binary/bin/multigrmpy.py", line 340, in run
    vcfupdate.update_vcf_from_grmpy(vcf_input_path, grmpyOutput, result_vcf_path, sample_names)
2023-08-30 20:36:48,699 ERROR   File "/public2/wangyz/bin/paragraph-v2.4a-binary/lib/python3/grm/vcfgraph/vcfupdate.py", line 164, in update_vcf_from_grmpy
    raise Exception("Format error in vcf line: " + str(raw_record))
2023-08-30 20:36:48,700 ERROR Exception: Format error in vcf line: chr1 4203 syri.INV.551237 . . . PASS SVLEN=2949;SVTYPE=INV;END=7152;GRMPY_ID=test_sort.vcf.gz@5b86c07c81908a94739dfe790e732ecf07909ff3fc7a02e1113cde7f9653acc5:1

Traceback (most recent call last):
  File "/public2/wangyz/bin/paragraph-v2.4a-binary/lib/python3/grm/vcfgraph/vcfupdate.py", line 161, in update_vcf_from_grmpy
    record = header.new_record(contig=raw_record.chrom, start=raw_record.start, stop=raw_record.stop,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pysam/libcbcf.pyx", line 2101, in pysam.libcbcf.VariantHeader.new_record
  File "pysam/libcbcf.pyx", line 3247, in pysam.libcbcf.VariantRecord.alleles.set
ValueError: must set at least 2 alleles

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/public2/wangyz/bin/paragraph-v2.4a-binary/bin/multigrmpy.py", line 353, in <module>
    main()
  File "/public2/wangyz/bin/paragraph-v2.4a-binary/bin/multigrmpy.py", line 349, in main
    run(args)
  File "/public2/wangyz/bin/paragraph-v2.4a-binary/bin/multigrmpy.py", line 340, in run
    vcfupdate.update_vcf_from_grmpy(vcf_input_path, grmpyOutput, result_vcf_path, sample_names)
  File "/public2/wangyz/bin/paragraph-v2.4a-binary/lib/python3/grm/vcfgraph/vcfupdate.py", line 164, in update_vcf_from_grmpy
    raise Exception("Format error in vcf line: " + str(raw_record))
Exception: Format error in vcf line: chr1 4203 syri.INV.551237 . . . PASS SVLEN=2949;SVTYPE=INV;END=7152;GRMPY_ID=test_sort.vcf.gz@5b86c07c81908a94739dfe790e732ecf07909ff3fc7a02e1113cde7f9653acc5:1

yangxin-9 commented 11 months ago

Hi, how did you fix this error? I am getting a similar one.

mrwangyz commented 11 months ago

Hello, glad to help. I read through the code that throws the error and worked out where the problem was. Several different things can trigger this error, but they all come down to VCF format issues. The program is written in Python, so it is not hard to read the source code and find the cause.
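For anyone hitting the same message, here is a minimal sketch of one check that may help narrow things down. In the record quoted in the traceback above, the REF and ALT columns both appear to be "." (assuming the usual CHROM, POS, ID, REF, ALT column order, since the log flattens the tabs), and pysam's "must set at least 2 alleles" is consistent with that. The function name `find_missing_alleles` is hypothetical and is not part of Paragraph; it only flags records with a missing REF or ALT and does not cover the other possible format problems.

```python
from typing import Iterable, Iterator, Tuple

# Values treated as "no data" in the REF/ALT columns.
MISSING = {"", "."}


def find_missing_alleles(lines: Iterable[str]) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (line_number, chrom, pos, record_id) for records with a missing REF or ALT.

    Hypothetical helper for illustration only; it is not part of Paragraph.
    """
    for line_number, line in enumerate(lines, start=1):
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        # Records with fewer than five columns are malformed in other ways
        # and are not checked by this sketch.
        if len(fields) < 5:
            continue
        chrom, pos, record_id, ref, alt = fields[:5]
        if ref in MISSING or alt in MISSING:
            yield line_number, chrom, pos, record_id


# In-memory example built from the record shown in the error message.
sample = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t4203\tsyri.INV.551237\t.\t.\t.\tPASS\tSVLEN=2949;SVTYPE=INV;END=7152\n",
    "chr1\t100\texample_ok\tA\tT\t.\tPASS\t.\n",
]

for hit in find_missing_alleles(sample):
    print(hit)
```

To run it on a real file, pass an open text handle instead of the list, for example `gzip.open(path, "rt")` for a bgzipped VCF.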

yangxin-9 commented 11 months ago

Thank you for your help; it was truly essential. It does look like my VCF is incorrectly formatted. I will read the program code and fix the VCF format.