arq5x / bedtools2

bedtools - the swiss army knife for genome arithmetic
MIT License

Error: Invalid record in file #1098

Open r-poloni opened 4 months ago

r-poloni commented 4 months ago

Hi, I am trying to produce a BED file containing all uncalled regions of a VCF using bedtools version 2.27.1, but I get an error. The same error also occurs with the newest version, 2.31.1. The command is:

bedtools complement -i vcf_file.vcf.gz -g genome_index.fai > uncalled.bed
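As a point of reference for what this command is expected to return, the idea of a complement can be sketched in a few lines of Python. This is only an illustration of the concept, not the bedtools implementation; the function name `complement` and the input layout (0-based, half-open intervals per chromosome, plus a mapping of chromosome lengths such as the one a `.fai` index provides) are assumptions made for the example.

```python
def complement(covered, chrom_sizes):
    """Return the regions of each chromosome not covered by any interval.

    covered:     dict mapping chromosome -> list of (start, end) tuples,
                 0-based and half-open (hypothetical input layout).
    chrom_sizes: dict mapping chromosome -> length in bases.
    """
    gaps = []
    for chrom, size in chrom_sizes.items():
        cursor = 0  # first base not yet known to be covered
        for start, end in sorted(covered.get(chrom, [])):
            if start > cursor:
                # Everything between the cursor and this interval is uncovered.
                gaps.append((chrom, cursor, start))
            cursor = max(cursor, end)
        if cursor < size:
            # Trailing uncovered stretch up to the chromosome end.
            gaps.append((chrom, cursor, size))
    return gaps


if __name__ == "__main__":
    # Made-up toy data, not taken from the files in this report.
    for gap in complement({"chr1": [(2, 3), (5, 8)]}, {"chr1": 10, "chr2": 4}):
        print(*gap, sep="\t")
```

A chromosome with no records at all comes back as a single gap spanning its full length, which is the behaviour one would want for "uncalled regions".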

And I always get the same error: Error: Invalid record in file vcf_file.vcf.gz. Record is

I tried reformatting the VCF with plink using:

plink --vcf vcf_file.vcf.gz --double-id --allow-extra-chr --keep-allele-order --recode vcf-iid --out vcf_file_recoded

And I get exactly the same error. With the plink output, the offending line is reported as:

ptg000001l 78925 . T . . PR GT 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 ./. 0/0 0/0 0/0 0/0 0/0 ./. 0/0 ./. ./. ./. 0/0 0/0 0/0 0/0 ./. 0/0 ./. ./. ./. 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 ./. ./. 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 ./. 0/0 0/0 0/0 0/0
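To see how many records share whatever sets this line apart, a small scan of the ALT column can help. The sketch below is a hypothetical helper, not part of bedtools or plink: it lists every data line whose ALT value is anything other than plain bases, on the assumption that the file is tab-separated with ALT in the fifth column as the VCF specification describes. Which of these values bedtools actually rejects is not established here, and the sample lines are made up.

```python
import re

# Anything that is not a plain run of bases counts as "unusual" for this scan.
PLAIN_ALT = re.compile(r"^[ACGTNacgtn]+$")


def unusual_alt_records(lines):
    """Return (line_number, ALT) for data lines whose ALT is not plain bases."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            hits.append((number, "<fewer than 5 columns>"))
            continue
        # ALT may hold several comma-separated alleles; one odd allele is enough.
        if any(not PLAIN_ALT.match(allele) for allele in fields[4].split(",")):
            hits.append((number, fields[4]))
    return hits


if __name__ == "__main__":
    sample = [
        "##fileformat=VCFv4.2\n",
        "chr1\t10\t.\tA\tG\t.\t.\t.\n",
        "chr1\t11\t.\tT\t<*>\t.\t.\t.\n",
        "chr1\t12\t.\tC\t.\t.\t.\t.\n",
    ]
    for number, alt in unusual_alt_records(sample):
        print(number, alt)
```

For a compressed file, the same function can be fed from `gzip.open(path, "rt")` in the standard library.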

As a last resort, I removed the "" tag in the ALT field, which was the only unusual thing I could spot, and replaced it with a "T" (thymine); the error then disappeared.
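An alternative to editing the ALT column might be to hand bedtools plain BED intervals instead of VCF records, since only the positions matter for a complement. The sketch below is an untested idea under stated assumptions rather than a confirmed fix: it assumes tab-separated records and that each record should cover the length of its REF allele starting at POS (converted to 0-based), and the function name `vcf_line_to_bed` is made up for the example.

```python
def vcf_line_to_bed(line):
    """Convert one VCF data line to a (chrom, start, end) BED-style interval.

    Assumption: the record spans len(REF) bases starting at POS.
    VCF positions are 1-based; BED starts are 0-based and ends are exclusive.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    start = pos - 1
    return chrom, start, start + len(ref)


if __name__ == "__main__":
    # Shortened record for illustration; the ALT value is never inspected.
    record = "ptg000001l\t78925\t.\tT\t.\t.\t.\tPR\tGT\t0/0\n"
    print(*vcf_line_to_bed(record), sep="\t")
```

The resulting lines could then be written to a BED file and passed to `bedtools complement -i`; the records may first need to be sorted in the same chromosome order as the genome file.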

Since this error occurs with VCF files produced by bcftools and GATK, probably two of the most widely used variant callers, fixing it would be a huge benefit if the problem lies in bedtools. If the error is on my side, I would be very glad to learn of any solution.

Thank you in advance, Riccardo