Closed cyrusmallon closed 1 year ago
You did a lot of work. I think the problem is a stray blank character at the end of the #CHROM line. Once I remove it, readVcf works.
Thanks for the thorough report.
How did I find it out? I ran the example of readVcf, then did a readLines on the example file and on your excerpt of the problematic file. I noticed that the #CHROM lines differed as noted. A sad lack of resilience in the parsing infrastructure.
Thank you so much for your quick reply and solution! I didn't know about the readLines() function, but I just ran it and I see the same space you did! After removing the space, the vcf now loads.
I will close the issue now.
Thanks again!
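For anyone who would rather script the fix than edit the file by hand, here is a minimal Python sketch of the clean-up described above. The function names `clean_header_line` and `clean_vcf_lines` are made up for this illustration, and the sketch assumes the only problem is stray whitespace (or an empty trailing field) on the #CHROM line; it leaves every other line untouched.

```python
def clean_header_line(line):
    """Strip stray whitespace around each tab-separated field of a #CHROM line.

    Empty fields (for example, from a trailing tab) are dropped.
    """
    fields = [field.strip() for field in line.rstrip("\r\n").split("\t")]
    return "\t".join(field for field in fields if field) + "\n"


def clean_vcf_lines(lines):
    """Return a copy of the lines with only the #CHROM header line cleaned."""
    return [
        clean_header_line(line) if line.startswith("#CHROM") else line
        for line in lines
    ]


# In-memory example: the header has a trailing blank after FILTER.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER \n",
]
for line in clean_vcf_lines(example):
    print(repr(line))
```

To apply this to a real file, read it with `open(path).readlines()`, pass the list to `clean_vcf_lines`, and write the result to a new file so the original stays intact.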
you did a lot of work. I think the problem is a stray blank character at the end of the #CHROM line. Once I remove that readVcf works.
Hi, could you clarify what you mean by a stray blank character at the end of the #CHROM line? I'm having my own problems with bcftools not parsing header and think this might be my solution.
Hi,
If you're working with R, you can use the readLines() command to look at all input lines of your vcf file, line by line (same thing as row by row). Your output should be something like this:
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER
Notice that this is a tab separated file and there are no spaces anywhere, only the \t denoting the tab separation.
If you happen to see something like this:
#CHROM\tPOS\tID\tREF\tALT\tQUAL\t FILTER
Where there is a weird space somewhere, then it may be difficult to parse your vcf file. Also take note of the error messages. If I remember correctly, if there is an "INFO" column, then it must be preceded by some columns with metadata (something to that effect). Also, if you look up the current vcf guidelines, the formats of vcf files are standardized. So perhaps your vcf file somehow differs from the standard format, and for that reason it cannot be parsed/loaded.
I see you're working in python. In python you can use readlines()
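As a rough sketch of that check in Python: the snippet below reads the lines, finds the #CHROM header, and reports any tab-separated field that has leading or trailing whitespace or is empty. The function name `find_stray_whitespace` is made up for this example, and the sample header is an in-memory stand-in for a real file. Printing each field with `repr()` makes otherwise invisible spaces and carriage returns visible.

```python
import io


def find_stray_whitespace(lines):
    """Report suspicious fields on #CHROM header lines.

    Returns a list of (line_number, field_index, repr(field)) tuples for
    fields that are empty or carry leading/trailing whitespace. A carriage
    return left over from Windows line endings is flagged as well.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.startswith("#CHROM"):
            continue
        fields = line.rstrip("\n").split("\t")
        for field_index, field in enumerate(fields):
            if field == "" or field != field.strip():
                problems.append((line_number, field_index, repr(field)))
    return problems


# In-memory stand-in for open("your_file.vcf"); note the space before FILTER.
example = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\t FILTER\n"
)
for problem in find_stray_whitespace(example.readlines()):
    print(problem)
```

An empty result means the header fields look clean, in which case the parsing problem probably lies elsewhere in the file.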
Hello,
I am trying to load a vcf file that was produced using breseq (v 0.37.1), but it fails to load with the function readVcf().
Here is the beginning of the vcf file that breseq produces (full file as .txt here original.txt), which I have tried to load unsuccessfully with readVcf():
This is the error message I receive:
After receiving this error message, I simply used awk to create a new FORMAT column and populated that column with the string 'GT' for genotype. I also made sure everything was tab separated. Here is what the beginning of the vcf file looks like (full file as .txt here newfile1.txt):
But again I get the same error:
Then I saw this post, https://github.com/gerstung-lab/deepSNV/issues/17, which says that the FORMAT column needs to be followed by another column. Therefore, I simply added a column that I called "EXTRA" and populated it with the text "COL". Here's what the beginning of the vcf looks like (full file as .txt here newfile2.txt):
But again I get the same error:
Does anyone know why I can't upload these vcf files with readVcf()? Does something stick out in the format of the file? Or is there possibly some bug with readVcf()?
Thank you.