ACEnglish / truvari

Structural variant toolkit for VCFs
MIT License

Error with VCF file no sample found #176

Closed poddarharsh15 closed 9 months ago

poddarharsh15 commented 9 months ago

Hello, I am trying to run `truvari bench` on VCF files generated by the MANTA SV caller, but I am getting errors saying that no samples were found. Could you please suggest some ideas? [Screenshot from 2023-11-24 12-39-29](https://user-images.githubusercontent.com/45700858/285446480-00370a52-e9c5-4c45-9242-623e5fca146b.png)

yusufkizilarslan commented 9 months ago

It seems like the issue is related to the absence of sample information in your VCF files generated by MANTA SVcaller. Truvari expects VCF files to have sample information in the header, and it's reporting that it cannot find any samples in the specified files.

Let's address the issues one by one:

No SAMPLE columns found in VCF: The error indicates that the VCF file candidateSV.vcf.gz does not have any sample columns. This is a crucial piece of information for tools like Truvari that compare variants between different samples. You need to make sure that MANTA includes sample information in its VCF output.

You can check the header of your VCF file using tools like bcftools: `bcftools view -h candidateSV.vcf.gz`. Look for the line starting with #CHROM at the end of the header. If the file has samples, their names appear on that line after the FORMAT column.
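The same header check can be sketched in plain Python. This is only an illustration using the standard library, not part of truvari or bcftools. The names `sample_names` and `sample_names_from_file` are made up for this example, and it assumes a tab-delimited header as the VCF format requires.

```python
import gzip

# CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
FIXED_COLUMNS = 8


def sample_names(lines):
    """Return the sample names on the #CHROM line, or [] if there are none."""
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # Column 9 is FORMAT; every column after it is a sample name.
            return fields[FIXED_COLUMNS + 1:]
        if not line.startswith("#"):
            break  # reached the records without seeing a #CHROM line
    return []


def sample_names_from_file(path):
    # bgzip output is valid multi-member gzip, so gzip.open can read it.
    with gzip.open(path, "rt") as handle:
        return sample_names(handle)
```

An empty list from `sample_names_from_file("candidateSV.vcf.gz")` would mean the file has no sample columns, which matches the situation the error message describes.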

No sample line / ValueError: cannot create VariantHeader: The second error indicates a problem with reading the VCF file, and it's likely related to the absence of a valid header or sample information. Ensure that the VCF file is correctly formatted and has the necessary header lines.

You can use bcftools to check the VCF file format. It has no dedicated validation subcommand, but asking it to parse the whole file will surface formatting errors: `bcftools view candidateSV.vcf.gz > /dev/null`

ValueError: file does not have a valid header: The last error indicates that Truvari is unable to recognize a valid header in the VCF file. Make sure that the file specified (candidateSV.vcf.gz) is indeed a valid VCF file.

Check the file by using zcat or zless to inspect its content: `zcat candidateSV.vcf.gz | less`. Ensure that the file contains a valid VCF header (lines starting with ## for metadata, followed by a #CHROM line that names the columns and samples).

If MANTA SVcaller does not include sample information in its output, you might need to consult MANTA's documentation or options to include sample information in the VCF file. If the issue persists, you may need to contact the MANTA support community for assistance or consider alternative structural variant callers that provide the necessary sample information in their VCF output.


poddarharsh15 commented 9 months ago

Hello, thank you for answering. I had already checked the VCF file, and it has no FORMAT or SAMPLE column. After reading some earlier issues in the truvari repository, I added FORMAT (GT) and SAMPLE (.) columns to my VCF files manually, but it seems the files are still not compatible with truvari after that change. Attached: candidateSV.vcf.gz. [Screenshot from 2023-11-24 13-41-29]
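One way manual edits like this can go wrong is with delimiters: columns separated by spaces instead of tabs, or records that end up with a different number of columns than the #CHROM line. This is an assumption, since the thread does not show the actual cause. The sketch below is a quick consistency check in plain Python, where `check_columns` is a hypothetical helper name and not a truvari function.

```python
def check_columns(lines):
    """Return a list of human-readable problems found in VCF text lines."""
    problems = []
    expected = None  # number of tab-separated columns on the #CHROM line
    for number, line in enumerate(lines, start=1):
        if line.startswith("##"):
            continue  # meta-information lines are not tab-delimited records
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            expected = len(fields)
            # 8 fixed columns + FORMAT + at least one sample = 10
            if expected < 10:
                problems.append(
                    f"line {number}: #CHROM line has {expected} columns, "
                    "expected at least 10 (FORMAT plus one sample)"
                )
        elif expected is None:
            problems.append(f"line {number}: record appears before the #CHROM line")
            break
        elif len(fields) != expected:
            problems.append(
                f"line {number}: {len(fields)} columns, header has {expected}"
            )
    return problems
```

Feeding it the decompressed lines of the edited file (for example from `gzip.open(path, "rt")`) and getting an empty list back would at least rule out this class of mistake.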

ACEnglish commented 9 months ago

There must have been an error in how the FORMAT and SAMPLE columns were added. I ran the following script and was able to run truvari on the VCF it produced.

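- fixer.py
```python
import sys

for line in sys.stdin:
    if line.startswith("##"):
        # Meta-information lines pass through unchanged
        sys.stdout.write(line)
    elif line.startswith("#"):
        # Declare the GT field, then extend the #CHROM line
        sys.stdout.write('##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n')
        sys.stdout.write(line.strip() + '\tFORMAT\tSAMPLE\n')
    else:
        # Give every record a missing genotype for the placeholder sample
        sys.stdout.write(line.strip() + '\tGT\t.\n')
```

- command line
```bash
gunzip -c candidateSV.vcf.gz | python fixer.py | bgzip > cand_fix.vcf.gz
```
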
poddarharsh15 commented 9 months ago

Thank you so much, it's finally working for me now.