AstraZeneca-NGS / VarDictJava

VarDict Java port
MIT License

Vardict 1.8.3 outputs 37 columns in simple mode #350

tgeneste closed this issue 2 years ago

tgeneste commented 2 years ago

Hello,

I have an issue with VarDict 1.8.3: its output contains 37 columns, which I don't fully understand, and as a result teststrandbias.R fails with an error.

The output file looks like this: vardict_TG1303.txt

All columns look correct except the last one, which contains information I don't understand.

Could you suggest a workaround to get a proper VCF out of the VarDict pipeline?

Regards,

Thibault.

PolinaBevad commented 2 years ago

Hi Thibault, it seems that you are running VarDict with the -D/--debug option, which appends the full genotype information as a 37th column for debugging. Please remove that column before processing with teststrandbias.R, or run VarDict without this option!
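For output that was already produced with -D, a minimal shell sketch for dropping the extra column is shown here. It assumes, as described above, that the output is tab-separated and that the debug information occupies exactly the 37th field; `strip_debug_column` is a hypothetical helper name, not part of VarDict.

```shell
# Hypothetical helper (not part of VarDict): keep the first 36 tab-separated
# fields of every line, dropping the 37th (debug) column that -D/--debug appends.
# Reads the files given as arguments, or stdin when no file is given.
strip_debug_column() {
  cut -f1-36 "$@"
}
```

For example, `strip_debug_column vardict_TG1303.txt > vardict_TG1303.nodebug.txt` (the output file name is only a placeholder). Lines with 36 or fewer fields pass through unchanged.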

Hope this helps!

tgeneste commented 2 years ago

Polina,

It did indeed work; I had caused this issue myself while trying to understand what was wrong with my data. I think the real problem is that I am using custom chromosome names that are neither plain numbers nor in the "ChrX" format. Should I open a request to add support for custom reference names? Otherwise I'll align my reads to another reference that is formatted properly for VarDict. The GATK pipelines have no issues with custom chromosome names, and such support could be useful for custom projects.

Regards, Thibault.

PolinaBevad commented 2 years ago

Thibault,

What kind of problems are you seeing? As far as I remember, we even have unit tests with custom chromosome names, such as this one: https://github.com/AstraZeneca-NGS/VarDictJava/blob/master/testdata/integrationtestcases/Simple%3Bhard_clip_case.fa%3Bhard_clip_next_to_del_test1.bam%3Btest%3B6674-6824%3B-f%200.0%20-p%20-r%201.txt, and it worked fine. Of course, the chromosome names must be identical across the BAM, BED and FASTA files. Could you provide the error text or describe what is wrong with the resulting data?
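One quick way to check the naming requirement is to list any contig names used in the BED file that do not appear in the FASTA headers. The helper below is a hypothetical sketch, not part of VarDict. It assumes each FASTA sequence name is the first whitespace-delimited word after `>`, and it skips blank lines and BED `track`, `browser` and `#` comment lines. The same comparison could be made against the `@SQ SN:` entries of the BAM header.

```shell
# Hypothetical helper (not part of VarDict): print each contig name used in a
# BED file that is not a sequence name in the FASTA file, once, in the order
# it first appears in the BED file. Empty output means every contig was found.
# Usage: missing_bed_contigs reference.fa regions.bed
missing_bed_contigs() {
  awk '
    # First file (FASTA): remember the first word of each header line.
    FILENAME == ARGV[1] {
      if (/^>/) { sub(/^>/, ""); known[$1] = 1 }
      next
    }
    # Second file (BED): skip blank, track, browser and comment lines, then
    # report contigs missing from the FASTA, each only once.
    NF && $1 != "track" && $1 != "browser" && substr($1, 1, 1) != "#" &&
    !($1 in known) && !($1 in seen) {
      seen[$1] = 1
      print $1
    }
  ' "$1" "$2"
}
```

For example, `missing_bed_contigs reference.fa regions.bed`, where both file names are placeholders for your own reference and target files.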

tgeneste commented 2 years ago

Polina,

Here is the error I get from var2vcf_valid.pl:

Use of uninitialized value $chr in hash element at /usr/local/bioinfo/src/VarDict/VarDict-v1.8.3/var2vcf_valid.pl line 41, <> line 1.
Use of uninitialized value $a[3] in hash element at /usr/local/bioinfo/src/VarDict/VarDict-v1.8.3/var2vcf_valid.pl line 41, <> line 1.
Argument "" isn't numeric in sort at /usr/local/bioinfo/src/VarDict/VarDict-v1.8.3/var2vcf_valid.pl line 129, <> line 4.

The chromosome names and locus names come from the BED files and appear in the VarDict txt output that I sent in my previous comment. For example, they look like this:

LocusXXX Chromosome AmpliconB promCsFAD3B AmpliconB promCsFAD3B AmpliconB promCsFAD3B

PolinaBevad commented 2 years ago

Thibault, I took the file you provided earlier, removed the 37th column and ran:

cat vardict_TG1303.txt | ~/VarDict/teststrandbias.R | ~/VarDict/var2vcf_valid.pl

The result is a normal VCF file with the custom chromosome names and no errors.

Are you sure you get the correct output after the teststrandbias.R step? Could you please check this? The error points to a problem with the very first line, which typically means that the input to var2vcf_valid.pl is empty.
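To make an empty intermediate result visible, a small pass-through filter can sit between the two steps. This is a hypothetical sketch (`require_nonempty` is not a VarDict script): it forwards its input unchanged and, if there was no input at all, prints a message naming the upstream stage and returns a non-zero status.

```shell
# Hypothetical filter (not part of VarDict): copy stdin to stdout unchanged,
# but report an error naming the upstream stage if stdin was completely empty.
# $1 = label for the upstream stage, used only in the error message.
require_nonempty() {
  local stage_name="$1" first=""
  # read fails at end of input; a final line without a trailing newline
  # still fills "first", so that case is treated as non-empty too.
  if IFS= read -r first || [ -n "$first" ]; then
    printf '%s\n' "$first"
    cat
  else
    printf 'error: no output from %s\n' "$stage_name" >&2
    return 1
  fi
}
```

For example, in bash: `set -o pipefail; cat vardict_TG1303.txt | ~/VarDict/teststrandbias.R | require_nonempty teststrandbias.R | ~/VarDict/var2vcf_valid.pl`. var2vcf_valid.pl still receives the empty stream, but the message identifies the failing step and `pipefail` makes the whole pipeline exit with a non-zero status.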

tgeneste commented 2 years ago

Polina,

Actually, it turns out that the piping was failing because of some kind of file access error. I just had to invoke the script with "Rscript" after the pipe symbol, and everything worked fine.
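The same fix can be wrapped in a small function that calls both interpreters explicitly, so the pipeline does not rely on the scripts being executable or on their shebang lines. This is a hypothetical sketch: `vardict_to_vcf`, its argument order and the assumption that both scripts sit in one directory are illustrative only; any extra arguments are passed to var2vcf_valid.pl unchanged.

```shell
# Hypothetical wrapper (not part of VarDict): run teststrandbias.R through
# Rscript and var2vcf_valid.pl through perl, reading VarDict text output.
# $1 = directory containing the VarDict scripts, $2 = VarDict text output;
# any remaining arguments are forwarded to var2vcf_valid.pl.
vardict_to_vcf() {
  local vardict_dir="$1" input="$2"
  shift 2
  Rscript "$vardict_dir/teststrandbias.R" < "$input" \
    | perl "$vardict_dir/var2vcf_valid.pl" "$@"
}
```

For example, `vardict_to_vcf ~/VarDict vardict_TG1303.txt > vardict_TG1303.vcf`, where the output file name is a placeholder.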

Thank you for the help,

Thibault.