edgardomortiz / vcf2phylip

Convert SNPs in VCF format to PHYLIP, NEXUS, binary NEXUS, or FASTA alignments for phylogenetic analysis
GNU General Public License v3.0

convert vcf file to phylyp #50

Open lophostoma opened 5 months ago

lophostoma commented 5 months ago

Hi, I have the following issue. The vcf2phylip tool did not process my VCF file as expected. The output header, 58 0, indicates that it detected 58 samples but 0 sites, which is not typical for a valid VCF file containing genotype information. I used the following command to generate the VCF file before running vcf2phylip:

enroot start --mount $HOME --root --rw staphb+bcftools sh -c " bcftools view -h /home/carlos.carrion/output_filtered.vcf > /home/carlos.carrion/output_reformat.vcf && bcftools query -f \"%CHROM\t%POS\t%ID\t%REF\t%ALT\t%QUAL\t%FILTER\t%INFO[\t%SAMPLE=%GP]\n\" /home/carlos.carrion/output_filtered.vcf >> /home/carlos.carrion/output_reformat.vcf"

Thanks

edgardomortiz commented 5 months ago

Hi @lophostoma

To diagnose the problem I need a few thousand lines from your VCF and, ideally, the exact error message from vcf2phylip, since I rarely use bcftools and cannot predict what kind of output your command produces. Maybe the genotypes were not biallelic?

Edgardo

lophostoma commented 5 months ago

OK, thanks for your reply. Attached you will find the VCF file and the output file from vcf2phylip. I did not get an error message, only an empty output file. The genotypes were generated in ANGSD and are biallelic. subsample.vcf.zip tmp.min4.phy.zip

Thanks for your time Carlos

edgardomortiz commented 5 months ago

Hi again Carlos,

Your VCF zip file seems to be corrupted. I re-downloaded it a couple of times and it can't be decompressed...

lophostoma commented 5 months ago

I am having issues with the size of the file that I can send you through GitHub. If possible, can I send it to an email account? I hope this one works: tmp_carlos.vcf.gz

edgardomortiz commented 5 months ago

The new file is corrupted as well. My email has size limitations too. I just need at most 1000 lines, you can run this on your VCF:

head -1000 my_vcf.vcf > 1000lines.vcf

Then compress the result and upload, it shouldn't be too big.

Edgardo

lophostoma commented 5 months ago

Attached is the file: 1000lines.vcf.zip

edgardomortiz commented 5 months ago

Hi, sorry, I was assuming that the 1000 lines would contain some genotypes, but I only got the headers of your reference contigs. Please add 1000 to the number of contigs in your reference (i.e., if your reference has 6500 contigs, repeat the head command with head -7500). Also, I saw the PHYLIP file you sent. Are you sure your VCF contains valid genotypes? How many should there be?

Edgardo
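Counting contig lines by hand can be avoided. The sketch below keeps every header line and then the first N data records, however long the header is, and also counts how many data records the file holds. The function names `vcf_head_records` and `vcf_count_records` are hypothetical helpers, not part of vcf2phylip or bcftools, and the sketch assumes an uncompressed VCF.

```shell
# Hypothetical helper: print all header lines plus the first N data records.
# Assumes an uncompressed VCF; N defaults to 1000.
vcf_head_records() {
    vcf_file=$1
    n_records=${2:-1000}
    awk -v n="$n_records" '
        /^#/    { print; next }   # header lines always pass through
        n-- > 0 { print; next }   # the first n data records
                { exit }          # stop reading once n records are printed
    ' "$vcf_file"
}

# Hypothetical helper: count data records (lines that do not start with "#").
vcf_count_records() {
    grep -vc '^#' "$1"
}

# Example usage (file names are placeholders):
#   vcf_head_records my_vcf.vcf 1000 > 1000records.vcf
#   vcf_count_records my_vcf.vcf
```

If the count is 0, the file contains only headers, and an empty alignment would be the expected result.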

edgardomortiz commented 5 months ago

Also, I checked the bcftools manual and I think you are creating a non-standard VCF with this command:

bcftools query -f "%CHROM\t%POS\t%ID\t%REF\t%ALT\t%QUAL\t%FILTER\t%INFO[\t%SAMPLE=%GP]\n"

vcf2phylip can only handle the standard VCF format: https://samtools.github.io/hts-specs/VCFv4.2.pdf so I would recommend leaving the VCF in its default format (why do you need that specific format?)

Edgardo
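One way to spot this kind of problem before running vcf2phylip is to compare the column layout of the data lines with the #CHROM header line. In a standard VCF with genotypes, the eight fixed columns are followed by FORMAT and one column per sample. The `bcftools query` format string above writes `%SAMPLE=%GP` fields instead, so its data lines would likely have no FORMAT column and a different column count than the header copied in by `bcftools view -h`. The function name `vcf_check_columns` is a hypothetical helper, and this is only a rough sanity check assuming an uncompressed file, not a full validator.

```shell
# Hypothetical helper: report whether data lines match the #CHROM header layout.
# Assumes an uncompressed, tab-delimited file.
vcf_check_columns() {
    awk -F'\t' '
        /^##/     { next }                          # skip meta-information lines
        /^#CHROM/ { ncols = NF                      # column count in the header line
                    has_format = ($9 == "FORMAT")   # 9th column should be FORMAT
                    next }
                  { total++                         # every remaining line is a data record
                    if (NF != ncols) bad++ }        # record whose width differs from header
        END {
            printf "header_columns=%d format_column=%s records=%d mismatched=%d\n",
                   ncols, (has_format ? "yes" : "no"), total, bad
        }
    ' "$1"
}

# Example usage (file name is a placeholder):
#   vcf_check_columns output_reformat.vcf
```

A nonzero `mismatched` value, or `format_column=no` in a file that should carry genotypes, suggests the file is not in the standard layout. Writing the file with `bcftools view` instead of `bcftools query` keeps the standard VCF columns.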

lophostoma commented 5 months ago

Hi, Thanks for your reply. I will generate the file again in standard VCF format and try vcf2phylip. I will let you know the result of this. Thanks for helping me notice that.