freeseek / gtc2vcf

Tools to convert Illumina IDAT/BPM/EGT/GTC and Affymetrix CEL/CHP files to VCF
MIT License

Error in names(object) <- nm gtc2vcf_plot.R #29

Closed: PriscillaPoemba closed this issue 3 years ago

PriscillaPoemba commented 4 years ago

Dear freeseek,

I am having trouble running the R script gtc2vcf_plot.R to generate plots. My input was originally a .vcf file, but I got an error about the file format, so I compressed it with bgzip into a .vcf.gz file (as the error message suggested) using the command bgzip file.vcf. With the .vcf.gz file I now get the error below.

gtc2vcf_plot.R 2020-09-01 https://github.com/freeseek/gtc2vcf
Command: bcftools query --format [%CHROM\t%POS\t%ID\t%INFO/meanR_AA\t%INFO/meanR_AB\t%INFO/meanR_BB\t%INFO/meanTHETA_AA\t%INFO/meanTHETA_AB\t%INFO/meanTHETA_BB\t%INFO/devR_AA\t%INFO/devR_AB\t%INFO/devR_BB\t%INFO/devTHETA_AA\t%INFO/devTHETA_AB\t%INFO/devTHETA_BB\t%GT\t%X\t%Y\t%NORMX\t%NORMY\t%R\t%THETA\t%BAF\t%LRR\n]" all_qc.unphased_extra.vcf.gz -r 11:66328095-66328095
Error in names(object) <- nm :
  'names' attribute [24] must be the same length as the vector [0]
Calls: setNames
In addition: Warning message:
In fread(cmd = cmd, sep = "\t", header = FALSE, na.strings = ".",  :
  File '/tmp/RtmpaEu4mo/file13974573efd5' has size 0. Returning a NULL data.frame.
Execution halted

Thanks in advance!

freeseek commented 4 years ago

gtc2vcf_plot.R runs the bcftools query command shown in your output. Try running that command on its own and see what output you get.
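A minimal POSIX-shell sketch of running the command separately and reading the result. The `diagnose` function is a hypothetical helper, not part of gtc2vcf or bcftools. It separates a command that failed from one that succeeded but printed nothing. Both cases leave fread with an empty result, which is what the "has size 0" warning reports.

```shell
#!/bin/sh
# diagnose is a hypothetical helper, not part of gtc2vcf or bcftools.
# It runs the given command and distinguishes three outcomes:
#   - the command failed: its own non-zero exit status is returned
#   - it succeeded but printed nothing: returns 2 (an arbitrary choice)
#   - it printed something: prints the number of output lines, returns 0
diagnose() {
  output=$("$@")
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with exit status $status" >&2
    return "$status"
  fi
  if [ -z "$output" ]; then
    echo "command succeeded but printed nothing (for bcftools query: no records matched)" >&2
    return 2
  fi
  # $(...) drops trailing newlines, so add one back before counting lines
  printf '%s\n' "$output" | wc -l | tr -d ' '
}
```

For example, with the format string shortened for illustration: `diagnose bcftools query --format "[%CHROM\t%POS\t%GT\n]" -r 11:66328095-66328095 all_qc.unphased_extra.vcf.gz`.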

PriscillaPoemba commented 4 years ago

This is the output of the command:

$ bcftools query --format [%CHROM\t%POS\t%ID\t%INFO/meanR_AA\t%INFO/meanR_AB\t%INFO/meanR_BB\t%INFO/meanTHETA_AA\t%INFO/meanTHETA_AB\t%INFO/meanTHETA_BB\t%INFO/devR_AA\t%INFO/devR_AB\t%INFO/devR_BB\t%INFO/devTHETA_AA\t%INFO/devTHETA_AB\t%INFO/devTHETA_BB\t%GT\t%X\t%Y\t%NORMX\t%NORMY\t%R\t%THETA\t%BAF\t%LRR\n]" all_qc.unphased_extra.vcf.gz -r 11:66328095-66328095
>

It looks like the command is waiting for more input. I thought this was because of the quotes around the brackets. If I remove the quotes, the command runs, but it produces no output.
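The `>` on the line after the command is the shell's continuation prompt. The pasted command has a `"` after `]` but none before `[`. The shell therefore reads that lone `"` as the start of a quoted string that is never closed and keeps waiting for input. Removing the quotes entirely is not a fix either, because the shell then strips the backslashes before bcftools sees the format string. This sketch needs only a POSIX shell. It uses a shortened format string for illustration and stores the argument bcftools would receive in each case.

```shell
#!/bin/sh
# Compare what the shell passes on for a quoted vs. an unquoted
# bcftools --format string (shortened here to three fields).
set -f  # disable globbing so the unquoted [...] cannot match file names
quoted=$(printf '%s' "[%CHROM\t%POS\t%GT\n]")
unquoted=$(printf '%s' [%CHROM\t%POS\t%GT\n])
set +f
printf 'quoted:   %s\n' "$quoted"    # [%CHROM\t%POS\t%GT\n]
printf 'unquoted: %s\n' "$unquoted"  # [%CHROMt%POSt%GTn]
```

With the quotes in place, the `\t` and `\n` reach bcftools intact. bcftools query then turns them into tab and newline characters in its output.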

If I remove the quotes from the script, I still get the same error.

freeseek commented 4 years ago

You do need the quotes, or the bcftools query command will not work. Now that I look more closely, it is strange that your command has no opening quote. If you look at the code, it says:

fmt <- paste0('"[%', ...
...
cmd <- paste0('bcftools query --format ', fmt, ...
...
write(paste('Command:', cmd), stderr())

So there should be an opening quote. I don't know why this is happening. It must be something to do with your version of R, or you modified the source code of gtc2vcf_plot.R. Could you do some testing to help me understand why, on your machine, the quote disappears from the fmt/cmd string?

PriscillaPoemba commented 4 years ago

I added the quotes to the command, but it still gives me this error:

gtc2vcf_plot.R 2020-09-01 https://github.com/freeseek/gtc2vcf
Command: bcftools query --format "[%CHROM\t%POS\t%ID\t%INFO/meanR_AA\t%INFO/meanR_AB\t%INFO/meanR_BB\t%INFO/meanTHETA_AA\t%INFO/meanTHETA_AB\t%INFO/meanTHETA_BB\t%INFO/devR_AA\t%INFO/devR_AB\t%INFO/devR_BB\t%INFO/devTHETA_AA\t%INFO/devTHETA_AB\t%INFO/devTHETA_BB\t%GT\t%X\t%Y\t%NORMX\t%NORMY\t%R\t%THETA\t%BAF\t%LRR\n]" all_qc.unphased_extra.vcf.gz -r 11:66328095-66328095
Error in names(object) <- nm :
  'names' attribute [24] must be the same length as the vector [0]
Calls: setNames
In addition: Warning message:
In fread(cmd = cmd, sep = "\t", header = FALSE, na.strings = ".",  :
  File '/tmp/RtmpfLM7lf/file34e869d42b85' has size 0. Returning a NULL data.frame.
Execution halted

I'm still not sure why the quote was missing, but that part is now fixed. When I run the bcftools command on its own, it still prints nothing, and the script still gives the error above.

freeseek commented 4 years ago

If the bcftools command on its own produces empty output, then that is probably the main issue. Are you sure the 11:66328095-66328095 region actually contains SNPs in your VCF? Could it be that you should use chr11 rather than 11? First make sure the query command actually extracts useful information from the VCF.
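One way to check the naming question is to compare the chromosome in the region with the contig names the VCF actually uses. `match_region_to_contigs` is a hypothetical POSIX-shell helper, not part of gtc2vcf or bcftools. It reads contig names from standard input, one per line. If only the other naming convention is present, it rewrites a region such as `11:66328095-66328095` to `chr11:66328095-66328095` (or the reverse).

```shell
#!/bin/sh
# match_region_to_contigs is a hypothetical helper, not part of gtc2vcf.
# Usage: <contig names, one per line> | match_region_to_contigs REGION
# Prints REGION rewritten to the naming ("11" or "chr11") found among the
# contigs, preferring an exact match; returns 1 if neither name is present.
match_region_to_contigs() {
  region=$1
  chrom=${region%%:*}        # "11" from "11:66328095-66328095"
  rest=${region#"$chrom"}    # ":66328095-66328095" (empty if no ':')
  bare=${chrom#chr}          # "chr11" -> "11"; "11" stays "11"
  contigs=$(cat)             # read the contig list from stdin
  for candidate in "$chrom" "chr$bare" "$bare"; do
    if printf '%s\n' "$contigs" | grep -qxF -- "$candidate"; then
      printf '%s%s\n' "$candidate" "$rest"
      return 0
    fi
  done
  echo "no contig named $bare or chr$bare" >&2
  return 1
}
```

Querying with `-r` requires an indexed VCF, which `bcftools index all_qc.unphased_extra.vcf.gz` creates. With the index in place, the contig names can be taken from the first column of `bcftools index -s`: `bcftools index -s all_qc.unphased_extra.vcf.gz | cut -f1 | match_region_to_contigs 11:66328095-66328095`.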