Closed kiran-lee closed 7 months ago
Hi Kiran,
I think the input region is not valid, which should be in the form of chr:start-end, or just chr.
I'll add validators for the input in next release. Thanks for the feedback.
Also make sure the af.tsv is valid, which should have a header line and five columns. If the af is calculated from the truth file, you can just make vcfcomp doing the job by ignoring the af option ie af=NULL
Thank you so much for our email discussion troubleshooting my NAs.
We realised this was due to missing genotypes in the "truth" vcf file. Removing missing genotypes in the "truth" dataset beforehand produced a sensible r2 value. This was the script used:
library(vcfppR)
truth <- vcftable("x.vcf.gz", setid = T, info = F) test <- vcftable("y.vcf.gz", setid = T, format = "DS", info = F)
N <- 22 ### number of samples is 22 truth$gt <- truth$gt[,1:N] truth$samples <- test$samples[1:N]
w <- which(!is.na(rowSums(truth$gt))) truth$gt <- truth$gt[w,] truth[2:9] <- lapply(truth[2:9], function(a) a[w]) saveRDS(truth, file = "truth.rds")
names <- truth$samples[1:22] samples <- paste0(names, collapse = ",")
res <- vcfcomp(test = "y.vcf.gz" , truth = "truth.rds", stats = "r2", names = names, samples = samples, bins = c(0,1))
str(res) res
Hi Zilong-Li,
Thank you for your software. I found it from your reply here.
I am currently trying to use vcfcomp to test concordance between some imputed genotypes and their original genotypes using the following script: library(vcfppR) res <- vcfcomp(test = "x.vcf.gz", truth = "y.vcf.gz", region="z|z", af="af.tsv", stats = "r2") str(res) write.table(res, file = "concordance", sep = "\t", col.names = T, row.names = T, quote = FALSE)
I used vcftools --freq to output the allele frequency for af.tsv.
However, this returns a dataframe fulls of NAs:
Do you have a suggestion of where this is going wrong?