Closed James-S-Santangelo closed 1 week ago
Hi James, can you share a reproducible example, and if possible a handful of SNPs from your VCF file that reproduces the issue? Otherwise it's difficult to diagnose the issue. Thanks!
On Sat, 17 Aug 2024, 23:18, James Santangelo wrote:
Hey Jim,
I'm trying to use your repo to plot some haplotypes, but the popmap validation is failing and I can't figure out why. It happens whether I use a vcf_object (i.e., through vcfR) or a vcf (i.e., using bcftools) as input.
For reference, here are the rownames in the transposed genotype matrix extracted from the vcf_object:
rownames(test)
[1] "s_40_1" "s_40_3" "s_40_6" "s_40_7" "s_40_8" "s_40_10" "s_40_12" "s_40_17" "s_40_19" "s_41_1" "s_41_2" "s_41_7" "s_41_8"
[14] "s_41_12" "s_41_13" "s_41_14" "s_41_16" "s_41_18" "s_42_5" "s_42_9" "s_42_10" "s_42_11" "s_42_13" "s_42_17" "s_42_20" "s_43_4"
[27] "s_43_5" "s_43_6" "s_43_8" "s_43_10" "s_43_12" "s_43_13" "s_43_14" "s_43_15" "s_1_9" "s_2_3" "s_3_5" "s_4_18" "s_5_16"
[40] "s_97_3" "s_97_6" "s_97_7" "s_97_10" "s_97_11" "s_97_13" "s_97_14" "s_98_1" "s_99_5" "s_100_13" "s_101_17" "s_6_8" "s_7_4"
[53] "s_7_6" "s_7_7" "s_7_11" "s_7_13" "s_7_16" "s_7_19" "s_7_20" "s_21_18" "s_22_15" "s_23_4" "s_23_6" "s_23_7" "s_23_9"
[66] "s_23_15" "s_24_6" "s_37_6" "s_37_8" "s_37_10" "s_37_12" "s_37_13" "s_37_15" "s_37_16" "s_52_3" "s_53_11" "s_54_7" "s_54_8"
[79] "s_54_15" "s_54_16" "s_54_18" "s_54_20" "s_56_1" "s_77_18" "s_78_4" "s_79_17" "s_80_18" "s_81_3" "s_82_19" "s_83_5" "s_83_9"
[92] "s_83_10" "s_83_13" "s_83_14" "s_83_15" "s_83_17" "s_83_18" "s_95_15" "s_96_5" "s_115_6" "s_116_1" "s_116_3" "s_116_7" "s_116_12"
[105] "s_116_15" "s_116_17" "s_116_18" "s_117_6" "s_119_19"
And here is the start of the popmap file:
head(my_popmap)
A tibble: 6 × 2
ind pop
1 s_40_1 Urban
2 s_40_3 Urban
3 s_40_6 Urban
4 s_40_7 Urban
5 s_40_8 Urban
6 s_40_10 Urban
You can see that those first six samples match the first six samples in the rownames of the genotype matrix. Nonetheless, I get the following error when trying to generate a plot:
new_plot <- genotype_plot(vcf_object = vcf, popmap = my_popmap, snp_label_size = 50000)
Removing 0 SNPs with > 50% missing data
Plotting SNP label markers
Error in genotype_plot(vcf_object = vcf, popmap = my_popmap, snp_label_size = 50000) :
  ERROR The following inds are not in vcfR object: c("s_40_1", "s_40_3", "s_40_6", "s_40_7", "s_40_8", "s_40_10", "s_40_12", "s_40_17", "s_40_19", "s_41_1", "s_41_2", "s_41_7", "s_41_8", "s_41_12", "s_41_13", "s_41_14", "s_41_16", "s_41_18", "s_42_5", "s_42_9", "s_42_10", "s_42_11", "s_42_13", "s_42_17", "s_42_20", "s_43_4", "s_43_5", "s_43_6", "s_43_8", "s_43_10", "s_43_12", "s_43_13", "s_43_14", "s_43_15", "s_37_6", "s_37_8", "s_37_10", "s_37_12", "s_37_13", "s_37_15", "s_37_16", "s_1_9", "s_2_3", "s_3_5", "s_4_18", "s_5_16", "s_97_3", "s_97_6", "s_97_7", "s_97_10", "s_97_11", "s_97_13", "s_97_14", "s_98_1", "s_99_5", "s_100_13", "s_101_17", "s_6_8", "s_7_4", "s_7_6", "s_7_7", "s_7_11", "s_7_13", "s_7_16", "s_7_19", "s_7_20", "s_77_18", "s_78_4", "s_79_17", "s_80_18", "s_81_3", "s_82_19", "s_83_5", "s_83_9", "s_83_10", "s_83_13", "s_83_14", "s_83_15", "s_83_17", "s_83_18", "s_95_15", "s_96_5")
Any idea what might be going on here?
Thanks in advance for your help!
James
View it on GitHub: https://github.com/JimWhiting91/genotype_plot/issues/17
Sure thing!
I've attached the popmap and VCF (tmp.txt, renamed since GitHub didn't like the VCF extension). Here is the code I was using:
library(vcfR)
library(GenotypePlot)
library(tidyverse)
vcf <- vcfR::read.vcfR("~/Downloads/tmp.vcf")
my_popmap <- read_delim("~/Downloads/popmap.txt", delim="\t")
new_plot <- genotype_plot(vcf_object = vcf, popmap = my_popmap, snp_label_size = 50000)
new_plot
Thanks!
If you could also share your full script, that'd be helpful. In your question you share the rownames of an object called test, but you pass an object called vcf to genotype_plot(), so it's unclear how these two objects are related.
Yup, added in above message, thanks! The test object was just part of my unsuccessful attempt to troubleshoot.
James
Hi James,
I've had a look into this: genotype_plot() expects the popmap to be a data.frame object, not a tibble. This is obviously pretty arbitrary, and I think I'll add a catch so that it converts the popmap to a data.frame internally. The difference is that subsetting a single column of a data.frame returns a vector (of inds in this case), whereas the same code applied to a tibble returns a one-column tibble instead of a vector, and that is what's breaking here.
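The subsetting difference described above can be reproduced in isolation. This is a minimal sketch with a made-up two-row popmap; it assumes the check inside genotype_plot() uses a `popmap[, 1]`-style subset, which is not confirmed in this thread.

```r
library(tibble)

# Toy popmap with the same two columns as the real one (values are illustrative)
popmap_df <- data.frame(
  ind = c("s_40_1", "s_40_3"),
  pop = c("Urban", "Urban"),
  stringsAsFactors = FALSE
)
popmap_tbl <- as_tibble(popmap_df)

# data.frame: selecting one column with [ , 1] drops to a plain character vector
inds_df <- popmap_df[, 1]
is.character(inds_df)

# tibble: the same expression keeps the tibble structure (one column, two rows)
inds_tbl <- popmap_tbl[, 1]
is.data.frame(inds_tbl)

# Comparing against sample names therefore behaves differently:
vcf_samples <- c("s_40_1", "s_40_3")
inds_df %in% vcf_samples   # element-wise, one result per individual
inds_tbl %in% vcf_samples  # treats the whole column as a single item

# [[ ]] returns a vector for both classes, so it is the safer way to pull a column
identical(popmap_tbl[[1]], inds_df)
```

Because `%in%` sees the tibble column as one list element, the whole set of names is compared as a single string, which would be consistent with the error printing every individual inside one `c(...)`.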
In the meantime, you should be able to run your analysis by converting your popmap to a data.frame before supplying it. The code below ran fine for me.
library(vcfR)
library(GenotypePlot)
library(tidyverse)
vcf <- vcfR::read.vcfR("~/Downloads/tmp.vcf")
my_popmap <- read_delim("~/Downloads/popmap.txt", delim="\t")
# Convert to a data.frame
my_popmap <- as.data.frame(my_popmap)
new_plot <- genotype_plot(vcf_object = vcf,
                          popmap = my_popmap,
                          snp_label_size = 50000)
new_plot
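For anyone who wants a guard in their own script until the package handles tibbles, something along these lines might work. `prepare_popmap()` is a hypothetical helper, not part of GenotypePlot, and the sample names here are a made-up vector standing in for the ones in the VCF.

```r
# Hypothetical helper (not part of GenotypePlot): coerce the popmap to a plain
# data.frame and stop early if any individual is absent from the sample names.
prepare_popmap <- function(popmap, sample_names) {
  popmap <- as.data.frame(popmap, stringsAsFactors = FALSE)
  inds <- as.character(popmap[[1]])  # [[ ]] always returns a vector
  missing_inds <- setdiff(inds, sample_names)
  if (length(missing_inds) > 0) {
    stop("Individuals not found in VCF: ",
         paste(missing_inds, collapse = ", "))
  }
  popmap
}

# Illustrative inputs: a tibble popmap and a stand-in vector of VCF sample names
popmap_tbl <- tibble::tibble(
  ind = c("s_40_1", "s_40_3"),
  pop = c("Urban", "Urban")
)
sample_names <- c("s_40_1", "s_40_3", "s_40_6")

clean_popmap <- prepare_popmap(popmap_tbl, sample_names)
class(clean_popmap)
```

With a vcfR object, the sample names can usually be taken from `colnames(vcf@gt)[-1]`, since the first column of the gt slot is FORMAT.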
Once I've pushed the update I'll close this issue, but in the meantime I'll leave it open in case anyone has the same issue.
All the best!
Awesome, thanks for sorting this out so quickly, Jim!
James