Closed Ella-Bowles closed 5 years ago
Please format your code by surrounding it with three backticks. And try to provide a minimal reproducible example (less code, with the data).
I think this is the same issue as in the other issues.
You need to use read.pcadapt()
before using pcadapt()
.
I did use read.pcadapt before using pcadapt, but still got the errors. From the last part of my message, here is the code from after I selected the number of PCs to retain. I also attach the PCAdapt file as well as the input vcf. ChaliNoOddsWithOuts.zip
'''data <- as.matrix(read.table("file2c1c47966ff6.pcadapt"))''' '''toto <- read.pcadapt(data,type="pcadapt")'''
'''poplist.names <- c(rep("Cha03",30), rep("Cha15",49))'''
'''x<-pcadapt(toto,K=2)'''
'''require(qvalue)''' '''qval <- qvalue(x$pvalues)$qvalues''' '''alpha <- 0.1''' '''outliers <- which(qval<alpha)''' '''xr<-pcadapt(toto[-outliers,],K=2)'''
Error in UseMethod("pcadapt") : no applicable method for 'pcadapt' applied to an object of class "c('matrix', 'integer', 'numeric')"
Basically, when you're subsetting the pcadapt_matrix using toto[-outliers,]
, it becomes a simple matrix again. Because the subsetting is defined only for a matrix, so you get a matrix back.
data <- matrix(1:6, 2)
class(data) ## matrix
toto <- pcadapt::read.pcadapt(data)
class(toto) ## pcadapt_matrix
class(toto[-1, ]) ## matrix
Am I meant to use the following then? It still gives me the error.
....
toto <- pcadapt::read.pcadapt(data)
xr<-pcadapt(toto[-outliers,],K=2)
...
You need to use
toto_subset <- read.pcadapt(toto[-outliers, ], type = "lfmm")
xr <- pcadapt(toto_subset, K = 2)
Type "lfmm" means that rows of your matrix are observations and columns are SNPs. By default, it's type "pcadapt" which is the transpose of "lfmm".
It still isn't working. I really hope this isn't just me.
Given that I have already converted the vcf to a pcadapt file and figured out the number of outliers to retain, I run `data <- as.matrix(read.table("file2c1c47966ff6.pcadapt")) toto <- read.pcadapt(data,type="pcadapt")
poplist.names <- c(rep("Cha03",30), rep("Cha15",49))
x<-pcadapt(toto,K=2)
require(qvalue) qval <- qvalue(x$pvalues)$qvalues alpha <- 0.1 outliers <- which(qval<alpha) toto_subset <- read.pcadapt(toto[-outliers, ], type = "lfmm") xr <- pcadapt(toto_subset, K = 2)
par(mfrow=c(1,2)) plot(x,option="scores",pop=poplist.names) plot(xr,option="scores",pop=poplist.names)`
Error in $<-.data.frame
(*tmp*
, "Pop", value = c("Cha03", "Cha03", :
replacement has 79 rows, data has 62
And the plot that I get out of plot(x,option="scores",pop=poplist.names)
is very strange. I attach it here
Something seems to be weird with the data <- as.matrix(read.table("file2c1c47966ff6.pcadapt"))
step.
I say this, because if I run the following code (skipping the step to get the num of PCs to retain, b/c I've already done it a million times)
path_to_file <- "D:/Documents/PostDoc/Walleye/stacksOutptAndAnalys/stacks_r12/sta_r12_pairwise_WithoutOddBalls/ChaliNoOddsWithOuts.vcf" Chali_matrix <- read.pcadapt(path_to_file,type="vcf")
`poplist.names <- c(rep("Cha03",30), rep("Cha15",49))
x<-pcadapt(Chali_matrix,K=2)
require(qvalue)
qval <- qvalue(x$pvalues)$qvalues
alpha <- 0.1
outliers <- which(qval<alpha)
par(mfrow=c(1,2))
plot(x,option="scores",pop=poplist.names)`
I get a very normal looking plot with outliers included, attached
But using this last way, I still can't get the xr step to work.
Ack....
I'll attach my vcf file here in case you want to just try to run it ChaliNoOddsWithOuts (2).zip
1/
You need to use toto[, -outliers]
.
SNPs (variables) are columns. Because it's more efficient, and it's the convention for many fields of data science.
2/
There is clearly something wrong with a plot where PC1 takes only positive values.
The problem is that you have missing values encoded as 9 in your data.
Instead of reading the pcadapt file yourself, please use bed2matrix()
instead.
Thank you, and trying your solutions.
Given the path where my pcadapt files are written, which is below the code that I use to read in and convert the file below `> path_to_file <- "D:/Documents/PostDoc/Walleye/stacksOutptAndAnalys/stacks_r12/sta_r12_pairwise_WithoutOddBalls/ChaliNoOddsWithOuts.vcf"
Chali_matrix <- read.pcadapt(path_to_file,type="vcf")` No variant got discarded. Summary:
- input file: D:/Documents/PostDoc/Walleye/stacksOutptAndAnalys/stacks_r12/sta_r12_pairwise_WithoutOddBalls/ChaliNoOddsWithOuts.vcf
- output file: C:\Users\Admin\AppData\Local\Temp\Rtmp6rrGdS\file3a8c8bc661.pcadapt
- number of individuals detected: 79
- number of loci detected: 8513
8513 lines detected. 79 columns detected.
For bed2matrix I first try:
try <- bed2matrix("C:/Users/Admin/AppDataLocal/Temp/Rtmp6rrGdS/file3a8c8bc661.pcadapt.bed")
And get "Error in file2other(input, type, match.arg(type.out), match.arg(allele.sep)) : File C:/Users/Admin/AppDataLocal/Temp/Rtmp6rrGdS/file3a8c8bc661.pcadapt.bed does not exist."
So, I go to where my pcadapt file is output and copy the bed file into my current directory. I then do the following:
data <- bed2matrix(file3a8c8bc661.pcadapt.bed)
And get Error in read.pcadapt(bedfile, type = "bed") : object 'file3a8c8bc661.pcadapt.bed' not found What am I doing wrong?
Even though dir() clearly shows that the file is in my working directory.
Thoughts?
Simply try bed2matrix(path_to_file)
.
If you mean for me to use this on my initial read in, so
path_to_file <- "D:/Documents/PostDoc/Walleye/stacksOutptAndAnalys/stacks_r12/sta_r12_pairwise_WithoutOddBalls/ChaliNoOddsWithOuts.vcf"
And then
bed2matrix(path_to_file)
My initial file is not a binary PED file, and so I get an error telling me this.
If you mean for me to use
bed2matrix("C:/Users/Admin/AppDataLocal/Temp/Rtmp6rrGdS/file3a8c8bc661.pcadapt.bed")
I did this trying both using the quotes and not using the quotes. Using the quotes, which I think is the correct way to input a filepath, I get Error in data.table::fread(sub("\.bed$", ".bim", input)) : File 'C:/Users/Admin/AppData/Local/Temp/Rtmp4GhfHO/file187ce73303a.pcadapt.bim' does not exist or is non-readable. getwd()=='D:/Documents/PostDoc/Walleye/stacksOutptAndAnalys/stacks_r12/sta_r12_pairwise_WithoutOddBalls'
I also tried
path_to_file2 <- "C:/Users/Admin/AppData/Local/Temp/Rtmp4GhfHO/file187ce73303a.pcadapt.bed" bed2matrix(path_to_file2)
And got Error in data.table::fread(sub("\.bed$", ".bim", input)) : File 'C:/Users/Admin/AppData/Local/Temp/Rtmp4GhfHO/file187ce73303a.pcadapt.bim' does not exist or is non-readable. getwd()=='D:/Documents/PostDoc/Walleye/stacksOutptAndAnalys/stacks_r12/sta_r12_pairwise_WithoutOddBalls'
I feel a bit like we're chasing something that just isn't working. Is there any chance that you could try to work the following on my vcf file to see if it works on your end? This is the original script that was working this summer. ChaliNoOddsWithOuts.zip
`require(pcadapt)
path_to_file <- "D:/Documents/PostDoc/Walleye/stacksOutptAndAnalys/stacks_r12/sta_r12_pairwise_WithoutOddBalls/ChaliNoOddsWithOuts.vcf" Chali_matrix <- read.pcadapt(path_to_file,type="vcf") No variant got discarded. Summary:
x <- pcadapt(Chali_matrix,K=20) plot(x,option="screeplot")
$K=2$ is the good option.
data <- as.matrix(read.table("ChaliNoOddsWithOuts.pcadapt"))
poplist.names <- c(rep("Cha03",30), rep("Cha15",49))
x<-pcadapt(data,K=2)
require(qvalue) qval <- qvalue(x$pvalues)$qvalues alpha <- 0.1 outliers <- which(qval<alpha) xr<-pcadapt(data[-outliers,],K=2) par(mfrow=c(1,2)) plot(x,option="scores",pop=poplist.names) plot(xr,option="scores",pop=poplist.names)
Conclusion: population structure changes when you remove outliers.
require("vcfR") obj.vcfR <- vcfR::read.vcfR("ChaliNoOddsWithOuts.vcf") ID <- vcfR::getID(obj.vcfR) print(ID[outliers])`
If you're not happy with the new version, feel free to install an older version from either GitHub or CRAN.
@Ella-Bowles Did you try filename <- read.pcadapt(path_to_file, type = "vcf") Otherwise what about converting your vcf to a ped or a bed using vcftools and then using pcadapt?
With your data:
vcf <- "tmp-data/ChaliNoOddsWithOuts.vcf"
bed <- pcadapt::read.pcadapt(vcf, type = "vcf")
mat <- pcadapt::bed2matrix(bed)
dim(mat)
# 79 x 8513
table(mat, exclude = NULL)
# 0 1 2 <NA>
# 457085 144253 31061 40128
@Ella-Bowles : Did you manage to use the script written by @privefl ?
Ack, I tried, but still get errors.
When I try `> vcf <- "ChaliNoOddsWithOuts.vcf"
bed <- pcadapt::read.pcadapt(vcf, type = "vcf") mat <-pcadapt::bed2matrix(bed)`
I get
From the read.pcadapt step No variant got discarded. Summary:
- input file: ChaliNoOddsWithOuts.vcf
- output file: C:\Users\Admin\AppData\Local\Temp\Rtmp2VKEtd\file2c8810883b77.pcadapt
- number of individuals detected: 79
- number of loci detected: 8513
8513 lines detected. 79 columns detected.
and then, from the bed2matrix step Error in read.pcadapt(bedfile, type = "bed") : Input should be a file path or a matrix-like object.
As I would expect, given where the file is being outputted.
When I try
> mat <- pcadapt::bed2matrix("C:/Users/Admin/AppData/Local/Temp/Rtmp2VKEtd/file2c8856dd5add.pcadapt.bed")
I get Error in data.table::fread(sub("\.bed$", ".bim", input)) : File 'C:/Users/Admin/AppData/Local/Temp/Rtmp2VKEtd/file2c8856dd5add.pcadapt.bim' does not exist or is non-readable. getwd()=='D:/Documents/PostDoc/Walleye/stacksOutptAndAnalys/stacks_r12/sta_r12_pairwise_WithoutOddBalls'
Do you think I should try some different installation of R studio or PCAdapt, or can you see a different solution? Currently I am using PCAdapt v4.0.3 and R v3.55.1.
My apologies for all the back-and-forth. And many thanks for your help.
Ella
Can you send me a link to the vcf file or to a vcf that contains a small subset of the data?
Sure, here you go ChaliNoOddsWithOuts (2).zip
require(pcadapt)
data <- read.pcadapt("ChaliNoOddsWithOuts.vcf",type="vcf",type.out="matrix")
dim(data)
[1] 79 8513
#remove loci 100 to 110
data <- data[,-c(100:110)]
dim(data)
[1] 79 8502
#Because individuals are in lines and SNPs in columns use type="lfmm"
genotypes <- read.pcadapt(data,type="lfmm")
res <- pcadapt(genotypes,K=20)
Thank you, and I think we're almost there. I got what you have given me to work, using the reduced dataset that you generate here, and I think I just need to sort out the structure of one more command. Okay, so I run: `require(pcadapt) data <- read.pcadapt("ChaliNoOddsWithOuts.vcf",type="vcf",type.out="matrix") dim(data)
data <- data[,-c(100:110)] dim(data)
genotypes <- read.pcadapt(data,type="lfmm")
res <- pcadapt(genotypes,K=20) plot(res, option ="screeplot")
poplist.names <- c(rep("Cha03",30), rep("Cha15",49))
x <- pcadapt(genotypes,K=2)
require(qvalue) qval <- qvalue(x$pvalues)$qvalues alpha <- 0.1 outliers <- which(qval<alpha) xr<-pcadapt(genotypes[,-outliers],K=2)`
And all is well until this last command. Here I get the error Error in UseMethod("pcadapt") : no applicable method for 'pcadapt' applied to an object of class "c('matrix', 'integer', 'numeric')"
I think the [,-outliers] should be correct here, given that the structure of the genotypes matrix should be individuals in rows and SNPs in columns. Thoughts?
I've updated the package so that the error you get is more explicit.
AMAZING, IT WORKS!!
Just to confirm, the following looks good to you?! `require(pcadapt) data <- read.pcadapt("ChaliNoOddsWithOuts.vcf",type="vcf",type.out="matrix")
genotypes <- read.pcadapt(data,type="lfmm")
res <- pcadapt(genotypes,K=20) plot(res, option ="screeplot")
poplist.names <- c(rep("Cha03",30), rep("Cha15",49))
x <- pcadapt(genotypes,K=2)
require(qvalue) qval <- qvalue(x$pvalues)$qvalues alpha <- 0.1 outliers <- which(qval<alpha) x1<-read.pcadapt(data[,-outliers],type="lfmm") xr<-pcadapt(x1,K=2)`
Glad that you managed to make it work.
This issue is I think the last piece of issues #23 and #25 (both closed issues) that I have been discussing here.
The following code works fine for getting outliers:
require(pcadapt)
path_to_file <- "D:/Documents/PostDoc/Walleye/stacksOutptAndAnalys/stacks_r12/sta_r12_pairwise_WithoutOddBalls/ChaliNoOddsWithOuts.vcf" Chali_matrix <- read.pcadapt(path_to_file,type="vcf")
Choosing the number of principal components.
x <- pcadapt(Chali_matrix,K=20) plot(x,option="screeplot")
K=2 is the good option.
poplist.names <- c(rep("Cha03",30), rep("Cha15",49)) x<-pcadapt(x,K=2) require(qvalue) qval <- qvalue(x$pvalues)$qvalues alpha <- 0.1 outliers <- which(qval<alpha)
plot(x,option="scores",pop=poplist.names)
Print the list of outlier SNPs
require("vcfR") obj.vcfR <- vcfR::read.vcfR("ChaliNoOddsWithOuts.vcf") ID <- vcfR::getID(obj.vcfR) print(ID[outliers])
However, I need to be able to visualize the score plots with and without outliers, which I used to be able to do using the following code that MBlum wrote. This code is now broken (Issue #25), which happened after some update I believe, though not sure if it was R, Rstudio, etc... I note below where the errors occur. require(pcadapt)
path_to_file <- "D:/Documents/PostDoc/Walleye/stacksOutptAndAnalys/stacks_r12/sta_r12_pairwise_WithoutOddBalls/ChaliNoOddsWithOuts.vcf" Chali_matrix <- read.pcadapt(path_to_file,type="vcf")
Choosing the number of principal components.
x1 <- pcadapt(Chali_matrix,K=20) plot(x,option="screeplot")
K=2 is the good option.
Load the data matrix in R (useful to remove outliers)
data <- as.matrix(read.table("ChaliNoOddsWithOuts.pcadapt"))
Compare pop. structure with and without outliers
poplist.names <- c(rep("Cha03",30), rep("Cha15",49))
Without removal of outliers
x<-pcadapt(data, K=2)
Error in UseMethod("pcadapt") : no applicable method for 'pcadapt' applied to an object of class "pcadapt"
With removal of outliers
require(qvalue) qval <- qvalue(x$pvalues)$qvalues alpha <- 0.1 outliers <- which(qval<alpha) xr<-pcadapt(data[-outliers,],K=2)
Error in UseMethod("pcadapt") : no applicable method for 'pcadapt' applied to an object of class "c('matrix', 'integer', 'numeric')"
did not run rest of this
par(mfrow=c(1,2)) plot(x,option="scores",pop=poplist.names) plot(xr,option="scores",pop=poplist.names)
Print the list of outlier SNPs
require("vcfR") obj.vcfR <- vcfR::read.vcfR("ChaliNoOddsWithOuts.vcf") ID <- vcfR::getID(obj.vcfR) print(ID[outliers])
I have tried to implement the following solution, proposed in issue #25 , which is now closed. But it doesn’t work, I still get the following error message. Note that I use the temporary PCAdapt file which is created now in my C drive, instead of the PCAdapt file that used to be created and put in the same directory that I am working in.
Will only include relevant sections of code.
data <- as.matrix(read.table("file2c1c47966ff6.pcadapt")) toto <- read.pcadapt(data,type="pcadapt")
Compare pop. structure with and without outliers
poplist.names <- c(rep("Cha03",30), rep("Cha15",49))
Without removal of outliers
x<-pcadapt(toto,K=2)
With removal of outliers
require(qvalue) qval <- qvalue(x$pvalues)$qvalues alpha <- 0.1 outliers <- which(qval<alpha) xr<-pcadapt(toto[-outliers,],K=2)
Error in UseMethod("pcadapt") : no applicable method for 'pcadapt' applied to an object of class "c('matrix', 'integer', 'numeric')"
Any thoughts?
With many thanks, Ella