bcm-uga / pcadapt

Performing highly efficient genome scans for local adaptation with R package pcadapt v4
https://bcm-uga.github.io/pcadapt
37 stars 10 forks source link

Importing loci ID #54

Closed marcticruz closed 4 years ago

marcticruz commented 4 years ago

populations.snps_vcf.xlsx

I've been converting the attached vcf file to a bed file and I apply the read.pcadapt function on the bed file using R. I'm wondering if there is a way to make pcadapt store the information available in the "ID" column of the vcf file. Does anyone know if it is possible at all and how to do it?

privefl commented 4 years ago

You should use PLINK to convert vcf to bed (as recommended in the documentation), and then the variant IDs would be in the bim file.

marcticruz commented 4 years ago

Hi Florian,

Thanks for your reply.

I've been using plink to convert the vcf files to bed files. After that I import the bed file into R using the read.pcadapt function like in the script below and in the attached file "pcadapt4.R".

Like you mentioned, the values that I want are indeed in the bim file. More specifically in the second column of the bim file. I'm wondering if there is a way to import those numbers into R and store them in one the variables of the vector dat_pca (Please take a look at the script below). Is it possible to do that?

Thanks for you help,

Have a nice day,

Marcos

-------------------------------------------------------------------------------------------------------------------------------------------------//----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Load packages

library(pcadapt) library(ggplot2)

Import data

dat <- read.pcadapt("/home/marcos/Dropbox/PhD_project/populations_results_vcf_unlinked_ordered_export/populations.bed", type = "bed")

Calculate PCA with the correct K value suggested by scree plot

dat_pca <- pcadapt(input = dat, K = 3, method = "mahalanobis", min.maf = 0.05, ploidy = 2, LD.clumping = NULL, pca.only = FALSE, tol = 1e-04)

Checck variables

dat_pca$pass

On Wed, Jul 22, 2020 at 11:55 PM Florian Privé notifications@github.com wrote:

You should use PLINK to convert vcf to bed (as recommended in the documentation), and then the variant IDs would be in the bim file.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/bcm-uga/pcadapt/issues/54#issuecomment-662818040, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKFM7XT62A3NNVPEJ4TJRYTR467E7ANCNFSM4PFC5FUQ .

privefl commented 4 years ago

Are you asking me how to read a text file into R? You can use, read.table(), readr::read_table() or bigreadr::fread2(), etc.. Please look at this on the internet, they are plenty of tutorials for these basic R operations.

marcticruz commented 4 years ago

Thanks for your reply.

I'm asking you if there is a way to call those numbers in the vector where I store the results generated by the pcadapt function. In my script the name of this vector is dat_pca.

I'm sorry if this is something too basic that I can easily find on the internet or for not being very clear about what I mean in my questions. Before I posted on the forum and asked you about this, I had been looking around on the internet for a few days but I could not find the answer to this.

I appreciate any feedback you could give me on this.

privefl commented 4 years ago

OK. Can you share (again maybe) your bim file and pcadapt result object (saved e.g. as an rds file using saveRDS())?

marcticruz commented 4 years ago

Thanks for bearing with me through this.

Please find attached the files.

On Thu, Jul 23, 2020 at 1:15 PM Florian Privé notifications@github.com wrote:

OK. Can you share (again maybe) your bim file and pcadapt result object (saved e.g. as an rds file using saveRDS())?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/bcm-uga/pcadapt/issues/54#issuecomment-663157212, or unsubscribe https://github.com/notifications/unsubscribe-auth/AKFM7XWGT2PPTDKS6K7XHQDR5B45HANCNFSM4PFC5FUQ .

privefl commented 4 years ago

I don't see any attached files. This is still GitHub here.

marcticruz commented 4 years ago

Sorry about that. I was answering the messages and attaching files through gmail.

I've attached the files through github now

pcadapt_results2.zip populations.zip

privefl commented 4 years ago

Something like this?

obj.pcadapt <- readRDS("tmp-data/pcadapt_results2.rds")
map <- bigreadr::fread2("tmp-data/populations.bim")

out <- which(obj.pcadapt$pvalues > 5e-8)
write(map$V2[out], tempfile(), ncolumns = 1)
marcticruz commented 4 years ago

Thank you so much again! I have a feeling that it might actually help a lot if I can do this

obj.pcadapt <- readRDS("/home/marcos/Desktop/pcadapt_results2.rds") map <- bigreadr::fread2("/home/marcos/Desktop/populations.bim")

obj.pcadapt$LOCI_IDS <- map$V2

If add this new variable called LOCI_ID in the obj.pcadapt object, will my loci IDS correspond to the right p-values in the pvalues variable?

privefl commented 4 years ago

Yes, this is what I suppose in my code. You can verify they have at least the same length.

marcticruz commented 4 years ago

Thank you so much for helping me out with this!

Have a great day!