bcm-uga / pcadapt

Performing highly efficient genome scans for local adaptation with R package pcadapt v4
https://bcm-uga.github.io/pcadapt

population vector in pcadapt #93

Open abilska opened 1 month ago

abilska commented 1 month ago

Hello, we would like to analyze SNPs under selection across different populations (six in our case); the sample sizes below are just examples. Unfortunately, we keep encountering an error, and it seems that the argument for population membership (pop_vector) has been removed from the program. Could you please help us understand what we might be doing wrong? Is this population option available in another function or version?

Best regards,

################
# Load required libraries
library(pcadapt)

# Load your SNP data (ensure it's in a suitable format)
# Assuming the file 'snp_data.csv' is structured with samples as rows and SNPs as columns
snp_data <- read.csv("snp_data.csv", row.names = 1)

# Create a population vector based on the specified sizes
pop_vector <- c(rep(1, 20),  # Population 1
                rep(2, 30),  # Population 2
                rep(3, 10),  # Population 3
                rep(4, 40),  # Population 4
                rep(5, 8),   # Population 5
                rep(6, 11))  # Population 6

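# Run pcadapt analysis (the data must first be converted with read.pcadapt;
# type = "lfmm" means individuals in rows and SNPs in columns)
pcadapt_input <- read.pcadapt(as.matrix(snp_data), type = "lfmm")
pcadapt_result <- pcadapt(pcadapt_input, K = 6)  # K is the number of principal components to retain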

# Plot the results
plot(pcadapt_result)

# Extract SNPs under selection based on a defined threshold
# Set an appropriate threshold; this example uses a placeholder
threshold <- 0.05  # Adjust this value based on your criteria
snp_selection <- which(pcadapt_result$pvalues < threshold)  # outliers have small p-values

# Print SNPs under selection
print(snp_selection)

# If you want to visualize the scores, colored by population
plot(pcadapt_result, option = "scores", pop = pop_vector)

privefl commented 1 month ago

What is the error you get exactly?

abilska commented 1 month ago

It's about this function call and the message it produces:

pcadapt_result <- pcadapt(x = file_input, K = 6, population = pop_info) # K = number of populations
Error in pcadapt(x = file_input, K = 6, population = pop_info) : unused arguments (x = file_input, population = pop_info)

Our goal is to identify which of the SNPs detected as outliers belong to a given population (we have 6 populations). Is such information included somewhere in the results of the pcadapt analysis?

privefl commented 1 month ago

I am not sure what you mean exactly by "a SNP belongs to a population". If the goal is to find, for the pcadapt outlier variants, for which population they have a different allele frequency (AF), then just compute AFs per population with e.g. PLINK, then find the population for which the AF deviates the most from e.g. the median AF of the 6 pops.
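
The per-population comparison described above can also be sketched directly in base R instead of PLINK. This is only an illustration under stated assumptions: the genotype matrix `geno` is simulated toy data (individuals in rows, SNPs in columns, diploid genotypes coded 0/1/2), `outliers` is a hypothetical set of SNP indices standing in for the variants flagged by pcadapt, and `pop_vector` reuses the example sample sizes from the original post.

```r
# Toy data (assumption): 119 individuals x 200 SNPs, genotypes coded 0/1/2.
set.seed(1)
pop_vector <- rep(1:6, times = c(20, 30, 10, 40, 8, 11))
n_snp <- 200
geno <- matrix(rbinom(length(pop_vector) * n_snp, size = 2, prob = 0.3),
               nrow = length(pop_vector), ncol = n_snp)

# Make SNP 5 strongly differentiated in population 3 so the result is easy to check.
geno[pop_vector == 3, 5] <- 2

# Hypothetical outlier indices; in practice these would be the SNPs flagged by pcadapt.
outliers <- c(5, 17, 42)

# Allele frequency per population: mean genotype / 2 (diploid), ignoring missing values.
# Result: one row per population, one column per SNP.
af_by_pop <- apply(geno, 2, function(g) tapply(g, pop_vector, mean, na.rm = TRUE) / 2)

# For each outlier SNP, find the population whose AF is farthest from the median AF.
most_deviant <- apply(af_by_pop[, outliers, drop = FALSE], 2,
                      function(af) which.max(abs(af - median(af))))

result <- data.frame(
  snp = outliers,
  pop = rownames(af_by_pop)[most_deviant],
  stringsAsFactors = FALSE
)
print(result)
```

The median is used as the reference, as suggested above, so that a single divergent population does not shift the baseline it is compared against. With only six populations, it is worth inspecting the full `af_by_pop` columns for each outlier rather than relying on the single most deviant population.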