bcm-uga / pcadapt

Performing highly efficient genome scans for local adaptation with R package pcadapt v4
https://bcm-uga.github.io/pcadapt

outlier SNP IDs from output #68

Closed by Sistrurus-Steve 2 years ago

Sistrurus-Steve commented 2 years ago

Hello,

I am having a little trouble interpreting the qvalue output and was wondering if you might be able to help.

My input .bed file contains a total of 2,863 SNPs, but the SNP ID naming system is somewhat arbitrary and based on the much larger dataset that the SNPs were extracted from. After I run pcadapt and print the vector "outliers" to determine which SNPs are outliers, it does not return the SNP IDs from the original .bed file. Instead, it returns a vector of integers whose length is roughly 100 to 400, depending on the alpha value used. The integers range in value from 19 to 2,777. I assume that these are the specific SNPs that are considered outliers.

The problem I am having is that I am unsure how to determine which SNPs these are out of the original 2,863.

Can you help me understand this output? Do the integers correspond to the order in which the SNPs are listed in the input .bed file? For example, the first integer in the output vector is 19. Does this mean the 19th SNP/locus in the .bed file is an outlier?

The SNPs come from a de novo assembly of RADseq data in STACKS that I then used to create the original .vcf file. I used a combination of VCFtools and PLINK to convert the .vcf file into a .bed file, just in case that information makes a difference.

Thank you very much for your help! I really appreciate it!

privefl commented 2 years ago

If by "outliers", you mean outliers <- which(qval < alpha), yes, they are the positions in the bed/bim files (you can verify this by checking that qval is of length 2863).
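To illustrate the index-to-ID lookup described above, here is a minimal sketch in R. It assumes the SNP IDs live in the second column of the PLINK .bim file that accompanies the .bed file (the standard six-column .bim layout). The toy .bim contents, the locus names, and the q-values are all made up for illustration; in practice `qval` would come from your own pcadapt run and `bim_path` would point to your own file.

```r
# Toy stand-in for a PLINK .bim file: 6 columns, one row per SNP, no header.
# (Hypothetical contents; replace bim_path with the path to your own .bim file.)
bim_path <- tempfile(fileext = ".bim")
writeLines(c(
  "1\tlocus_10452\t0\t101\tA\tG",
  "1\tlocus_20993\t0\t202\tC\tT",
  "2\tlocus_31877\t0\t303\tG\tA",
  "2\tlocus_47001\t0\t404\tT\tC",
  "3\tlocus_58816\t0\t505\tA\tC"
), bim_path)

# Read the .bim file; explicit column classes keep alleles such as "T"
# from being converted to logical values.
bim <- read.table(
  bim_path, header = FALSE,
  col.names  = c("chr", "snp_id", "cm", "pos", "a1", "a2"),
  colClasses = c("character", "character", "numeric",
                 "integer", "character", "character")
)

# Hypothetical q-values, one per SNP, in the same order as the .bim rows.
qval  <- c(0.80, 0.01, 0.45, 0.03, 0.60)
alpha <- 0.05

# Sanity check suggested in the reply: one q-value per SNP in the file.
stopifnot(length(qval) == nrow(bim))

# Row positions of the outliers, then the matching SNP IDs.
outliers    <- which(qval < alpha)
outlier_ids <- bim$snp_id[outliers]

print(data.frame(index = outliers, snp_id = outlier_ids, qval = qval[outliers]))
```

Because `which()` returns row positions, indexing the ID column of the .bim table with `outliers` recovers the original SNP names regardless of how arbitrary those names are.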

Also, because you have so few SNPs, I would recommend you follow what is suggested in https://github.com/bcm-uga/pcadapt/issues/56.

Sistrurus-Steve commented 2 years ago

Excellent, yes, that is exactly what I meant. Sorry if it wasn't super clear.

Thank you for your response!

I will follow your recommendation in #56 but this does raise another related question. Will adding the null alleles change the answer to the previous question? Will the "outliers" still be in the same position?

privefl commented 2 years ago

Yes, if you put the null SNPs after the real ones, then the SNPs you're interested in should stay at the same positions.
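As a rough sketch of what that means in practice, assuming the null SNPs are appended after the real ones so that the first `n_real` positions still refer to the original SNPs: any outlier index greater than `n_real` points at a null SNP and can be dropped before looking up IDs. The names `n_real` and `n_null` and the q-values below are hypothetical and only for illustration.

```r
n_real <- 5  # number of SNPs in the original file (2,863 in the question above)
n_null <- 3  # hypothetical number of null SNPs appended after the real ones

# Hypothetical q-values: the first n_real entries belong to the real SNPs,
# the remaining n_null entries to the appended null SNPs.
qval  <- c(0.80, 0.01, 0.45, 0.03, 0.60, 0.90, 0.02, 0.70)
alpha <- 0.05
stopifnot(length(qval) == n_real + n_null)

# Positions of all outliers across real and null SNPs.
outliers <- which(qval < alpha)

# Keep only positions that refer to the original SNPs; these are unchanged
# by the append, so they still match the rows of the original .bim file.
real_outliers <- outliers[outliers <= n_real]

print(real_outliers)
```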

Sistrurus-Steve commented 2 years ago

Okay, awesome. Thanks so much!