bcm-uga / pcadapt

Performing highly efficient genome scans for local adaptation with R package pcadapt v4
https://bcm-uga.github.io/pcadapt

Problems in identifying outliers #29

Closed: abcosta closed this issue 3 years ago

abcosta commented 5 years ago

Hi,

I'm currently working with a VCF file containing 24 individuals and 43,114 SNPs (no missing data, filtered for LD). I'm trying to detect outliers in it using both pcadapt and OutFLANK, but I have run into problems with both methods. When I run the pcadapt script, I get the following message after "plot(res, option = "stat.distribution")":

Warning message: Removed 66 rows containing non-finite values (stat_bin)

If I continue with the script, "plot(-log10(res$pvalues))" gives a plot with -log10(p-values) ranging from 0 to 40, with most points below 10. And running "outliers <- which(qval < alpha)" returns thousands of loci as outliers.

Interestingly, when I ran OutFLANK on the same dataset, it flagged 66 outliers, the same number as the rows removed in the warning above. Could these be the same loci? However, when I print the outliers, only NAs are listed, so I cannot tell which loci they are.

Could you help me identify the outliers in my dataset? I am not sure whether the problem lies in my VCF file or in the script I am following.

Thank you, Ana

privefl commented 5 years ago

Could those be low-MAF SNPs?
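One way to check this hypothesis is to compute each SNP's minor allele frequency and count how many fall below pcadapt's default `min.maf = 0.05`; pcadapt does not compute p-values for SNPs under that threshold, and such NA values would be dropped from the histogram with exactly this kind of warning. With 24 diploid individuals (48 allele copies), a SNP needs at least 3 minor-allele copies (3/48 = 0.0625) to pass 0.05. A minimal sketch in Python, not pcadapt code, assuming genotypes coded as 0/1/2 alternative-allele counts with no missing data; `minor_allele_frequency` and `low_maf_snps` are hypothetical helpers:

```python
def minor_allele_frequency(genotypes):
    """MAF of one SNP from diploid genotypes coded 0/1/2 (assumes no missing data)."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def low_maf_snps(snps, min_maf=0.05):
    """Indices of SNPs whose MAF is below min_maf (hypothetical helper)."""
    return [i for i, g in enumerate(snps) if minor_allele_frequency(g) < min_maf]


n_ind = 24
snps = [
    [1] + [0] * (n_ind - 1),        # one heterozygote: MAF = 1/48, about 0.021
    [1, 1] + [0] * (n_ind - 2),     # two heterozygotes: MAF = 2/48, about 0.042
    [1, 1, 1] + [0] * (n_ind - 3),  # three heterozygotes: MAF = 3/48 = 0.0625
    [2] * 12 + [0] * 12,            # half homozygous alt: MAF = 0.5
]
print(low_maf_snps(snps))  # [0, 1]
```

If the count of flagged SNPs in the real data matches the 66 removed rows, that would support the low-MAF explanation.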

abcosta commented 5 years ago

I have filtered for MAF, but I'll try a higher threshold than the recommended one and check whether that helps.

mblumuga commented 5 years ago

It is difficult to answer your question as it stands. If you can give us access to your data, I can run the scripts and check what is going on.

abcosta commented 5 years ago

Hi,

Thank you! I'm sending you the data file.

Best, Ana

privefl commented 4 years ago

Has this been fixed?