bcm-uga / pcadapt

Performing highly efficient genome scans for local adaptation with R package pcadapt v4
https://bcm-uga.github.io/pcadapt

outliers detection issue #34

Closed CocoMlle closed 4 years ago

CocoMlle commented 5 years ago

Hi Michael and team,

I am having trouble detecting outliers in my data. I'm currently working with 250 Staphylococcus strains: around 200 probably belong to a clonal population and the other 50 come from "other" populations. We suspect there is further structure within the 200 strains, but that is a second step. The VCF file was created with snippy-core. Here is what I have:

    filename <- read.pcadapt("capitis_clean.vcf", type = "vcf")
    x <- pcadapt(filename, K = 20)
    plot(x, option = "screeplot")

screeplot.pdf

I chose K = 4:

    x <- pcadapt(filename, K = 4)
    plot(x, option = "qqplot", threshold = 0.1)

qqplot.pdf

    plot(x, option = "manhattan")

manhattan.pdf

Is this Manhattan plot "normal"? I have the feeling everything is wrong with these data!

    qval <- qvalue(x$pvalues)$qvalues
    Error in smooth.spline(lambda, pi0, df = smooth.df) :
      missing or infinite values in inputs are not allowed

I tried qval <- qvalue(x$pvalues, pi0 = 1) and p.adjust(x$pvalues), but I have the same problem: all the values are either NA or 1.000000e+00. I can't understand what is wrong with the data! Also, with a new version of snippy and a new VCF, I get this error:

    filename2 <- read.pcadapt("outbreak_POB2.vcf", type = "vcf")
    x <- pcadapt(filename2, K = 20)
    Error in solve.default(cov, ...) :
      system is computationally singular: reciprocal condition number = 1.62209e-18

Here are the data: capitis_clean.vcf.zip outbreak_POB.vcf.zip

Do you have any idea what is wrong? Thanks a lot for your help,
Marine
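
For readers hitting the same symptom, here is a minimal sketch of how one might check how many p-values are missing and adjust the rest. It is not the maintainers' diagnosis. The `pvalues` vector is hypothetical toy data standing in for `x$pvalues`, and the 0.1 cutoff is an illustrative choice. One known fact worth keeping in mind is that `p.adjust()` defaults to `method = "holm"`, which is much more conservative than a false discovery rate correction such as `"BH"`.

```r
# Hypothetical p-values standing in for x$pvalues from pcadapt();
# NA marks SNPs for which no test statistic was computed.
pvalues <- c(0.0001, 0.20, NA, 0.03, NA, 0.75, 0.0004, 0.50)

# How much of the p-value vector is missing?
n_missing <- sum(is.na(pvalues))
frac_missing <- mean(is.na(pvalues))

# p.adjust() ignores NA entries and returns NA in their place.
# Its default method is "holm"; "BH" controls the false discovery rate instead.
padj <- p.adjust(pvalues, method = "BH")

# Indices of SNPs below an illustrative FDR cutoff (NA entries are skipped by which()).
alpha <- 0.1
outliers <- which(padj < alpha)

print(n_missing)
print(outliers)
```

If `frac_missing` is close to 1, the issue is more likely in how the genotypes were read than in the multiple-testing step.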

privefl commented 5 years ago

I would have used K = 2 here. It would be good to look at the PC scores too.

Do you have lots of missing data?
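
One way to answer the missing-data question is to compute missing rates per SNP and per sample. The sketch below uses base R only. The `geno` matrix is hypothetical toy data (rows are samples, columns are SNPs, NA is a missing call), so it would need to be replaced by the real genotype matrix, however it was loaded.

```r
# Toy genotype matrix: 3 samples (rows) x 4 SNPs (columns); NA = missing call.
# Hypothetical data for illustration only.
geno <- matrix(c(0, 1,  NA, 2,
                 0, NA, NA, 1,
                 1, 1,  NA, 0),
               nrow = 3, byrow = TRUE)

# Fraction of missing calls per SNP and per sample.
snp_missing <- colMeans(is.na(geno))
sample_missing <- rowMeans(is.na(geno))

# Overall fraction of missing calls.
overall_missing <- mean(is.na(geno))

# SNPs with no call at all carry no information and can be flagged for removal.
all_missing_snps <- which(snp_missing == 1)

print(snp_missing)
print(sample_missing)
print(overall_missing)
```

SNPs or samples with very high missing rates are the usual candidates for filtering before running the PCA.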

CocoMlle commented 5 years ago

OK, I will try with K = 2, thanks!

It seems that yes, I have a lot of NAs, but I'm sorry, I really can't understand why. Is it possible that my VCF file has a problem?

privefl commented 5 years ago

The problem might also come from read.pcadapt(), for which we state that we no longer maintain the "vcf" format.

Instead, you should use PLINK 1.9 to do both steps of quality control and conversion to bed/bim/fam format.
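
The suggested workflow could look roughly like the sketch below, driven from R. Several things here are assumptions rather than anything stated in this thread: that the PLINK 1.9 binary is on the PATH under the name `plink`, that the `--geno 0.1` and `--maf 0.05` thresholds suit the data (they are illustrative only), and that `--allow-extra-chr` and `--double-id` are needed for the contig and sample names in this VCF. The block only runs PLINK if both the binary and the VCF file are found.

```r
# File names taken from the thread; the output prefix is an arbitrary choice.
vcf_file <- "capitis_clean.vcf"
out_prefix <- "capitis_clean"

# PLINK 1.9 arguments: read the VCF, apply basic quality control,
# and write bed/bim/fam files. Thresholds are illustrative assumptions.
plink_args <- c("--vcf", vcf_file,
                "--allow-extra-chr",  # accept non-standard chromosome/contig names
                "--double-id",        # use the VCF sample ID as both family and individual ID
                "--geno", "0.1",      # drop SNPs with more than 10% missing calls
                "--maf", "0.05",      # drop SNPs with minor allele frequency below 5%
                "--make-bed",
                "--out", out_prefix)

# Assumption: the binary is called "plink"; adjust if it is installed under another name.
plink_bin <- Sys.which("plink")

if (nzchar(plink_bin) && file.exists(vcf_file)) {
  status <- system2(plink_bin, plink_args)
  if (status == 0 && requireNamespace("pcadapt", quietly = TRUE)) {
    # Read the converted bed file instead of the VCF, then rerun the scan.
    bed <- pcadapt::read.pcadapt(paste0(out_prefix, ".bed"), type = "bed")
    x <- pcadapt::pcadapt(bed, K = 2)
  }
} else {
  message("plink or ", vcf_file, " not found; nothing was run.")
}
```

Doing the filtering in PLINK keeps quality control and format conversion in one step, so pcadapt only sees SNPs that passed the missingness and frequency filters.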

CocoMlle commented 5 years ago

Oh OK, I understand. What format do you recommend instead? I will try PLINK 1.9, thanks a lot! :)