bcm-uga / pcadapt

Performing highly efficient genome scans for local adaptation with R package pcadapt v4
https://bcm-uga.github.io/pcadapt
37 stars 10 forks source link

truncated p-values #92

Closed polobby closed 1 month ago

polobby commented 1 month ago

Hello,

I have encountered an issue while using the pcadapt package. I'm working on a bed file, corresponding to vcf filtered with only high quality SNPs (QUAL > 30). I've created the model by running this :

path_to_vcf <- file.choose() vcf_pcadapt <- read.pcadapt(path_to_vcf, type='bed') pcadapt_result <- pcadapt(vcf_pcadapt, K = 20)

Pcadapt gave me beautiful results with the plot of the individuals on the 1st PCs space, but when i pursue my analysis i have a problem :

With plot(pcadapt_result,option = 'manhattan') I did a Manhattan plot to detect potentially selected SNPs, and this works fine (cf picture attached). As you can see on the screenshot, the highest p-values are over 1000. But when i want to see in detail which SNPs have the highest p-values, using the pcadapt$pvalues list, it doesn't match. I first clean the list to clear NAs and negative or null values :

obj <- pcadapt_result$pvalues obj <- obj[!is.na(obj)] obj <- obj[obj > 0]

And then when i look for the minimal pvalues / or the highest -log10(pvalue), it doesn't match :

min(obj) [1] 2.322109e-322 -log10(min(obj)) [1] 321.6341

Actually i created a new dataset containing only -log10(obj) (so all the pcadapt$pvalues not null and positive), and they're all smaller than 321.6341. It seems to me that the pvalues stored in the list are truncated, but not the ones used for the Manhattan plot.

Does someone also encountered this issue ? Or is it something i did wrong ?

Thanks in advance,

Paul

man_plot_no_mt_no_cev_pcadapt

privefl commented 1 month ago

Yes, this is normal. With 64-bit floating-point numbers, you cannot store a value below 2e-308, so you'll simply get exactly pval=0 for the highest points on the Manhattan plot.

Use $chi2.stat instead to get the top ones.

polobby commented 1 month ago

Oh okay thank you ! And is there any way to get directly the list of -log10(pvalues) used in the Manhattan plot ? Also when i try to plot the qqplot, i get the same issue : it seems that it uses the truncated pvalues (attached picture), even if it's written "-log10(p-values" on the axis. Maybe i'm doing someting wrong. Is there a way to solve this issue ? My code is just : plot(pcadapt_result, option = 'qqplot')

truncated_qqplot

privefl commented 1 month ago

This is the way -log10(pval) are computed for the Manhattan plot: https://github.com/bcm-uga/pcadapt/blob/master/R/plotUtils.R#L219-L222

For the QQ-plot, I don't think it makes sense to plot the very high values.

polobby commented 1 month ago

Ok thanks a lot !