bcm-uga / pcadapt

Performing highly efficient genome scans for local adaptation with R package pcadapt v4
https://bcm-uga.github.io/pcadapt
39 stars 10 forks source link

System is computationally singular error when using pcadapt on imported .bed file of simulated data #76

Closed jpfontenelle closed 2 years ago

jpfontenelle commented 2 years ago

Hello Florian and all.

I am generating vcf files from simulated populations with 100 loci and 2 alleles per loci.

These vcf files are then used to generate .bed and related files using plink1.9 as follows: plink1.9 --vcf INPUT.vcf --make-bed --recode --output OUTPUT

The .bed file is then read by read.pcadapt: in.pcadapt = read.pcadapt("path/to/OUTPUT.bed", type = "bed")

Then, the in.pcadapt is the starting point of the pcadapt analysis.

This approach works fine in most of the cases, but in some simulated scenarios I am getting the following error: x = pcadapt(in.pcadapt, K = 50, min.maf = 0.05) Error in solve.default(cov, ...) : system is computationally singular: reciprocal condition number = 1.04646e-19

After some googling about the error ("system is computationally...."), it seems that there is a problem with the design matrix, and maybe with multicollinearity.

I first tried to regenerate the .bed files and had no success. Following I generated a .bed file from the .vcf using vcf2bed from the bedr package, which worked (I get a genotype matrix). I tried to export this matrix with write.table but when re-importing pcadapt asks for the associated bim files which do not exist

I then tried to use read.pcadapt on a lfmm file format. I generated a .geno file from the .vcf (vcf2geno), then used geno2lfmm to generate a .lfmm file. The .lfmm file was then the input of read.pcadapt:

in.pcadapt = read.pcadapt("path/to/FILE.lfmm", type = "lfmm")

which gives me the following warning: Warning message: In writeBed(input, is.pcadapt) : You have more individuals than SNPs. Are you sure of the type of your file? (pcadapt/lfmm)

when running pcadapt (as above), I get the same error: x = pcadapt(in.pcadapt, K = 50, min.maf = 0.05) Error in solve.default(cov, ...) : system is computationally singular: reciprocal condition number = 1.82059e-19

Any idea of what might be causing these errors, and how to address them? Suggestions on a better way to generate the .bed, or some sort of fix that can be done to the file?

Thanks

JP

privefl commented 2 years ago
  1. You can't use a small number of variants in pcadapt; it needs many "null" variants to be able to detect the outliers variants that we are interested in.
  2. If you have a lot of correlation between the 100 loci, you may not have 50 independent ones, which means you cannot compute 50 PCs.

Have a look at https://github.com/bcm-uga/pcadapt/issues/56 where I show how to add 50K random variants, which should solve both problems.

PS: K=50 seems large, you will need to rerun pcadapt with a smaller number when you have found an appropriate value for K.

jpfontenelle commented 2 years ago

Hi Florian

I will try. I usually run with k=50 as an "overkill" then choose best K and re run.

Just out of curiosity, adding these random null variants have any impacts on pcadapt inference of outliers? For example, if we add 50k or 10k or 1k, does it change the results?

Cheers

JP

privefl commented 2 years ago
  1. Good strategy

  2. Yes, you should add enough, therefore 50K is good I think.

privefl commented 2 years ago

Any update on this?

jpfontenelle commented 2 years ago

Hi Florian,

I've simulated additional loci (using 1000 instead of 100) and, so far, no more of these errors.

I will keep you posted if I encounter any issues

Thanks!

JP

On Mon, Jun 13, 2022 at 9:00 AM Florian Privé @.***> wrote:

Any update on this?

— Reply to this email directly, view it on GitHub https://github.com/bcm-uga/pcadapt/issues/76#issuecomment-1153884287, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADA4QJJ25QHG6FSM3Z3BGXTVO4WGRANCNFSM5XO726MQ . You are receiving this because you authored the thread.Message ID: @.***>

--

___{: }---'-

João Pedro Fontenelle, PhD @jpf_ishes https://twitter.com/jpf_ishes | jpfontenelle.github.io