Closed jpfontenelle closed 2 years ago
Have a look at https://github.com/bcm-uga/pcadapt/issues/56 where I show how to add 50K random variants, which should solve both problems.
PS: K=50 seems large, you will need to rerun pcadapt with a smaller number when you have found an appropriate value for K.
Hi Florian
I will try. I usually run with k=50 as an "overkill" then choose best K and re run.
Just out of curiosity, adding these random null variants have any impacts on pcadapt inference of outliers? For example, if we add 50k or 10k or 1k, does it change the results?
Cheers
JP
Good strategy
Yes, you should add enough, therefore 50K is good I think.
Any update on this?
Hi Florian,
I've simulated additional loci (using 1000 instead of 100) and, so far, no more of these errors.
I will keep you posted if I encounter any issues
Thanks!
JP
On Mon, Jun 13, 2022 at 9:00 AM Florian Privé @.***> wrote:
Any update on this?
— Reply to this email directly, view it on GitHub https://github.com/bcm-uga/pcadapt/issues/76#issuecomment-1153884287, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADA4QJJ25QHG6FSM3Z3BGXTVO4WGRANCNFSM5XO726MQ . You are receiving this because you authored the thread.Message ID: @.***>
--
___{: }---'-
João Pedro Fontenelle, PhD @jpf_ishes https://twitter.com/jpf_ishes | jpfontenelle.github.io
Hello Florian and all.
I am generating vcf files from simulated populations with 100 loci and 2 alleles per loci.
These vcf files are then used to generate .bed and related files using plink1.9 as follows: plink1.9 --vcf INPUT.vcf --make-bed --recode --output OUTPUT
The .bed file is then read by read.pcadapt: in.pcadapt = read.pcadapt("path/to/OUTPUT.bed", type = "bed")
Then, the in.pcadapt is the starting point of the pcadapt analysis.
This approach works fine in most of the cases, but in some simulated scenarios I am getting the following error: x = pcadapt(in.pcadapt, K = 50, min.maf = 0.05) Error in solve.default(cov, ...) : system is computationally singular: reciprocal condition number = 1.04646e-19
After some googling about the error ("system is computationally...."), it seems that there is a problem with the design matrix, and maybe with multicollinearity.
I first tried to regenerate the .bed files and had no success. Following I generated a .bed file from the .vcf using vcf2bed from the bedr package, which worked (I get a genotype matrix). I tried to export this matrix with write.table but when re-importing pcadapt asks for the associated bim files which do not exist
I then tried to use read.pcadapt on a lfmm file format. I generated a .geno file from the .vcf (vcf2geno), then used geno2lfmm to generate a .lfmm file. The .lfmm file was then the input of read.pcadapt:
in.pcadapt = read.pcadapt("path/to/FILE.lfmm", type = "lfmm")
which gives me the following warning: Warning message: In writeBed(input, is.pcadapt) : You have more individuals than SNPs. Are you sure of the type of your file? (pcadapt/lfmm)
when running pcadapt (as above), I get the same error: x = pcadapt(in.pcadapt, K = 50, min.maf = 0.05) Error in solve.default(cov, ...) : system is computationally singular: reciprocal condition number = 1.82059e-19
Any idea of what might be causing these errors, and how to address them? Suggestions on a better way to generate the .bed, or some sort of fix that can be done to the file?
Thanks
JP