Closed hjt1129 closed 3 years ago
You should be able to go from VCF to bed in one call of PLINK.
You should probably also filter on missing values using at least --mind 0.5 --geno 0.5
(in the same call).
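A minimal sketch of the single-call conversion described above, assuming PLINK 1.9 is available on the PATH as `plink`. The helper name `vcf_to_bed`, the `DRY_RUN` switch, and the file names in the usage note are illustrative and not from the thread; `--allow-extra-chr` is only needed when the VCF uses non-standard contig names.

```shell
# Convert a VCF to PLINK bed/bim/fam and filter on missingness in one call.
# vcf_to_bed and DRY_RUN are hypothetical names used for this sketch.
vcf_to_bed() {
  vcf="$1"   # input VCF
  out="$2"   # output prefix (PLINK writes $out.bed, $out.bim, $out.fam)
  set -- plink --vcf "$vcf" --allow-extra-chr \
    --mind 0.5 --geno 0.5 --make-bed --out "$out"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"        # run PLINK
  fi
}
```

Calling `vcf_to_bed var.vcf var_qc` should produce `var_qc.bed`, which can then be passed to pcadapt; setting `DRY_RUN=1` first only prints the command.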
Thank you very much for your reply. We have a new question: when we use K=i, there are i singular.values. What is the meaning of singular.values? Is it the percentage of variance explained by each PC?
Does this mean that, if we want the proportion of variance explained by the first two PCs, it should be singular.values[1]^2 + singular.values[2]^2?
Yes
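To make the arithmetic concrete, here is a small worked example. It assumes, as the pcadapt documentation describes, that `singular.values` stores the square root of the proportion of variance explained by each PC, and writes $d_k$ for `singular.values[k]`. The numbers are made up for illustration.

$$
\text{proportion explained by PC1 and PC2} = d_1^2 + d_2^2,
\qquad \text{e.g. } d_1 = 0.4,\ d_2 = 0.3 \;\Rightarrow\; 0.16 + 0.09 = 0.25 .
$$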
Hello privefl and hjt1129, I have filtered my bed file using PLINK with '--mind 0.01 --geno 0.01', but when I run the code “x <- pcadapt(input = filename, K = 20)”, it still shows “Error: Can't compute SVD. Are there SNPs or individuals with missing values only? You should use PLINK for proper data quality control.” Do you know why that is?
How many samples/variants have you left?
Thank you for your reply! I have 15 samples with 2,856,704 variants.
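For anyone else checking the same thing: in a PLINK binary fileset, the `.fam` file has one line per individual and the `.bim` file has one line per variant, so the counts can be read off with `wc -l`. A small sketch, where `count_bfile` is a hypothetical helper name and the argument is the fileset prefix:

```shell
# Report how many samples and variants a PLINK fileset contains.
# count_bfile is a hypothetical helper; "$1" is the fileset prefix.
count_bfile() {
  prefix="$1"
  # tr strips the padding that some wc implementations add
  n_samples=$(wc -l < "$prefix.fam" | tr -d ' ')    # one line per individual
  n_variants=$(wc -l < "$prefix.bim" | tr -d ' ')   # one line per variant
  echo "samples=$n_samples variants=$n_variants"
}
```

For example, `count_bfile var` reads `var.fam` and `var.bim`.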
You can't get 20 PCs with 15 individuals. Your filtering is too strong.
I didn't understand principal component analysis well enough, and I wrongly assumed that the number K of principal components could take any value. So is the number of PCs limited by the number of samples? What would be a reasonable value to use?
I guess you should not use more than K=N/10, or something like that. But to get sufficient power for pcadapt, we kind of expect that you have at least 200 individuals.
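The rule of thumb above can be sketched as a quick check. This is only an illustration of the rough K=N/10 guideline and the ~200-individual expectation from the reply, not a hard limit built into pcadapt. The helper name `suggest_k` and the choice to floor the result at 1 are assumptions made for this sketch.

```shell
# Rule of thumb from the reply above: K <= N/10, with N >= 200 for good power.
# suggest_k is a hypothetical helper; "$1" is the number of individuals.
suggest_k() {
  n="$1"
  k=$((n / 10))            # integer division
  [ "$k" -lt 1 ] && k=1    # assumption of this sketch: never suggest K below 1
  echo "max_K=$k"
  if [ "$n" -lt 200 ]; then
    echo "warning: N=$n is below the ~200 individuals suggested for pcadapt" >&2
  fi
}
```

With the 15 samples mentioned in this thread, `suggest_k 15` prints `max_K=1` and a warning on stderr, which is far from the K = 20 that was requested.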
Well, privefl, I actually only have 15 samples. Are small sample sizes not suitable for analysis with pcadapt?
I don't think it is.
Hello privefl, I have a question about the final outlier results: do the outlier indices follow the order of the SNPs in the corresponding VCF file?
The order in the input files, the .bed file (so the .bim file as well). Please open a new issue for every new (unrelated) question.
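Since the indices follow the `.bim` order, outliers can be mapped back to variant IDs by row number. A sketch, assuming the 1-based outlier indices have been written to a text file with one index per line (for example from R); `outlier_snps` and the file names in the usage note are hypothetical.

```shell
# Print the .bim rows for a list of 1-based outlier indices.
# outlier_snps is a hypothetical helper: "$1" is a text file with one index
# per line, "$2" is the .bim file of the fileset that was given to pcadapt.
outlier_snps() {
  awk 'FILENAME == ARGV[1] { idx[$1]; next }   # first file: remember indices
       (FNR in idx)' "$1" "$2"                 # second file: keep matching rows
}
```

For example, `outlier_snps outliers.txt var.bim` prints the chromosome, variant ID, position and alleles of each outlier.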
Well, thank you so much.
Hi privefl,
I have a similar question to the ones that have been posted. I filtered my data in plink using --maf 0.05, --min 0.5, --geno 0.5, and exported the data in the bed file format. I have 109 individuals and 173485 SNPs after filtering in plink. I am still getting the 'Error: can't compute SVD. Are there SNPs or individuals with missing values only?'. As far as I know the bed file should be good to go, and I've not run into this error in pcadapt before. Any advice would be greatly appreciated.
I don't know actually, it does not make sense. Are you sure you're using the new bed file?
Also, you can try to have a look at the genotype matrix with pcadapt::bed2matrix().
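Another way to test what the error message asks is PLINK's `--missing` report, which writes a per-individual `.imiss` file and a per-variant `.lmiss` file, each with an `F_MISS` (fraction missing) column. The sketch below lists the rows where that fraction is 1. `all_missing` is a hypothetical helper, and the `var` / `var_miss` prefixes are placeholders.

```shell
# List rows of a PLINK --missing report (.imiss or .lmiss) whose F_MISS is 1,
# i.e. individuals or SNPs with missing values only.
# The reports come from:  plink --bfile var --missing --out var_miss
# all_missing is a hypothetical helper; "$1" is the report file.
all_missing() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "F_MISS") col = i; next }
       col && $col + 0 == 1' "$1"
}
```

If `all_missing var_miss.lmiss` and `all_missing var_miss.imiss` both print nothing, the fileset has no fully missing SNPs or individuals, which would support the suspicion that the wrong bed file is being read.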
We are now using the pcadapt package to analyze our data, but we have run into some problems. The main problem is how to get the genotype file. Our original data is one fasta file with 428 phased sequences from 214 individuals, and all the sequences belong to one gene.
We have tried several methods to get a VCF file that can be used for read.pcadapt conversion. One method we used is this R code (https://rdrr.io/github/gehara/Junkyardtools/man/fasta2VCF.html), but when we run the read.pcadapt function, it gives the error “Error in vcfR::read.vcfR(input, verbose = FALSE) : File: E:/hongwei/hw.vcf does not appear to be a VCF file. First line of file: E:/hongwei/hw.vcf Should begin with: ##fileformat=VCFv In addition: Warning message: In file2other(input, type, match.arg(type.out), match.arg(allele.sep)) : Converter vcf to pcadapt is deprecated. Please use PLINK for conversion to bed (and QC).”
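The first part of that error is about the file header: the reader expects the first line to begin with `##fileformat=VCFv`. A quick way to check a file before passing it on is sketched below. `is_vcf` is a hypothetical helper; it only inspects the first line of an uncompressed file and does not validate the rest of the VCF.

```shell
# Check that a file starts with the "##fileformat=VCFv" line that the
# error message above says is required.
# is_vcf is a hypothetical helper; it only looks at the first line.
is_vcf() {
  head -n 1 "$1" | grep -q '^##fileformat=VCFv'
}
```

For example, `is_vcf hw.vcf && echo "header looks fine"` prints the message only when the header line is present.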
Another method we used is samtools and bcftools. One of the 428 phased sequences was used as the reference, and we followed these steps:
$ bwa index ref.fa
$ bwa mem -t 12 -M ref.fa sample.fasta >sample.sam
$ samtools sort -@ 12 -o sample.bam sample.sam
$ samtools mpileup -uf ref.fa sample.bam | bcftools call -mv >var.vcf
A similar message, "Warning message: In file2other(input, type, match.arg(type.out), match.arg(allele.sep)) : Converter vcf to pcadapt is deprecated. Please use PLINK for conversion to bed (and QC).", appears.
So we also tried to use PLINK to convert this VCF file to bed format with the following code:
plink --vcf var.vcf --allow-extra-chr --maf 0.05 --recode --out var
plink --file var --maf 0.05 --allow-extra-chr --make-bed --out var
But when we run the code “x <- pcadapt(input = filename, K = 20)”, it shows “Error: Can't compute SVD. Are there SNPs or individuals with missing values only? You should use PLINK for proper data quality control.”
We now think something may be going wrong during the fasta to VCF conversion. Do you have any suggestions for us, or any suitable code or package that you could share with us? Many thanks.