"missing values only" warning with data filtered to <0.1 missing values

bcm-uga / pcadapt

Performing highly efficient genome scans for local adaptation with R package pcadapt v4

https://bcm-uga.github.io/pcadapt

37 stars 10 forks source link

"missing values only" warning with data filtered to <0.1 missing values #65

Closed ewmorr closed 3 years ago

ewmorr commented 3 years ago

Hello,

I'm attempting to run a test dataset of 81 variants and 71 genotypes through pcadapt. These are VCF v4.2 data that have been filtered using plink v1.9 --mind 0.1 --geno 0.1 --recode12 --missing-genotype 9 to produce BED files for loading to pcadapt v4.3.3 in R v3.6.2. The data are from a de novo assembled fungus, and so first six columns of each line of the associated PED files read as sampleName sampleName 0 0 0 -9 (with sampleName assigned to unique codes for each sample).

I get no errors on reading the data, but on attempting to run pcadapt I get the following:

Error: Can't compute SVD.
Are there SNPs or individuals with missing values only?
You should use PLINK for proper data quality control.

I'm new to population genomics analysis, but it seems clear to me that this is not a missing data issue? Is there a file formatting issue here? (I do have .bim and .fam files contained in the same directory as the .bed).

Thanks for any pointers.

Best, Eric

privefl commented 3 years ago

Do you really have a bed file here? You don't seem to have used --make-bed.

ewmorr commented 3 years ago

Thanks for your quick response. Sorry I left that out. The full command was plink --vcf test_dat.vcf --recode12 --mind 0.1 --geno 0.1 --out test_dat --allow-extra-chr --make-bed --missing-genotype 9

privefl commented 3 years ago

I don't think you need --recode12 and --missing-genotype 9 when making bed files. Are you sure you're reading the bed file with pcadapt, and not some text file instead?

ewmorr commented 3 years ago

With regards to --missing-gentoype 9 the pcadapt manual Getting Started page says "missing values should be encoded by a single character (e.g. 9) different from 0, 1 or 2."

Here's how I'm reading the file:

require(pcadapt)
path_to_file <- "test_dat.bed"
filename <- read.pcadapt(path_to_file, type = "bed")

Calling filename in R then gives

1] "test_dat.bed"
attr(,"n")
[1] 16
attr(,"p")
[1] 70
attr(,"class")
[1] "pcadapt_bed"

So looks like pcadapt recognizes a bed file? I would note that I can also read a PED (i.e. a text file), I just get a warning about the PED conversion to BED within pcadapt is deprecated...

Nevertheless, I'm probably doing something silly, so if the answer isn't obvious I can continue to do some digging.

privefl commented 3 years ago

That first sentence refers to when you're using a text file as input.

Yes, it seems that you're reading a bed file indeed. Is it normal that it is so small though?

You can use pcadapt::bed2matrix() to convert it to a matrix and probably see from there what is the problem.

ewmorr commented 3 years ago

Hi again,

Thanks for that function, that set me on the right track. I had set ploidy = 1 in pcadapt but I have diploid variant calls from the pipeline I'm using (despite being a haploid variant caller). Rookie mistake. pcadapt now runs with no SVD error.

Thanks! Eric