bcm-uga / pcadapt

Performing highly efficient genome scans for local adaptation with R package pcadapt v4
https://bcm-uga.github.io/pcadapt

ERROR: Can't compute SVD #48

Closed jrglass6 closed 4 years ago

jrglass6 commented 4 years ago

Hello,

I am getting the "Can't compute SVD. Are there SNPs or individuals with missing values only?" error for one of my SNP datasets after removing individuals and filtering for missing data using VCFtools and PLINK. When I look at the corresponding "imiss" and "lmiss" files, there are no individuals or loci with 100% missing data. I'm able to perform PCA on the same dataset using other programs. Is there a better way to troubleshoot what specifically is flagging this error? I'm using the exact same code to remove individuals from another dataset and pcadapt works fine.

Cmel_FR16.lmiss.txt Cmel_FR16.imiss.txt

privefl commented 4 years ago

The decomposition usually fails when there are variants with no variation. It should be either all missing or very low MAF (but we filter for that by default).
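One way to screen for such variants before running pcadapt is to compute per-SNP missingness and minor allele frequency by hand. A minimal sketch, assuming the genotypes have already been loaded into a plain matrix `geno` (individuals in rows, SNPs in columns, coded 0/1/2 with NA for missing; this is for diagnosis only and is not how pcadapt stores bed files internally):

```r
# Non-missing genotype count per SNP
n_nonmiss <- colSums(!is.na(geno))

# Allele frequency of the counted allele, folded to the minor allele
af  <- colMeans(geno, na.rm = TRUE) / 2
maf <- pmin(af, 1 - af)

# SNPs that are entirely missing or monomorphic would break the SVD
bad <- which(n_nonmiss == 0 | maf == 0)
```

Any index returned in `bad` points at a variant with no usable variation.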

jrglass6 commented 4 years ago

Hello,

Thank you for the quick response. So would the error be resolved if I have a larger MAF filter? Currently I filter at the very beginning in VCFtools and remove indels, only retain biallelic SNPs, and my MAF filter is 0.0001.

I can try doing another MAF filter after removing the individuals.

Cheers,

Jessica


privefl commented 4 years ago

Function pcadapt() comes with a parameter min.maf with default value of 0.05. Are you using this?
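A minimal call using that parameter, assuming a PLINK bed file named as in this thread (the path is illustrative):

```r
library(pcadapt)

# Read the bed file, then run the scan; variants with MAF below
# min.maf (default 0.05, per the comment above) are excluded.
input <- read.pcadapt("Cmel.pcad.bed", type = "bed")
x <- pcadapt(input, K = 10, min.maf = 0.05)
```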

jrglass6 commented 4 years ago

I was not because I was already filtering for min.maf using VCFtools, but I just tried re-running pcadapt with min.maf = 0.05 and it gave the same error (Can't compute SVD).

Looking at the missing sites file (Cmel_FR16.lmiss.txt) there is no indication that there are any variants with all missing data.

jrglass6 commented 4 years ago

The error is somehow related to removing a large number (78) of individuals at once. When I run the same filtering steps without removing all of the individuals (or even some of the individuals) I do not get the error. But, looking at the "Cmel_FR16.imiss" file, I can't tell whether one or more individuals are problematic because none of them have a significant amount of missing data.

privefl commented 4 years ago

Please try running pcadapt:::get_af() on your input.

jrglass6 commented 4 years ago

By "input" you mean the .bed file object, correct? Mine is an R object called "Cmel.pcad2". It's unable to calculate the allele frequencies, and I get the following error:

Error in pcadapt:::get_af(Cmel.pcad2) : Expecting an external pointer: [type=character].

This is the input:

[1] "/path/to/directory/Cmel.pcad.bed"
attr(,"n")
[1] 80
attr(,"p")
[1] 28172
attr(,"class")
[1] "pcadapt_bed"

privefl commented 4 years ago

Sorry, you need to use

n <- attr(input, "n")
p <- attr(input, "p")
xptr <- structure(pcadapt:::bedXPtr(input, n, p), n = n, p = p, class = "xptr_bed")
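Putting the maintainer's snippet together with the earlier `get_af()` suggestion gives the following sketch (these are internal, `:::`-accessed functions, so their signatures may change between pcadapt versions):

```r
library(pcadapt)

input <- read.pcadapt("Cmel.pcad.bed", type = "bed")  # path from this thread
n <- attr(input, "n")
p <- attr(input, "p")
xptr <- structure(pcadapt:::bedXPtr(input, n, p), n = n, p = p, class = "xptr_bed")

# Allele frequencies for all p variants; values of exactly 0 or 1
# indicate monomorphic variants that can break the SVD.
af <- pcadapt:::get_af(xptr)
summary(af)
```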
jrglass6 commented 4 years ago

Sorry but I'm having issues installing bigsnpr to get structure() to run. It's running into errors and not installing from CRAN or from GitHub.

I calculated the allele frequencies using VCFtools instead and have attached the file here.

Cmel_FR16.frq.txt

privefl commented 4 years ago

Sorry, not {bigsnpr}, but {pcadapt}.

privefl commented 4 years ago

So, you have variants with no variation indeed. I don't understand why you get the error when using min.maf.

jrglass6 commented 4 years ago

So does pcadapt not work if there are alleles that are fixed?

I also don't understand why I am getting this error, even after filtering for min.maf both in VCFtools and now in pcadapt...

Is it possible that some individuals carry unique scaffolds, and that for some reason VCFtools is not removing those scaffold (CHROM) names, so it looks like there should be data associated with a scaffold when there isn't? I have been investigating this, but in the VCF files the number of scaffolds is reduced when I take out the individuals, so I don't think this is the issue.

privefl commented 4 years ago

Do you know how to use the debugger in RStudio? You can use debugonce(pcadapt) and then run it, you will be able to run it line by line and to see the values of everything that is computed.
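A short sketch of that workflow (note that `pcadapt()` is an S3 generic, so `debugonce(pcadapt)` only steps through the dispatcher; to reach the actual computation you would likely need to debug the method for bed input, whose internal name is assumed here):

```r
library(pcadapt)

debugonce(pcadapt)                # the next pcadapt() call opens the browser
x <- pcadapt(input, K = 10, min.maf = 0.1)
# In the browser: `n` steps to the next line, `c` continues, `Q` quits,
# and typing a variable's name prints its current value.

# To step into the method doing the real work (assumed internal name):
debugonce(pcadapt:::pcadapt.pcadapt_bed)
</imports>
```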

Otherwise, are you able to share your data?

I see now that your sample size is very small (n = 80); you should try increasing min.maf to 0.1.

jrglass6 commented 4 years ago

I tried increasing min.maf to 0.1 and still no luck :-(

I'm happy to share the .vcf file with you. Can you access it here via GoogleDrive?

Ran debugonce(pcadapt) and it gives the following output on this code:

Cmel.pcad3 <- pcadapt(Cmel.pcad2, K = 100, min.maf = 0.1, method = "mahalanobis")

function (input, K = 2, method = "mahalanobis", min.maf = 0.05,
    ploidy = 2, LD.clumping = NULL, pca.only = FALSE, tol = 1e-04)
{
    if (missing(input)) {
        appDir <- system.file("shiny-examples/app-pcadapt", package = "pcadapt")
        if (appDir == "") {
            stop("Could not find Shiny app in pcadapt.", call. = FALSE)
        }
        shiny::runApp(appDir, display.mode = "normal")
    }
    else {
        if (any(grepl("^pcadapt_", class(input)))) {
            UseMethod("pcadapt")
        }
        else {
            stop("Remember to always use read.pcadapt() before using pcadapt().")
        }
    }
}

privefl commented 4 years ago

Are you using K = 100?? You have only 80 individuals, so you can't get more than 80 principal components, and you should probably not ask for more than 20.

jrglass6 commented 4 years ago

Previously I had several hundred individuals, so I tried a first run using a high value of K and didn't change it here.

Then I have been re-running the analysis using an appropriate value of K.

*Update. You are 100% right. I just re-ran with an appropriate value of K (which I thought I had done before but apparently hadn't -- I've been trying all different types of datasets!). Oh my gosh. It makes intuitive sense, I just didn't even think about the K value since I've always had >100 individuals and I assumed the error had to do with missing data. I am so sorry.

privefl commented 4 years ago

Ok, I'll look into it later. Can you provide the bed/bim/fam instead of the vcf?

privefl commented 4 years ago

Good you found the problem.

It is probably my fault: I'm trying to catch an error to make it more explicit, but a different error happened here, and its actual message would have been more informative than "Can't compute SVD".