e-jorsboe / fastNGSadmix

Program for estimating admixture proportions and doing principal component analysis of a single NGS sample
GNU General Public License v3.0

reference panel generation problem #9

Closed: bgyuris closed this issue 9 months ago

bgyuris commented 9 months ago

Dear e-jorsboe!

When I try to generate a ref panel, I always run into this issue, and I couldn't debug it:

Loading required package: snpStats
Loading required package: survival
Loading required package: Matrix
Error in if (class(y) == "numeric") { : the condition has length > 1

I have tried it with different plink files, and my format looks okay. Can you help me out here?

Thank you in advance

e-jorsboe commented 9 months ago

Hi,

I think something goes wrong when the script tries to convert the data from the snpStats format into a matrix:

y<-as(pl$genotypes[indis,],"numeric")

Here y should either be an R vector (if it is only one individual) or an R matrix (if more individuals). Perhaps you can check what y is when running it?

Also what version of R are you running?

bgyuris commented 9 months ago

Hi, thanks for the quick response.

y is a matrix (with dimensions 2501 x 1083476, which seems correct). However, the pop identifier shows up as 'NA', which I do not fully understand, since the fam looks OK. Could that be the issue? The R version is 4.3.2.

Example from the fam:

pop1 HG00096 0 0 1 -9
pop1 HG00097 0 0 2 -9
pop1 HG00099 0 0 2 -9
pop1 HG00100 0 0 2 -9

e-jorsboe commented 9 months ago

Yeah, that could be an issue. Could you check the fam file (here called "famFile.fam") with this command:

cut -f1 -d" " famFile.fam | sort -n | uniq -c
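The same per-population count can also be done from within R. The following is only a rough sketch: it writes a temporary toy file built from the example fam lines above, and the column names assume the usual six-column PLINK .fam layout (family ID, individual ID, father, mother, sex, phenotype). None of it is taken from R/plinkToRef.R.

```r
# Toy .fam content standing in for famFile.fam (lines from the example above)
famLines <- c(
  "pop1 HG00096 0 0 1 -9",
  "pop1 HG00097 0 0 2 -9",
  "pop1 HG00099 0 0 2 -9",
  "pop1 HG00100 0 0 2 -9"
)
famPath <- tempfile(fileext = ".fam")
writeLines(famLines, famPath)

# Read every column as character so IDs are not converted to numbers or factors
# (column names are an assumption based on the standard .fam layout)
fam <- read.table(famPath, header = FALSE, colClasses = "character",
                  col.names = c("fid", "iid", "father", "mother", "sex", "pheno"))

# Count individuals per population label (first column), showing NA if present
popCounts <- table(fam$fid, useNA = "ifany")
print(popCounts)

# Flag missing or empty labels, which would surface as NA population identifiers
anyMissing <- any(is.na(fam$fid) | fam$fid == "")
print(anyMissing)
```

Replacing `famPath` with the path to a real fam file should give the same kind of summary as the `cut | sort | uniq -c` pipeline.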

Also, after line 59 in R/plinkToRef.R, can you add the following lines to print the class and dimensions of y:

print(class(y))
print(dim(y))

bgyuris commented 9 months ago

Got this for the .fam (it was just an example fam, for testing the run):

889 pop1
1615 pop2

Got this for the additional lines:

[1] "matrix" "array"
[1] 889 1083476
Error in if (class(y) == "numeric") { : the condition has length > 1
Execution halted

e-jorsboe commented 9 months ago

So I think the issue arises when the matrix of genotypes is created from the snpStats object:

y<-as(pl$genotypes[indis,],"numeric")

Since R 4.0.0, matrices have the classes "matrix" and "array", so the comparison class(y) == "numeric" returns two values instead of one. That is why my script failed.

Now it checks whether the genotypes are a vector (this will be the behaviour in R previous to 4.0.0 if it is a single individual), and otherwise treats the genotypes as a matrix.
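The failure and a version-independent check can be sketched in base R. This is a minimal illustration and not the actual code in R/plinkToRef.R: the toy matrix `geno` stands in for the converted snpStats genotypes, and `describeGenotypes` is a hypothetical helper. On R 4.0.0 and later, `class()` of a matrix has length two, and from R 4.2.0 a condition of length greater than one in `if` is an error rather than a warning.

```r
# Toy genotype matrix: 3 individuals x 4 SNPs (stand-in for the snpStats conversion)
geno <- matrix(c(0, 1, 2, 1,
                 2, 2, 0, 1,
                 1, 0, 0, 2), nrow = 3, byrow = TRUE)

# Hypothetical helper: decide how to treat y without comparing class(y) to a string
describeGenotypes <- function(y) {
  if (is.matrix(y)) {
    "matrix"   # several individuals
  } else if (is.vector(y) && is.numeric(y)) {
    "vector"   # a single individual
  } else {
    stop("unexpected genotype object")
  }
}

yMany <- geno[1:2, ]  # two rows: stays a matrix
yOne  <- geno[1, ]    # one row: dropped to a plain numeric vector

kindMany <- describeGenotypes(yMany)
kindOne  <- describeGenotypes(yOne)
print(c(kindMany, kindOne))

# The fragile comparison yields one logical per class entry, so its length
# depends on the R version; it is unsafe to use directly inside if ()
fragile <- class(yMany) == "numeric"
print(length(fragile))
```

Predicates such as `is.matrix()` always return a single TRUE or FALSE, so they are safe inside `if` regardless of how many classes an object carries.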

Long story short, pull the updated version and see if it helps.

bgyuris commented 9 months ago

It works!

Thank you!