EricArcher / strataG

strataG is a toolkit for haploid sequence and multilocus genetic data summaries, and analyses of population structure.

nucleotideDivergence() not recognising alleles #40

Closed: mariaemilyd closed this issue 3 years ago

mariaemilyd commented 4 years ago

Hi Eric,

Thanks so much for making such a brilliant package!

Something strange is going on with the nucleotideDivergence() function for me and I just can't figure out why it's not working.

I'm using a gtypes file with 122 unique haplotypes, but I get this error message when I try to run the function:

div <- strataG::nucleotideDivergence(mt_haps)

Error in utils::combn(.data$allele, 2) : n < m

Which is from the line "haps <- utils::combn(.data$allele, 2)" in nucleotideDivergence().

When I check my number of alleles:

length(unique(mt_haps@data$allele))

[1] 122

I'm using strataG v2.4.905.

My full code and datasets are here: https://github.com/mariaemilyd/test_repo

Thanks very much,

Maria

mariaemilyd commented 4 years ago

Also, I've been having issues calculating pairwise PhiST. When I run pairwiseTest(stat = "phist"), I get the error message: "Error in statPhist_C(input$loci, input$strata, input$hap.dist) : Not a matrix." I can't find a call to statPhist_C() in pairwiseTest(), or in overallTest(), which pairwiseTest() calls, so it's difficult to troubleshoot. The same function works with your dloop.g data, and I can't find any material differences between my gtypes object and dloop.g that would explain the error. The code and files are at the same location.

EricArcher commented 4 years ago

Sorry about the problems with the functions. I've had a few issues pop up recently and I've just started digging into them. One problem is that some change is causing the C code called through Rcpp to crash on my machine. This code is behind the population structure tests. It is crashing in tests that worked when I made the v2.4.905 CRAN submission, so something has changed in the R version, in Rcpp, or with the macOS upgrade. I'll work on it this week and will post when I have things fixed.

mariaemilyd commented 4 years ago

Thanks for your reply Eric - much appreciated!

EricArcher commented 4 years ago

Apologies for the delay in looking at this. I've been able to fix the issue with nucleotideDivergence(). It wasn't properly handling strata that had only a single sample when it tried to summarize pairwise distances within a stratum. It will now show NA for those strata. I'm still working on the population structure issues. I can't get the C code to run on my machine without crashing, even though it is the same code that worked in previous versions of R. I will have to rewrite those functions entirely, which may take another month. I'll update this issue when that is done.
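
For anyone hitting the same error, here is a minimal base-R sketch of the failure mode described above. It is not the actual strataG fix: the helper within_pairs() and the toy data frame are hypothetical and only illustrate why a stratum with one sample makes utils::combn() fail even when the whole data set has many haplotypes, and how returning NA for that stratum avoids it.

```r
# Hypothetical helper (not strataG code): enumerate haplotype pairs within one
# stratum, returning NA when the stratum has fewer than two samples.
within_pairs <- function(alleles) {
  if (length(alleles) < 2) return(NA)
  utils::combn(alleles, 2)
}

# A single-sample stratum reproduces the reported error message.
err <- tryCatch(
  utils::combn("hap1", 2),
  error = function(e) conditionMessage(e)
)
print(err)

# Toy data (an assumption for illustration): stratum B has only one sample.
toy <- data.frame(
  stratum = c("A", "A", "A", "B"),
  allele  = c("hap1", "hap2", "hap3", "hap4"),
  stringsAsFactors = FALSE
)

# Find strata that cannot form a pair.
n_per_stratum <- table(toy$stratum)
singletons <- names(n_per_stratum)[n_per_stratum < 2]
print(singletons)

# The guarded helper gives a pair matrix for A and NA for B.
pairs_by_stratum <- lapply(split(toy$allele, toy$stratum), within_pairs)
print(pairs_by_stratum)
```

Counting samples per stratum this way is a quick check to run on your own data before calling a function that summarizes within-stratum pairs.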

mariaemilyd commented 4 years ago

Thanks so much Eric, no worries! That's great, thank you for taking the time.