bodkan / slendr

Population genetic simulations in R 🌍
https://bodkan.net/slendr
Other
55 stars 6 forks source link

problem with ts_eigenstrat #159

Closed Frasella closed 1 month ago

Frasella commented 2 months ago

Hi,

I'm having trouble using the ts_eigenstrat function. I've saved my tree sequence file after adding mutations, and now I want to double-check some results using the r package admixr, so I'm trying to use ts_eigenstrat to convert to EIGENSTRAT file format, but I get the following error:

# convert to eigenstrat
snp <- ts_eigenstrat(ts_P3_90_1, prefix = "bottleneck_P3_90_1")
Error: Attempting to extract genotypes from a tree sequence which has not been mutated

When I check my tree sequence file after loading it, everything seems to be in order:

# LOAD TS FILE
 ts_P3_90_1 <- ts_load("bottleneck_P3_90_1.trees", model_P3)
ts_P3_90_1
╔═══════════════════════════╗
║TreeSequence               ║
╠═══════════════╤═══════════╣
║Trees          │     285134║
╟───────────────┼───────────╢
║Sequence Length│  100000000║
╟───────────────┼───────────╢
║Time Units     │generations║
╟───────────────┼───────────╢
║Sample Nodes   │        160║
╟───────────────┼───────────╢
║Total Size     │   63.3 MiB║
╚═══════════════╧═══════════╝
╔═══════════╤═══════╤═════════╤════════════╗
║Table      │Rows   │Size     │Has Metadata║
╠═══════════╪═══════╪═════════╪════════════╣
║Edges      │1054325│ 32.2 MiB│          No║
╟───────────┼───────┼─────────┼────────────╢
║Individuals│     80│  2.2 KiB│          No║
╟───────────┼───────┼─────────┼────────────╢
║Migrations │      0│  8 Bytes│          No║
╟───────────┼───────┼─────────┼────────────╢
║Mutations  │ 314721│ 11.1 MiB│          No║
╟───────────┼───────┼─────────┼────────────╢
║Nodes      │ 166810│  4.5 MiB│          No║
╟───────────┼───────┼─────────┼────────────╢
║Populations│      4│341 Bytes│         Yes║
╟───────────┼───────┼─────────┼────────────╢
║Provenances│      2│  3.2 KiB│          No║
╟───────────┼───────┼─────────┼────────────╢
║Sites      │ 314184│  7.5 MiB│          No║
╚═══════════╧═══════╧═════════╧════════════╝

Has anyone experience something similar when trying to use the function?

Thanks!

bodkan commented 2 months ago

Hello @Frasella, thanks so much for checking out slendr!

I'm sorry you ran into this issue. I have recognized it immediately after reading the description of your problem. In fact, I ran into it (and fixed it) a few months ago for ts_genotypes(). Unfortunately, it didn't occur to me that ts_eigenstrat() will obviously have the same problem.

This is not a serious error, it's just that slendr doesn't correctly set a mutated/not-mutated flag upon reading a tree sequence with ts_load(), so ts_eigenstrat() thinks there are no mutations on the tree sequence even though they are there (as you verified).

I have just implemented a fix for this, and it's now on the main slendr development branch. You can install it and make sure it works by running devtools::install_github("bodkan/slendr"). Alternative "solution" is to not add mutations right after simulation (and just before saving the tree sequence) but only run ts_mutate() after loading the (unmutated) tree sequence. A bit of a hack but will get you around this issue.

This fix (along with many other updates) will be published in a big v1.0 release I've been planning for a few months, and will then be available for everyone on CRAN. Unfortunately, in the last few months I have temporarily transitioned from being mostly a programmer to a full-time scientist, which has been frustrating for many reasons, one of them being a slower development of slendr. :( But I hope to pick up the pace again soon and push the big update out!

Frasella commented 2 months ago

Hi @bodkan!

Thank you for your quick response despite the busy schedule! I'll go ahead and install the latest version from the development branch and check if that resolves the issue. I'm looking forward to the v1.0 release on CRAN :)

Thanks again for your help!

bodkan commented 2 months ago

Awesome, and definitely do let me know if the fix works for you or if there are other issues or if I can close this.

Also, I happen to be the developer of admixr too, so I'm super happy you're working with both of them. This is exactly the reason why I built the complementary non-spatial/non-SLiM functionality into slendr. If you'd be comfortable letting me know in private over email what are you trying to do with both of the packages, I'd super appreciate it! mp@bodkan.net (but no pressure whatsoever, I'm asking out of curiosity)

You might also want to keep an eye on another tool I've been sloooowly working on called demografr: https://github.com/bodkan/demografr. It's purpose is to leverage slendr/msprime/SLiM as an engine for simulation-based inference (think ABC, parameter grid search, genetic algorithms, etc.). Might come in handy if you're doing simulations of f-statistics and such.

bodkan commented 1 month ago

I am optimistically assuming that the problem has been fixed and therefore closing this issue. :)