MesserLab / SLiM

SLiM is a genetically explicit forward simulation software package for population genetics and evolutionary biology. It is highly flexible, with a built-in scripting language, and has a cross-platform graphical modeling environment called SLiMgui.
https://messerlab.org/slim/
GNU General Public License v3.0

document tree sequence format more #38

Closed petrelharp closed 5 years ago

petrelharp commented 5 years ago

It's come to my attention that we don't explain anywhere in the manual (that I could find) that the first-generation individuals are present in the .trees output file. This, and probably a few more things, should go in a section under "Output methods". I'm happy to write this.

bhaller commented 5 years ago

It gets discussed in some of the recipes, as it comes up, I think; but it is probably not in the reference doc sections, and probably ought to be, yes. It'd be great if you could contribute appropriate text. Don't worry about formatting etc. (the original document is in Pages, which is a Mac app), just send me plain text and I'll work it in. Thanks!

petrelharp commented 5 years ago

> It gets discussed in some of the recipes, as it comes up, I think

I thought so too, but couldn't find it.

bhaller commented 5 years ago

Here are some spots:

17.2: "SLiM 3.1 produces .trees files that contain the first ancestral individuals in each new subpopulation created by addSubpop(); these individuals are useful for various purposes, such as recapitation (see section 17.10) and tracing ancestry, so they are provided by SLiM for convenience. However, they are not marked as “remembered”, so they disappear when the tree sequence is simplified; this makes it easy to get rid of them when they are not wanted. Here we do not need them (although they would do no harm), so we simplify them away to demonstrate this typical usage pattern."

17.4: "Note that in this recipe the simplify() of the loaded tree sequence is essential (whereas in section 17.2 it was not); without it, every tree would have a root in the first generation, in one or another original ancestor, and all the tree heights would be the same. The simplify() strips away the original ancestors, giving us trees with roots representing the most recent common ancestors for each tree."

17.5: "The only twist here is the call to sim.treeSeqRememberIndividuals(). Our goal is to trace the ancestry at each position in each individual to either p1 or p2. In point of fact, in SLiM 3.1 and later this call is not strictly necessary, because the original ancestors of each subpopulation created by addSubpop() are kept by SLiM automatically. If we did not simplify() after loading the tree sequence with pyslim, those ancestors would be available for us to trace ancestry back to, allowing us to determine whether a particular genomic region originated in p1 or p2."

17.10: "The first step is to load the .trees file. Note that we specifically do not call simplify() here, because we need the first generation individuals to recapitate from; this is, in fact, precisely why SLiM 3.1 preserves those individuals for us."
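To make the 17.4 passage concrete, here is a minimal toy sketch of why the tree root differs with and without simplification. It uses plain Python dictionaries rather than the tskit or pyslim API, and every name in it (`PARENT`, `TIME`, `SAMPLES`, the node labels, and both helper functions) is hypothetical, invented for illustration only. It assumes a single tree in which `anc` is a first-generation individual that is not a sample.

```python
# Toy model only: plain dictionaries, NOT the tskit or pyslim API.
# All names and values here are hypothetical, chosen for illustration.

# child -> parent links along one tree; "anc" is a first-generation individual.
PARENT = {"s0": "m", "s1": "m", "m": "u", "u": "anc"}
# Node times, counted in generations before the end of the simulation.
TIME = {"s0": 0, "s1": 0, "m": 4, "u": 7, "anc": 10}
SAMPLES = ["s0", "s1"]


def lineage(node, parent):
    """Return the path from node up to the top of the tree, inclusive."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def root_without_simplify(samples, parent):
    """Root when every recorded ancestor is kept: the top of the lineage."""
    return lineage(samples[0], parent)[-1]


def root_after_simplify(samples, parent):
    """Most recent common ancestor of the samples, which is the root that
    remains once the ancestors above it are stripped away."""
    common = set(lineage(samples[0], parent))
    for sample in samples[1:]:
        common &= set(lineage(sample, parent))
    # Walk upward from the first sample; the first shared node is the MRCA.
    for node in lineage(samples[0], parent):
        if node in common:
            return node
    return None


full_root = root_without_simplify(SAMPLES, PARENT)
mrca_root = root_after_simplify(SAMPLES, PARENT)
print(full_root, TIME[full_root])
print(mrca_root, TIME[mrca_root])
```

In this toy tree the unsimplified root is the first-generation individual, so its height is just the length of the simulation; the root left after stripping the older ancestors is the samples' most recent common ancestor, which is the quantity recipe 17.4 is after.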

But as I said, it could certainly be talked about in the reference section, presumably in the "SLiM additions to the .trees file format" section. Or maybe that section becomes a sub-section, with a sibling subsection that discusses the first-generation ancestors and any other tricky issues that ought to be mentioned.

Also, since the above passages were apparently hard to find, maybe there's a way they could be made more findable. Not trying to be snarky, I'm entirely serious, revision suggestions always welcome. :->

bhaller commented 5 years ago

Oh, those are all going to be in chapter 16 for you; I've inserted a new chapter 13 in the present revision of the manual...

bhaller commented 5 years ago

Not to beat a dead horse, but also in the doc for treeSeqRememberIndividuals(): "SLiM automatically remembers the individuals that comprise the first generation of any new subpopulation created with addSubpop(), for easy recapitation and other analysis (see section 17.10)."

petrelharp commented 5 years ago

Hm. Maybe I was looking at an old version of the docs. This seems like something that we'd better document in a tutorial for pyslim, anyhow. I'm going to close this.