MesserLab / SLiM

SLiM is a genetically explicit forward simulation software package for population genetics and evolutionary biology. It is highly flexible, with a built-in scripting language, and has a cross-platform graphical modeling environment called SLiMgui.
https://messerlab.org/slim/
GNU General Public License v3.0

add individuals= option to treeSeqOutput() to output a sample #448

Open bhaller opened 2 months ago

bhaller commented 2 months ago

Seems like a lot of people want to be able to do this; it's an FAQ on slim-discuss. It can be done with killIndividuals() or similar techniques, but it'd be more graceful to have a way to just do it directly in the treeSeqOutput() call. No reason we can't do this easily, right @petrelharp?

petrelharp commented 2 months ago

Well, it's not totally straightforward: recall all the bookkeeping around which individuals are remembered, retained, et cetera. It would be straightforward to just simplify the output tables down to the requested individuals. However, recall that we currently make an extra copy of the tables for output purposes, and we'd like to stop doing that. I'm not very enthusiastic about doing this, because for most purposes it's better to output everyone and get multiple replicates out of the one output. Maybe we need an example of doing that somewhere?
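For readers following along, here is a minimal sketch of the post hoc alternative to simplifying inside SLiM: write out everyone, then cut the tree sequence down to a chosen set of individuals in Python. The helper names (`pick_individuals`, `nodes_of`, `simplify_to_individuals`) are hypothetical and are not part of SLiM, pyslim, or tskit; the sketch relies only on `num_individuals`, `individual(i).nodes`, and `simplify(samples=...)` from a tskit tree sequence. It also assumes that every individual in the file is a valid candidate, which may not hold if the file contains remembered or retained individuals in addition to those alive at output time.

```python
import random


def pick_individuals(ts, n, seed=None):
    """Return n distinct individual IDs, chosen uniformly at random.

    Assumption: every individual in `ts` is a candidate. Filter the ID
    range first if the file also holds remembered/retained individuals.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(ts.num_individuals), n))


def nodes_of(ts, individual_ids):
    """Collect the node (genome) IDs that belong to the given individuals."""
    return [int(u) for i in individual_ids for u in ts.individual(i).nodes]


def simplify_to_individuals(ts, individual_ids):
    """Simplify `ts` so only the chosen individuals' genomes are samples."""
    return ts.simplify(samples=nodes_of(ts, individual_ids))


# Usage (requires tskit; "out.trees" is a hypothetical file name):
# import tskit
# ts = tskit.load("out.trees")
# sub = simplify_to_individuals(ts, pick_individuals(ts, 10, seed=1))
# sub.dump("sample.trees")
```

This does nothing about the extra table copy made inside SLiM at output time; it only shows the downstream route that already works with a full output.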

bhaller commented 2 months ago

> Well, it's not totally straightforward: recall all the bookkeeping around which individuals are remembered, retained, etcetera.

Indeed, I recall that.

> It would be straightforward to just simplify the output tables down to the requested individuals. However, recall that currently we make an extra copy of the tables for output purposes, but we'd like to not do that.

Yes; well, we'd like to avoid making a copy, but we haven't succeeded (I don't think we're even close to that goal, are we?). Even if we did succeed, we could still keep the current code path for use when a subset of individuals is specified. That doesn't seem like a major obstacle.

> I'm not very enthusiastic to do this because for most purposes it's better to output everyone and get multiple replicates out of the one output. Maybe we need an example of doing that somewhere?

An example of that does seem like a good idea. It's a new idea to me; I've never seen it done, and hadn't heard anyone even mention it until you brought it up in the recent slim-discuss thread. I'm a bit suspicious of it because of possible (likely?) correlations between the different replicates drawn from a single output; pseudoreplication issues seem possible, and hard to rule out. If you think it's a good technique, though, it should certainly be demonstrated somewhere. It does seem kind of orthogonal to the issue at hand, however.
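As a rough illustration of what "multiple replicates from one output" could look like, here is a sketch that splits a list of individual IDs into non-overlapping samples. The function name `disjoint_replicates` and its parameters are hypothetical, and the choice to make the samples disjoint is an assumption of this sketch, not something specified in the thread. Disjoint sampling only guarantees that no individual is reused; the samples still come from one realization of the population, so it does not by itself address the correlation concern raised above.

```python
import random


def disjoint_replicates(individual_ids, sample_size, num_replicates, seed=None):
    """Split individual IDs into non-overlapping random samples.

    Returns a list of `num_replicates` sorted lists, each holding
    `sample_size` IDs; no ID appears in more than one list.
    """
    ids = list(individual_ids)
    needed = sample_size * num_replicates
    if needed > len(ids):
        raise ValueError(
            f"need {needed} individuals for disjoint samples, have {len(ids)}"
        )
    rng = random.Random(seed)
    rng.shuffle(ids)  # random order, then slice into consecutive chunks
    return [
        sorted(ids[k * sample_size:(k + 1) * sample_size])
        for k in range(num_replicates)
    ]


# Example: five disjoint samples of 10 from a population of 100 individuals.
replicates = disjoint_replicates(range(100), sample_size=10, num_replicates=5, seed=42)
```

Each returned list could then be used to pick the sample nodes for a separate simplification of the same full output.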

To me, the basic fact is that people want to be able to do this, and they're hacking it in by the various techniques described in the slim-discuss thread; given that reality, it'd be better to provide them with a clean API that does what they want to do.