population name inscrutibly dissappears

petrelharp commented 2 months ago

Over in https://github.com/tskit-dev/pyslim/issues/343 we ran into the following issue. Consider this script:

initialize() {
setSeed(123);
    defineConstant("param_set_id", 1);
    defineConstant("NAnc", 17147);
    defineConstant("mAncNS", 0);
    defineConstant("NAncS", 411);
    defineConstant("NAncN", 209);

    // ----- TIME PARAMETERS -----
    defineConstant("start_splitAnc", 10); 
    defineConstant("today_delay", 58);
    defineConstant("Split_Sagit_delay", 741);
    defineConstant("Split_Sagit_gen", start_splitAnc + Split_Sagit_delay - today_delay);
    defineConstant("today_gen", Split_Sagit_gen + today_delay);

    // ----- TREE-SEQUENCE RECORDING ----- 
    initializeTreeSeq(); 

    // set the overall mutation rate map
    initializeMutationRate(0);
    initializeMutationType("m1", 0.5, "f", 0.0); // Neutral mutations
    initializeGenomicElementType("g1", m1, 1.0);
    initializeGenomicElement(g1, 0, 1000000);

    //constant recombination along the chromosome:
    initializeRecombinationRate((5.56*1e-8)*10);
}

1 early() {
    // Create the ancestral population
    sim.addSubpop("p1", NAnc);
       // sim.treeSeqOutput("unused.trees"); // <------ (*)
}

start_splitAnc early() {
    sim.addSubpopSplit("p2", NAncS, p1);
}

start_splitAnc+1 early() {
    //remove ancestral pop
    p1.setSubpopulationSize(0);
}

today_gen late() {
    sim.treeSeqOutput("orig_mod1.trees");
}

If we run this then (a) all the first generation individuals are in p1 and are correctly retained in the tree sequence, but (b) population p1 doesn't have its name still:

$ tskit populations orig_mod1.trees 
id  metadata
0   None
1   None
2   {'migration_records': [], 'name': 'p2', 'sex_ratio': 0.0, 'slim_id': 2}

However, if you output the tree sequence earlier, when p1 still existed, by uncommenting the line marked (*), then you get:

$ tskit populations orig_mod2.trees 
id  metadata
0   None
1   {'name': 'p1'}
2   {'migration_records': [], 'name': 'p2', 'sex_ratio': 0.0, 'slim_id': 2}

Everything else is identical. So, there's some bug in the population-name-retaining code.

This is not serious, since users can use indices instead of names when referring to the populations.

bhaller commented 2 months ago

surprised this hasn't surfaced before now...

bhaller commented 1 month ago

Note to self: if the tskit command-line tool is not there, use python -m tskit populations <xxx.trees>.

bhaller commented 1 month ago

OK, reproduces for me. The relevant code is in Species::WritePopulationTable() at https://github.com/MesserLab/SLiM/blob/a5531359475cbc9b4c64581470b066aa8df73821/core/species.cpp#L5401. The code is really quite straightforward here; we just loop over the currently existing subpops with for (auto subpop_iter : population_.subpops_) and write them out. Whenever there is a gap in the numbering, we check the old population table and carry over an entry from there, if there is one. If that entry has non-SLiM metadata we carry it over exactly; if it has SLiM metadata we carry over only the name field and ditch the rest, which converts it to non-SLiM metadata (discussed in #317 apparently, although I haven't gone and looked at that). Working in the debugger, what I see is that the first time this model auto-simplifies is in tick 20. At that point p1 already doesn't exist, and there's no entry for it in the pop table; my debugging logs (not in GitHub) print out:

Species::WritePopulationTable() called in cycle 20
original pop table last subpop id == -1, last_subpop_id == 2
   writing a non-extant subpop with metadata length 4 at index 0
   writing a non-extant subpop with metadata length 4 at index 1
   writing an extant subpop with metadata length 64 at index 2

By the way, last_subpop_id being 2 comes from scanning through the nodes; references to p2 are seen, so the code knows it needs to write out entries for slots up to 2. But the pop table has no entries in it already; no entry for p1 or p2 is there. It writes out empty placeholder entries (metadata "null") at 0 and 1, and writes p2's metadata at 2.

Subsequent calls to this method work with the previously written table, because the pop table gets updated each time simplify happens (a bit surprising, since it's a side effect, but it seems fine). So those subsequent calls look like this, from my debugging logs:

Species::WritePopulationTable() called in cycle 37
original pop table last subpop id == 2, last_subpop_id == 2
   carrying over non-SLiM metadata (complete, length 4) at index 0
   writing a non-extant subpop with metadata length 4 at index 0
   carrying over non-SLiM metadata (complete, length 4) at index 1
   writing a non-extant subpop with metadata length 4 at index 1
   writing an extant subpop with metadata length 64 at index 2

The previously written empty rows are carried over, with non-SLiM metadata of length 4 ("null"), and p2 is written afresh, writing over the previous p2 entry.

So, the problem seems to be that p1 was not written to the population table before it ceased to exist. This perhaps doesn't bite people too often because usually a subpop exists for long enough for auto-simplification to "snap" it before it disappears. For example, if I add this to the model:

2 early() {
    // snap p1
    sim.treeSeqSimplify();
}

then my debug output changes:

Species::WritePopulationTable() called in cycle 2
original pop table last subpop id == -1, last_subpop_id == 1
   writing a non-extant subpop with metadata length 4 at index 0
   writing an extant subpop with metadata length 64 at index 1
Species::WritePopulationTable() called in cycle 21
original pop table last subpop id == 1, last_subpop_id == 2
   carrying over non-SLiM metadata (complete, length 4) at index 0
   writing a non-extant subpop with metadata length 4 at index 0
   carrying over SLiM metadata (name only, length 13) at index 1
   writing a non-extant subpop with metadata length 13 at index 1
   writing an extant subpop with metadata length 64 at index 2
Species::WritePopulationTable() called in cycle 38
original pop table last subpop id == 2, last_subpop_id == 2
   carrying over non-SLiM metadata (complete, length 4) at index 0
   writing a non-extant subpop with metadata length 4 at index 0
   carrying over non-SLiM metadata (complete, length 13) at index 1
   writing a non-extant subpop with metadata length 13 at index 1
   writing an extant subpop with metadata length 64 at index 2
...

Here p1 got written in tick 2, carried over (name only) in tick 21, and then carried over (non-SLiM metadata) in tick 38. This produces the correct result:

bhaller@glass% python -m tskit populations /Users/bhaller/Desktop/orig_mod1.trees 
id  metadata
0   None
1   {'name': 'p1'}
2   {'migration_records': [], 'name': 'p2', 'sex_ratio': 0.0, 'slim_id': 2}

Here p2 has full SLiM metadata because it is extant, p1 is represented by non-SLiM metadata as just a name because it is not extant, and p0 is an empty slot with metadata "null" as a placeholder. This is how it's supposed to work.

So there are two possible fixes that I see. (1) We could try to add an entry in the pop table at the moment a subpop is created, to record that it exists; then it will get carried over even if it goes extinct before the next simplify/save. There is no place in the code where we currently attempt to do that, as far as I can tell. (2) We actually know which subpop names have been used in the past; we keep track of them to prevent the user from making a new p1 after the old p1 has been removed, because I guess that breaks the rules. We keep them in subpop_ids_ and subpop_names_ in Species (https://github.com/MesserLab/SLiM/blob/a5531359475cbc9b4c64581470b066aa8df73821/core/species.h#L380). So we could use that information to write out entries with names for all subpops we have seen in the past.

I think either would work, but we're already keeping the information necessary for (2), and it seems quite a bit simpler to do. Basically I'd extend the existing logic, which for a given slot says:

(1) is there an extant subpop? if so, write it (2) is there an entry in the old pop table with non-SLiM metadata? if so, carry it over intact (3) is there an entry in the old pop table with SLiM metadata? if so, carry it over as a name only (4) create an empty placeholder with "null" metadata

The new logic would be:

(1) is there an extant subpop? if so, write it (2) is there an entry in the old pop table with non-SLiM metadata? if so, carry it over intact (3) is there an entry in the old pop table with SLiM metadata? if so, carry it over as a name only (3.5) is this slot id listed in the species subpop_ids_ list as having ever been used? if so, make an entry with the corresponding name (4) create an empty placeholder with "null" metadata

I have a couple of questions about this:

Is the name that gets entered in the subpop table supposed to simply be "p1", "p2", etc.? Or if the user has assigned a value to the name property of subpop, is it supposed to use that? It looks like it uses the name property; so the information kept about previously used ids/names will need to be made a bit smarter than it is now, so we know that name "foobar" was used for index 1. This is a little bit hairy, because the user could create p1 (name "p1" for slot 1 has been used), then change the name of p1 to "foo", and then later change it to "bar". It should be written out as name "bar" I guess, so the data structure needs to remember that "p1", "foo", and "bar" have all been used, but that "bar" is the final name for slot 1. Doable, just a little annoying. :->
If a subpop has gone extinct and is not referenced by any node, should we still write out a named entry for it? I think yes, just for consistency/predictability, which means the scan through nodes to find used indices (to get last_subpop_id) needs to also scan through the list of used ids, to guarantee that they will receive entries.

I think I know how to do all this. What I need from you, @petrelharp, is (a) a sign-off from you that you've read all this verbiage, looked at the old policy documents you've written in the past to check what our intentions were, etc., and you think what I propose is definitely the correct solution, and (b) before I start this work, a PR from you that adds a test or tests, exercising all the new paths that we've thought of here that the current tests don't exercise. Those tests will presumably be red when you make the PR; me fixing the bugs here should make the tests go green.

Does that sound good? I think this is easier than I expected it to be. Writing this comment is probably the hardest part of it – whew!

petrelharp commented 1 month ago

That sounds good! A minor note: recall that our philosophy on this was that "what's in the tree sequence reflects the state at the time it's written out (or remembered)" - for instance, an individual's age reflects that, not like anything else. So we could actually argue this isn't a bug.

I'll pop in that test, though.

bhaller commented 1 month ago

A minor note: recall that our philosophy on this was that "what's in the tree sequence reflects the state at the time it's written out (or remembered)" - for instance, an individual's age reflects that, not like anything else. So we could actually argue this isn't a bug.

Yeah, perhaps; I don't like the way that the behavior depends on whether auto-simplification happens to occur before p1 has been removed, though. So I do want to fix this. Thanks!

petrelharp commented 1 month ago

See #459 for tests.

bhaller commented 1 month ago

OK, I think I have fixed this; we'll see if your new tests go green. The original test model now correctly writes out p1 with its name of p1, and if I set its name to something else, that name, instead, is preserved. So I think it's all working properly. The diffs are a bit messy because I added a new debugging method to Species to help me check whether my changes were correct, and I decided to leave that stub in because it will be useful in future. (I'm surprised I hadn't done this before!)

bhaller commented 1 month ago

Hmm, tests are failing, here for example: https://github.com/MesserLab/SLiM/actions/runs/10023371340/job/27704201622. @petrelharp I think I fixed the problem, so I'm not sure what's going on; maybe your tests are flawed? (Looks like there is a JSON problem?) Can you check out the situation and report back here? Thanks!

petrelharp commented 1 month ago

Well, when I run the new test script:

slim test_recipes/test_____pop_names_pX.slim

and then look at the result, I get that same error:

$ tskit populations 20/test_output.trees
id  metadata
0   None
Traceback (most recent call last):
  File "/home/peter/projects/tskit-dev/treestuff_venv/bin/tskit", line 8, in <module>
    sys.exit(tskit_main())
             ^^^^^^^^^^^^
  File "/home/peter/projects/tskit-dev/treestuff_venv/lib/python3.11/site-packages/tskit/cli.py", line 289, in tskit_main
    args.runner(args)
  File "/home/peter/projects/tskit-dev/treestuff_venv/lib/python3.11/site-packages/tskit/cli.py", line 111, in run_populations
    tree_sequence.dump_text(populations=sys.stdout)
  File "/home/peter/projects/tskit-dev/treestuff_venv/lib/python3.11/site-packages/tskit/trees.py", line 4244, in dump_text
    text_formats.dump_text(
  File "/home/peter/projects/tskit-dev/treestuff_venv/lib/python3.11/site-packages/tskit/text_formats.py", line 396, in dump_text
    metadata = text_metadata(base64_metadata, encoding, population)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/peter/projects/tskit-dev/treestuff_venv/lib/python3.11/site-packages/tskit/text_formats.py", line 445, in text_metadata
    metadata = node.metadata
               ^^^^^^^^^^^^^
  File "/home/peter/projects/tskit-dev/treestuff_venv/lib/python3.11/site-packages/tskit/metadata.py", line 757, in __get__
    row, "_metadata", row._metadata_decoder(row._metadata)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/peter/projects/tskit-dev/treestuff_venv/lib/python3.11/site-packages/tskit/metadata.py", line 168, in decode
    result = json.loads(encoded.decode())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Removing that table's metadata scheme (in python) and printing the metadata, I get:

Population(id=0, metadata=b'null')
Population(id=1, metadata=b'p1')
Population(id=2, metadata=b'null')
Population(id=3, metadata=b'{"migration_records":[],"name":"p3","sex_ratio":0.0,"slim_id":3}')

So - looks like you set the metadata for that population to be "p1" not "{"name":"p1"}"?

bhaller commented 1 month ago

Oh, hmm, I see. OK, I can fix that, thanks.

bhaller commented 1 month ago

Looks like it is passing CI now, so I think we're good. Thanks for your help!

MesserLab / SLiM

population name inscrutibly dissappears #447