georgeoshardo / SyMBac

Accurate segmentation of bacterial microscope images using synthetically generated image data.
https://doi.org/10.1186/s12915-022-01453-6
GNU General Public License v2.0
18 stars 9 forks source link

Lineage creation error #47

Open cmalinmayor opened 1 month ago

cmalinmayor commented 1 month ago

Hi @georgeoshardo - during some random runs of our data generation script, we hit this error. I'm still not exactly sure what exp_node stands for - what do you think would cause this to happen? I can confirm that the node is actually not in the graph, based on the IDs that are printed, but I didn't paste it here because it is rather long.

  File "write_video.py", line 91, in get_lineage_graph
    lineage = Lineage(simulation)
  File "/groups/funke/home/malinmayorc/code/SyMBac/SyMBac/lineage.py", line 45, in __init__
    assert exp_node in self.temporal_lineage_graph.nodes, f"Node {exp_node} not in temporal lineage graph nodes {self.temporal_lineage_graph.nodes}"
georgeoshardo commented 1 month ago

Hi Caroline. I know of this error and am planning to fix it in the branch I'm working on. I have made an incorrect assumption when building the lineage tree. A node should have at most two successors and one predecessor, but I am accidentally introducing grandmother and granddaughter connections (sometimes).

You have access to the dataframe with mother mask labels and time points. You might be able to reconstruct the graph properly compared to that very hastily thrown together loop, where the exception is thrown.

georgeoshardo commented 1 month ago

Actually my previous comment is partly wrong, it's not grandmother-granddaughter connections, but rather a connection between a daughter and a mother at the wrong time.

This is just adding every cell to the graph, with the node being (mask label, time), but crucially I add the SyMBac.cell.Cell as a node attribute:

        for cells in self.simulation.cell_timeseries:
            for cell in cells:
                self.temporal_lineage_graph.add_node((cell.mask_label, cell.t), cell = cell)

This is where I build the temporal connections, and is where where things start going wrong, exp_node is referring to the predecessor in time, we create a hypothetical node for a given cell's mask label, but in the previous time-step. If that node exists, we can add an edge between the node at t-1 and t.

        for node in self.temporal_lineage_graph.nodes:
            exp_node = (node[0], node[1]-1)
            if exp_node in self.temporal_lineage_graph.nodes:
                self.temporal_lineage_graph.add_edge(exp_node, node)

Here I'm trying to build the mother daughter connections, Iterating through all the nodes in the graph. I pull the SyMBac.cell.Cell object out of the node's attributes. If that cell has a mother, then I want to create a mother-daughter edge. I assume (incorrectly) that its mother node must be (cell.mother.mask_label, node[1]). I am not sure why I did not choose (cell.mother.mask_label, node[1]-1), but that still wouldn't fully fix the issue I think...

        for node in self.temporal_lineage_graph.nodes:
            cell = self.temporal_lineage_graph.nodes[node]["cell"]
            if cell.mother:
                exp_node = (cell.mother.mask_label, node[1])
                assert exp_node in self.temporal_lineage_graph.nodes
                self.temporal_lineage_graph.add_edge(exp_node, node)

I also discovered that I've broken this code in my current branch, but this shouldn't be an issue because I completely reworked the Cell attributes to be less bad, and make lineage construction easier (I realised my current scheme will not work well for 2D or 3D colonies, or for situations where the mother dies before the daughter due to lysis), but I appreciate you probably want a solution now, therefore if you have any pre-existing code to generate these graphs, I think the Lineage.all_cell_data_df should contain all the information you need?

cmalinmayor commented 1 month ago

Thanks for the info. I was able to re-create the graph that I wanted (I think). Perhaps it might be best to remove the buggy code from this branch in the meantime, while you fix it in the new branch?

georgeoshardo commented 1 month ago

Will do. Could you share how you went about it?

On Mon, 15 Jul 2024, 20:09 Caroline Malin-Mayor, @.***> wrote:

Thanks for the info. I was able to re-create the graph that I wanted (I think). Perhaps it might be best to remove the buggy code from this branch in the meantime, while you fix it in the new branch?

— Reply to this email directly, view it on GitHub https://github.com/georgeoshardo/SyMBac/issues/47#issuecomment-2229198499, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACFKIXZCPO2LGTI763UGTCLZMQM6JAVCNFSM6AAAAABKZSIOACVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMRZGE4TQNBZHE . You are receiving this because you were mentioned.Message ID: @.***>

cmalinmayor commented 1 month ago

Sure, I essentially created the graph nodes from the generated masks (node_id = (mask_id, time) (https://github.com/cmalinmayor/SyMBac/blob/aaa9dd49c952fc63f2b50812227219bef689a4e7/scripts/write_video.py#L155). Then, I used the Lineage.family_tree_edgelist to make the edges (https://github.com/cmalinmayor/SyMBac/blob/aaa9dd49c952fc63f2b50812227219bef689a4e7/scripts/write_video.py#L60). Specifically, for each node in the graph at time>0, if the same mask_id was present in the previous frame, I linked (mask_id, time-1) -> (mask_id, time). If the same mask_id was not present in the previous frame, I looked in the family_tree_edgelist to find the parent_mask_id, and linked (parent_mask_id, time - 1) -> (mask_id, time). I then remapped all the node_ids (and masks) to unique integers throughout all time points, which is why the code is a bit more complex.

It's still not the most elegant, but generating the nodes in the graph from the masks rather than the simulation cells avoided issues caused by extra cells that were not visible in the mask but were in the simulation. This approach does break if I add lysis, with some parent cells (parent_mask_id, time - 1) not being in the graph. This shouldn't happen because lysed cells shouldn't come back in later frames or have their mask_ids reused, but I didn't have time to debug that yet.