YosefLab / Cassiopeia

A Package for Cas9-Enabled Single Cell Lineage Tracing Tree Reconstruction
https://cassiopeia-lineage.readthedocs.io/en/latest/
MIT License
77 stars 24 forks

Add concrete class TumorWithAFitSubclone that implements a TreeSimulator #74

Closed sprillo closed 3 years ago

sprillo commented 3 years ago

I also added bulk get_times and set_times methods to the CassiopeiaTree. I still need to add tests in CassiopeiaTree for these two methods, but first let's make sure that you are happy with the methods and their semantics.

sprillo commented 3 years ago

@mattjones315 regarding point #2:

(2) On the topic of simplicity, I understand why it's nice to have a deterministic simulator and I don't want to tell you what is and is not useful for your work. However, I don't know how generally useful this would be -- in my mind, simulators should have some randomness built into them so you can get an idea of what a distribution of test cases or examples might look like. One way we can build randomness into this is to specify a distribution of waiting times for the fit subclone rather than an explicit length. We could actually support both by taking in a function (HELLO COMPOSITIONAL DESIGN!) for the waiting times.

I really like the idea of using composition to inject the waiting times and get different behaviors. My concern is: how would you decide when the fit subclone starts expanding? Right now I just specify the generation number; since all branch lengths are the same, there is a very clear notion of "generation 6". What do you propose in general? If we find a simple way to decide when the subclone starts to expand, I would be happy to implement the more general version you propose.

On the topic of randomness, I think it can be useful later on to perform more thorough quantitative analysis. However, I think any method development should start with a deterministic playground that is as simple as possible, and later on move to more complicated settings. I think this TreeSimulator is as simple as possible!
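For concreteness, here is a minimal sketch of what "taking in a function for the waiting times" could look like. It is not the Cassiopeia implementation: `simulate_times` and its parameters are hypothetical names, and the tree is just a full binary tree whose nodes are numbered so that the children of node `i` are `2i + 1` and `2i + 2`. Passing a constant function recovers the deterministic case; passing a sampler gives random branch lengths.

```python
import random
from typing import Callable, Dict


def simulate_times(depth: int, waiting_time: Callable[[], float]) -> Dict[int, float]:
    """Return a node -> time map for a full binary tree with `depth` divisions.

    Hypothetical helper: every edge length comes from the injected
    `waiting_time` callable, so the caller decides whether it is random.
    """
    times = {0: 0.0}  # the root sits at time 0
    for node in range(2 ** depth - 1):  # iterate over the internal nodes
        for child in (2 * node + 1, 2 * node + 2):
            times[child] = times[node] + waiting_time()
    return times


# Deterministic playground: every branch has length 1.
deterministic = simulate_times(3, lambda: 1.0)

# Same simulator, random branch lengths (exponential waiting times).
rng = random.Random(0)
noisy = simulate_times(3, lambda: rng.expovariate(1.0))
```

With the constant function, all eight leaves (nodes 7 to 14) end up at time 3.0, which is the "generation 3" picture; with the sampler, leaf times differ but the code path is identical.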

mattjones315 commented 3 years ago

Ah, these are interesting points! Definitely, it's nice to have a deterministic framework to help you understand the problem. But when we do method development we like to have some noise in our examples so we can get a feel for the robustness of the method.

To your point -- I think you can maintain a generation counter and still initiate the fitness at the nth generation. Even though branch lengths are not uniform, you can still count generations as the number of cell divisions up to a certain node. I think it'd be helpful to have some randomness in the edge lengths (or at least support randomness).
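A rough sketch of the generation-counter idea, under some assumptions of my own: nodes are identified by their path of 0/1 steps from the root, so the generation is simply the path length; the subclone is seeded at the all-zeros node of the chosen generation; fitness is inherited by descendants; and the simulation runs until a time horizon. `simulate_until`, `Node`, and all parameter names are hypothetical and not part of Cassiopeia. Both waiting-time functions must return positive values or the loop will not terminate.

```python
from typing import Callable, Dict, NamedTuple, Tuple


class Node(NamedTuple):
    generation: int  # number of cell divisions between the root and this node
    birth_time: float
    is_fit: bool


def simulate_until(
    horizon: float,
    fit_generation: int,
    neutral_wait: Callable[[], float],
    fit_wait: Callable[[], float],
) -> Dict[Tuple[int, ...], Node]:
    """Grow a binary tree until `horizon`, seeding a fit subclone by generation."""
    nodes: Dict[Tuple[int, ...], Node] = {}
    stack = [((), 0.0, False)]  # (path from root, birth time, inherited fitness)
    while stack:
        path, birth, fit = stack.pop()
        generation = len(path)  # independent of the branch lengths
        # Seed the subclone at one designated node of the nth generation.
        if generation == fit_generation and all(step == 0 for step in path):
            fit = True
        nodes[path] = Node(generation, birth, fit)
        division_time = birth + (fit_wait() if fit else neutral_wait())
        if division_time <= horizon:  # otherwise this node stays a leaf
            for step in (0, 1):
                stack.append((path + (step,), division_time, fit))
    return nodes


# Deterministic example: the fit subclone divides twice as fast from generation 2.
tree = simulate_until(4.0, 2, lambda: 1.0, lambda: 0.5)
deepest_fit = max(n.generation for n in tree.values() if n.is_fit)
deepest_neutral = max(n.generation for n in tree.values() if not n.is_fit)
```

Swapping either lambda for a random sampler leaves the generation count well defined, since it only depends on the number of divisions along the path.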

sprillo commented 3 years ago

Sounds good Matt! I will think about the best way to implement this.