Closed UnHumbleBen closed 2 years ago
@dave-doty For creating a 5' extension, should the extension chain method only take effect when a domain is added, similar to how cross works? Or should it just add an extension immediately?
For creating a 5' extension, should the extension chain method only take effect when a domain is added, similar to how cross works? Or should it just add an extension immediately?
It should be like move
or to
and make the extension immediately; in particular, if it is the last substrand, it would be created by having the extension chained method be the last method called, so it would have to create it immediately. (But there's no reason not to also create it immediately if it is the first chained method call.) This should be legit:
design.draw_strand(0,0).move(10).cross(1).move(-5).extension(6)
On the other hand, if it is the first substrand, it's not clear what's the best approach. draw_strand
takes a helix and offset as input. It would be awkward if those arguments refer to something other than the first substrand on the strand. Any ideas for how to do this in a clean, readable way that won't be confusing? (And that preserves the behavior we currently have if there are no extensions?)
Any ideas for how to do this in a clean, readable way that won't be confusing? (And that preserves the behavior we currently have if there are no extensions?)
After thinking about this for a second, I think the best is to have an optional argument extension_5p_length
in draw_strand
. To create a 5' extension of length 6, followed by a domain on helix 0 from offset 0 to 9, you'd do this:
design.draw_strand(0, 0, extension_5p_length=6).move(10)
Then maybe the name of the extension
chained method should be changed to extension_3p
to emphasize that it can only be used to create a 3' extension.
Any ideas for how to do this in a clean, readable way that won't be confusing? (And that preserves the behavior we currently have if there are no extensions?)
After thinking about this for a second, I think the best is to have an optional argument
extension_5p_length
indraw_strand
. To create a 5' extension of length 6, followed by a domain on helix 0 from offset 0 to 9, you'd do this:design.draw_strand(0, 0, extension_5p_length=6).move(10)
Then maybe the name of the
extension
chained method should be changed toextension_3p
to emphasize that it can only be used to create a 3' extension.
I think this is a good idea and I'll proceed with implementing this way.
The only thing I'm unsure about is how to set a default value for the 5' extension relative_offset. Suppose the following snippet happens:
design.draw_strand(0, 0, extension_5p_length=6)
Sure, we can create an Extension
object, but what relative_offset should it have? If it were a 3' extension, this problem would be simple, either (1, -1) for forward strand and (-1, 1) for reverse strand. But for the 5' extension, we don't know whether the domain will be created in the forward or reverse direction, so we can't assign a default relative_offset. I suppose we could make the relative_offset field optional and have the empty value denote that the default placement is desired. What do you think @dave-doty?
Edit: Disregard the question above, I just realize that we have always been using the absence of a value to indicate default so there's no need to explicitly figure out what the offset is and just assign None to it and make relative_offset an Optional field
Edit: Disregard the question above, I just realize that we have always been using the absence of a value to indicate default so there's no need to explicitly figure out what the offset is and just assign None to it and make relative_offset an Optional field
I'm not sure about this, if you are talking about making the field Extension.relative_offset
be of type Optional[Tuple[float, float]]
.
I don't like allowing a value to be None
(i.e., making its type Optional[Something]
) unless there's really some sense in which in can lack a value. (For instance, a Strand can lack a name, in which case there's no string it makes sense to assign to Strand.name
; it should be assigned None
.)
In this case, every Extension should have a relative_offset
. The default value is a concrete pair of floats, and every other relative_offset
value one could imagine is also a pair of floats, so it doesn't make sense to allow the value to be None
.
For internal classes that only we use, I'm happy to break this rule and just be very careful with our own code. But since Extension
will be part of the public API, I prefer to give the guarantee that every relative_offset
field has a non-None
value of type Tuple[float, float]
, which will help avoid None
-reference errors.
The problem you are describing is that in this situation, at the time draw_strand
is called, we don't yet know which of the default values to use for relative_offset
, since it will depend on the domain yet to be created. I think a way to handle this is to delay creating the extension until the first Domain
is created. The StrandBuilder
object could store a extension_5p_length
field (which can be None
to indicate that it was not specified in draw_strand
), and if non-None
, the first time move
or to
is called, this field is checked and an extension is created if need be. (And then the field is reset to None
so that subsequent calls to move
/to
do not create extensions.) Also update_to
will have to be handled; it could point the domain in the opposite direction as the first to
call.
An alternative is to assume the first domain will be a forward domain and set Extension.relative_offset
accordingly, and then to check if to
or move
(or update_to
) is called to create a reverse domain, and then if there is an extension, to fix its relative_offset
value.
Does that seem like a good approach? A little more complex to implement StrandBuilder
, but then it simplifies the public API for Extension
.
BTW I'm starting to think from this discussion that it would be simpler to do what you originally suggesteed, and have relative_offset
be relative to the "rotation frame" of the adjacent domain. I still think it would be confusing for it to be an offset, i.e., a (z,y) coordinate. But perhaps if it were in polar coordinates, i.e., an angle in degrees and a length in nanometers, that would be simpler for a user to think about.
In this case, 0 degrees would mean the extension lies next to the domain on the same helix. (if next to a forward domain for example, to the left of the domain if a 5' extension and to the right of the domain if a 3' extension) 90 degrees would point away from the helix, and 180 degrees would make the extension overlap the adjacent domain.
In this case I think it would be easier to read if it were two fields of type float
, one called length
and the other called angle
. Then we could change the current length
field to be called num_bases
to differentiate it.
What do you think about that? The default values could be angle
= 45 degrees and length
= 1 nm.
This has the bonus that we would not need to worry about doing those transformations when moving/copying strands in the web interface; it would be automatic. But we would have to be careful rendering the extension, though hopefully appropriate SVG transform
's would make this easy. (i.e., reflect horizontally if it's a 3' extension, and then rotate 180 degrees if the adjacent domain is reverse).
It would also require some clever calculating when drag-moving the end of the extension, since the (z,y) relative coordinates would need to be translated to the "rotated" polar coordinates.
I now agree that the type of relative_offset should be not Optional. I confused the JSON representation with the actual Python object. I mistakenly thought that omitting a field in the JSON translates to a None field in the Python object. I now know that is not the case.
I like the idea of using polar coordinates (length, angle) compared to screen coordinates (z,y). As for whether the coordinate should be relative to a fixed frame or rotation frame of the adjacent domain, I'm not sure. From users I suspect it would be easier to think in terms of a fixed coordinate. At the same time, perhaps most users won't need to think about this field and stick with the default or do modifications on the web application. In this case, I think using the "rotation frame" of the adjacent domain is acceptable because web app users and most Python users won't have to worry about this field, and only a few Python users will need to learn the coordinate system. Whether we choose a fixed frame or a relative frame, I think I would prefer to make the field it's own class, like ExtensionRelativeOffset
with fields like ExtensionRelativeOffset.length
and ExtensionRelativeOffset.angle
to improve readability and we can include details about the coordinate system in the class docstring.
At the same time, perhaps most users won't need to think about this field and stick with the default or do modifications on the web application.
Actually, the more I think about this, the more I think that Python users who want to set this position themselves would have an easier time if it's relative to the rotation frame, given how more figures such as this tend to look in papers. (The defaults I'm imagining would be close to what you see in many papers that display a single-stranded extension.)
Whether we choose a fixed frame or a relative frame, I think I would prefer to make the field it's own class, like ExtensionRelativeOffset with fields like ExtensionRelativeOffset.length and ExtensionRelativeOffset.angle to improve readability and we can include details about the coordinate system in the class docstring.
This breaks a bit with convention in the Python library. For instance, an insertion is just a Tuple[int, int]
. We could have made it its own class, but then it would be more verbose to modify. Definitely the Dart code benefits from making these classes, because it does not have native support for Tuples (there's some tuple package but it's awkward and can't use the simple Python syntax for accessing elements).
However, I think it should be fine to make it its own class, especially if the chained methods can set these properties without the user needing to create an instance of ExtensionRelativeOffset
, e.g.:
design.draw_strand(0, 0, extension_angle=60, extension_length=2)\
.move(10)\
.extension_3p(angle=30, length=4)
which would make something like this:
Actually, the more I think about this, the more I think that Python users who want to set this position themselves would have an easier time if it's relative to the rotation frame, given how more figures such as this tend to look in papers. (The defaults I'm imagining would be close to what you see in many papers that display a single-stranded extension.)
Sure sounds good, let's use the rotation frame of the adjacent domain.
However, I think it should be fine to make it its own class, especially if the chained methods can set these properties without the user needing to create an instance of
ExtensionRelativeOffset
...
Certainly we want to keep the ability to for the chained methods to be concise without forcing the user to create additional classes. The only issue is coming up with good names for the arguments. For draw_strand
, the extension_angle
argument is clear, but I'm not sure about what is a good name for the length of the offset as extension_length
or extension_5p_length
is already used to refer to the number of dna bases on the extension. Similarly, for extension_3p
, the angle
argument is clear, but again I am not sure what is a good name for the length of the offset because length
is already used for the same reason above. Actually, in your example code above, I wasn't sure whether length refers to the polar coordinate length or the number of dna bases.
Another thing, given that we are using polar coordinates, should we rename the name of the field as well as the class for the field? Is "offset" still the right word when speaking in terms of polar coordinates? Would something like Extension.endpoint
and ExtensionOrientation
be more fitting? Or perhaps we can make it more explicitly that the field has to do with how to display the extension on the web app. So something like Extension.display_config
and ExtensionDisplayConfig
.
Considering the above, my suggestion is to call the field Extension.display_config
and it is of type ExtensionDisplayConfig
, which has fields ExtensionDisplayConfig.length
and ExtensionDisplayConfig.angle
.
For the chain methods, we can have a new method with_extension_display_config
which takes as arguments length and angle and must be called either after draw_strand
with an extension length specified or after extension_3p
. E.g.
design.draw_strand(0, 0, extension_5p_length=2)
.with_extension_display_config(length=0.68, angle=60)
.move(10)\
.extension_3p(length=4)
.with_extension_display_config(length=1.36, angle=30)
I think this is a good balance of conciseness and clarity. Additionally, the dna base "length" and visual display "length" are separated explicitly.
Another thought, since in polar coordinate, the length
field is equivalent to the length of the extension, we could make the default polar coordinate length 0.34 * extension length (number of bases), since each base pair is 0.34 nanometers long.
What are your thoughts @dave-doty?
Certainly we want to keep the ability to for the chained methods to be concise without forcing the user to create additional classes... what is a good name for the length of the offset as extension_length or extension_5p_length is already used to refer to the number of dna bases on the extension.
Oh yeah, I forgot about that field too.
Now that I've thought about it some more, maybe we should just make it a chained method instead of modifying draw_strand
, e.g.,
design.draw_strand(0, 0)
.with_extension_5p(num_bases=7, length=2.1, angle=60.0)
.move(10)
Obviously the last two fields have default values of 1.0 and 45.0, so you could do
design.draw_strand(0, 0)
.with_extension_5p(num_bases=7)
.move(10)
After this discussion, this last one is now my favorite, particularly since we don't need to modify length
and angle
after seeing what type of domain is the first domain after the call to move
(as we did with relative_offset
). I also like that it will look similar to a 3' extension:
design.draw_strand(0, 0)
.with_extension_5p(num_bases=7, length=2.1, angle=60.0)
.move(10)
.cross(1)
.move(-5)
.with_extension_3p(num_bases=5, length=2.1, angle=30.0)
What do you think?
After this discussion, this last one is now my favorite, particularly since we don't need to modify
length
andangle
after seeing what type of domain is the first domain after the call tomove
(as we did withrelative_offset
). I also like that it will look similar to a 3' extension:design.draw_strand(0, 0) .with_extension_5p(num_bases=7, length=2.1, angle=60.0) .move(10) .cross(1) .move(-5) .with_extension_3p(num_bases=5, length=2.1, angle=30.0)
What do you think?
I think think this is a great idea. I just realized that the only reason why we even bother modifying draw_strand
was because we were using fixed frame rather than relative frame. Now that we are using relative frame, there's no need to modify draw_strand
.
In your code, you have with_extension_5p
and with_extension_3p
, but perhaps the functionality of these two are similar enough that we can just have one with_extension
? I do like have _5p
and _3p
because it's more verbose and explicit, which might simplify the error handling. What are your thoughts on this?
The other thing to decide on is how to store the length
and angle
in Extension
. We were originally thinking of Extension.relative_offset
, but what are your thoughts on Extension.endpoint
or Extension.display_config
, which I proposed in my previous comment. Another possibility is storing the fields separately as Extension.length
and Extension.angle
. Out of these options, which one do you think is most consistent with the rest of the scadnano data model? Personally, I like Extension.display_config
with a type of ExtensionDisplayConfig
because it isolates the display configuration into a single field and makes it explicitly that the "length" refers to a display configuration length rather than number of dna bases.
Another suggestion I have in my previous comment is having the default length be 0.34 * num_bases. What are your thoughts on this?
In your code, you have
with_extension_5p
andwith_extension_3p
, but perhaps the functionality of these two are similar enough that we can just have onewith_extension
? I do like have_5p
and_3p
because it's more verbose and explicit, which might simplify the error handling. What are your thoughts on this?
I agree, I like that it's explicit, and it's only 3 extra characters so not cumbersome to type.
I wonder whether we should make two classes Extension5p
and Extension3p
? What I'm imagining is that you have an generic Extension
object and you want to know whether it's on the 5' or 3' end, you have to go digging through the Strand list of domains to figure it out, so it might be nice if the object itself indicated which type it is.
This could also just be implemented as a Boolean field in the single Extension
class. This breaks convention with the three types of Modifications, but there I think it makes more sense to have separate types, since ModificationInternal
actually has different fields than the other two. In the Extension
case, both types of Extension
have identical fields, so making two separate types might be overkill.
Thoughts?
Another possibility is storing the fields separately as Extension.length and Extension.angle. Out of these options, which one do you think is most consistent with the rest of the scadnano data model? Personally, I like Extension.display_config with a type of ExtensionDisplayConfig because it isolates the display configuration into a single field and makes it explicitly that the "length" refers to a display configuration length rather than number of dna bases.
This is a good point, it could get confusing since length
means something else in Loopout
, so I could imagine someone chasing down a bug where they think length
means "number of bases".
Still, that's an awful lot of indirection and hierarchy to introduce. I think it's better just to have a flat structure in Extension
(with three fields). So perhaps this is better achieved by simply changing the name from length
to display_length
? That's consistent with, e.g., other display properties such as the connector length on Modification
.
(Whew, you're seeing why I've delayed so long on implementing this feature. It seems like there's lots of room to do it poorly.)
Another suggestion I have in my previous comment is having the default length be 0.34 * num_bases. What are your thoughts on this?
I can see an argument for this. The main reason I didn't want to do this is that with a fixed length, we can ensure that there's no awkward overlap in the display. With a long enough extension, the extension would display over the top of the adjacent helix. Of course the user could fix this, but it would be nice if the default behavior avoided this.
The other advantage is that it's nice when default values don't depend on surrounding context (another reason I like that we've switched to the convention of (angle, display_length)
relative to the adjacent domain, where the default values are absolute and don't require knowing about the adjacent domain). For example, it's awkward to load helices from the JSON format, because many of their default properties (e.g., idx
) require all of them to be loaded first before default properties can be set. It's nicer when you can re-hydrate an object from JSON based only on the object's JSON and not on surrounding context.
Of course, if the user has different length extensions and want them to be proportional to length, then they'll have to customize. But many designs have the property that all extensions are the same length, so I see this as being less of a problem.
So my intuition is that the first two advantages outweigh the last.
I wonder whether we should make two classes
Extension5p
andExtension3p
? What I'm imagining is that you have an genericExtension
object and you want to know whether it's on the 5' or 3' end, you have to go digging through the Strand list of domains to figure it out, so it might be nice if the object itself indicated which type it is.This could also just be implemented as a Boolean field in the single
Extension
class. This breaks convention with the three types of Modifications, but there I think it makes more sense to have separate types, sinceModificationInternal
actually has different fields than the other two. In theExtension
case, both types ofExtension
have identical fields, so making two separate types might be overkill.Thoughts?
I think the difference between Extension5p
and Extension3p
would be too small to warrant making a separate class. Having a Boolean field in the Extension
class increases the number of invalid states (i.e. having an Extension at the end of the strand.domains
with Boolean field set to 5' end) and I'm not sure how beneficial the field would be. I don't envision distinguishing between 3' and 5' extension to be a common scenario so I think it's best to just have a single Extension
class with no boolean field to distinguish 3' vs 5'.
I think it's better just to have a flat structure in
Extension
(with three fields). So perhaps this is better achieved by simply changing the name fromlength
todisplay_length
?
I'm on board with this idea. Perhaps we can change angle
to display_angle
as well?
Another suggestion I have in my previous comment is having the default length be 0.34 * num_bases. What are your thoughts on this?
I can see an argument for this. The main reason I didn't want to do this is that with a fixed length, we can ensure that there's no awkward overlap in the display. With a long enough extension, the extension would display over the top of the adjacent helix. Of course the user could fix this, but it would be nice if the default behavior avoided this.
The other advantage is that it's nice when default values don't depend on surrounding context (another reason I like that we've switched to the convention of
(angle, display_length)
relative to the adjacent domain, where the default values are absolute and don't require knowing about the adjacent domain). For example, it's awkward to load helices from the JSON format, because many of their default properties (e.g.,idx
) require all of them to be loaded first before default properties can be set. It's nicer when you can re-hydrate an object from JSON based only on the object's JSON and not on surrounding context.Of course, if the user has different length extensions and want them to be proportional to length, then they'll have to customize. But many designs have the property that all extensions are the same length, so I see this as being less of a problem.
So my intuition is that the first two advantages outweigh the last.
Sounds good, let's have a fixed default value of 1.0 for display_length
.
Perhaps we can change
angle
todisplay_angle
as well?
Sure, why not.
I think the difference between
Extension5p
andExtension3p
would be too small to warrant making a separate class. Having a Boolean field in theExtension
class increases the number of invalid states (i.e. having an Extension at the end of thestrand.domains
with Boolean field set to 5' end) and I'm not sure how beneficial the field would be. I don't envision distinguishing between 3' and 5' extension to be a common scenario so I think it's best to just have a singleExtension
class with no boolean field to distinguish 3' vs 5'.
@dave-doty Just to confirm, having one Extension
class with no boolean field to distinguish between 5' vs 3' sounds good to you? I could start implementing this way and then add boolean field later if it turns out 5' vs 3' shows up a lot.
Sounds good.
On Tue, May 10, 2022, 5:58 PM Benjamin Lee @.***> wrote:
I think the difference between Extension5p and Extension3p would be too small to warrant making a separate class. Having a Boolean field in the Extension class increases the number of invalid states (i.e. having an Extension at the end of the strand.domains with Boolean field set to 5' end) and I'm not sure how beneficial the field would be. I don't envision distinguishing between 3' and 5' extension to be a common scenario so I think it's best to just have a single Extension class with no boolean field to distinguish 3' vs 5'.
@dave-doty https://github.com/dave-doty Just to confirm, having one Extension class with no boolean field to distinguish between 5' vs 3' sounds good to you? I could start implementing this way and then add boolean field later if it turns out 5' vs 3' shows up a lot.
— Reply to this email directly, view it on GitHub https://github.com/UC-Davis-molecular-computing/scadnano-python-package/pull/230#issuecomment-1122646865, or unsubscribe https://github.com/notifications/unsubscribe-auth/AETBU7PEOFYKLYMAUESRTF3VJKISJANCNFSM5TPPLAWA . You are receiving this because you were mentioned.Message ID: <UC-Davis-molecular-computing/scadnano-python-package/pull/230/c1122646865 @github.com>
@dave-doty Almost finished implementation in the Python library. Just wanted to double check a few things with you. I have 3 test cases for actions that I think should be disallowed and wanted to see if you agree.
Essentially, should ligate, crossover, and half-crossover be disallowed if applied to the domain on the side where the extension is located.
I also have test cases for normal cases for ligate, crossover, and nick. All the error cases and normal cases begin here, if you want to take a look:
Yes, I agree those cases should all be errors.
Discovered something strange about ligate
while implementing the ligate error checking. There's no check on Python that the domains to ligate are at the end of the strand. For example something like this:
Clicking on helix 0, offset 15/16 should not be allowed. This check is done in the web app (nothing happens when I click there on ligate mode). However, this check is not implemented in the Python library:
Test case:
def test_ligate_middle(self) -> None:
"""
[-----+[----->
|
<-----+
"""
design: sc.Design = sc.Design(helices=[sc.Helix(max_offset=100), sc.Helix(max_offset=100)])
design.draw_strand(0, 0).to(10).cross(1).to(0)
design.draw_strand(0, 10).to(20)
design.ligate(0, 10, True)
Debugging design, I see the resulting strand object contains two overlapping domains, which is nonsensical:
"strands": [
{
"color": "#f74308",
"domains": [
{"helix": 0, "forward": true, "start": 0, "end": 10},
{"helix": 0, "forward": true, "start": 0, "end": 20}
]
}
]
Is there some work that needs to be done with the existing ligate function to prevent this?
I notice a similar problem with crossovers:
In the example here, adding a crossover between 1 and 2 at offset 0 should not be allowed. The web app prevents this (because of the modify crossover feature) but there are no checks in the python code:
def test_add_half_crossover_middle(self) -> None:
"""
0 +------]
|
1 +------>
2 <------]
"""
# Setup
design: sc.Design = sc.Design(
helices=[sc.Helix(max_offset=100), sc.Helix(max_offset=100), sc.Helix(max_offset=100)]
)
design.draw_strand(0, 10).to(0).cross(1).to(10)
design.draw_strand(2, 10).to(0)
design.add_half_crossover(1, 2, 0, True)
The code should throw an error but instead, we get a new design with strands:
"strands": [
{
"color": "#57bb00",
"domains": [
{"helix": 2, "forward": false, "start": 0, "end": 10},
{"helix": 0, "forward": false, "start": 0, "end": 10},
{"helix": 1, "forward": true, "start": 0, "end": 10}
]
}
]
which visually looks like:
which isn't right either. Am I using these functions incorrectly or are there actual bugs?
Is there some work that needs to be done with the existing ligate function to prevent this?
Definitely looks like an error to me. Do you mind adding unit tests to ensure it raises an exception?
Am I using these functions incorrectly or are there actual bugs?
This also looks like a bug.
@dave-doty Changes are ready to merge to dev now
Can you write up some simple release notes and put them as a comment in the related issue #2? That way we can copy/paste them into the release notes when this is merged to main.
WIP: #2
@dave-doty I added some unit tests for chain methods for DNA extensions. Can you give feedback on the unit tests written so far. Is the tested behavior desirable? Are there additional tests/behavior you would like me to add?