SynBioDex / SBOL-specification

The Synthetic Biology Open Language (SBOL)
http://sbolstandard.org

sbol3-10807 - Clarify which Component's sequence should be linked to locations #492

Open goksel opened 1 year ago

goksel commented 1 year ago

According to validation rule sbol3-10807, a Location object's hasSequence property should point to a Sequence of the Component referenced by SubComponent.instanceOf, rather than to a Sequence of the parent Component that contains the SubComponent. Can we please double-check whether this rule is correct?

sbol3-10807 - If a SubComponent object has at least one hasLocation and zero sourceLocation properties, and the Component linked by its instanceOf has precisely one hasSequence property whose Sequence has a value for its elements property, then the sum of the lengths of the Location objects referred to by the hasLocation properties MUST equal the length of the elements value of the Sequence.

Consider the following example. I assume Range1.hasSequence points to parent_Sequence1 rather than child_Sequence1.

:parent a sbol:Component ; sbol:hasFeature <parent/SubComponent1> ; sbol:hasSequence :parent_Sequence1 .

<parent/SubComponent1> a sbol:SubComponent ; sbol:hasLocation <parent/SubComponent1/Range1> ; sbol:instanceOf :child .

:parent_Sequence1 a sbol:Sequence ; sbol:elements "atgcgtaaaggagaagaacttttca" .

<parent/SubComponent1/Range1> a sbol:Range ; sbol:hasSequence :parent_Sequence1 .

:child a sbol:Component ; sbol:hasSequence :child_Sequence1 .

:child_Sequence1 a sbol:Sequence ; sbol:elements "atg" .

jakebeal commented 1 year ago

I'm having a hard time with your example: your range objects are missing their required start and end locations, so I can't tell if the example complies with the rule or not.

If Range1 had start=1 and end=3, however, that would comply with the rule, since then the length of the Range would be 3, and that is the same as the length of the elements of child_Sequence1.

Here's the key idea in this rule: in the prior rule, sbol3-10806, we can directly compare sequence lengths because we have both a sourceLocation (pointing to a Sequence associated with the child) and a hasLocation (pointing to a Sequence associated with the parent).

But sourceLocation is optional: what do we do if we don't have a sourceLocation? In this case, we fall back to using the child's whole sequence. Rule sbol3-10807 is checking that the lengths are compatible for that case.
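The fallback described above can be sketched in a few lines of Python. This is only an illustration of how the rule reads, not code from any SBOL library: the `Range` class and the `satisfies_10807` function are hypothetical names, Ranges are assumed to be 1-based and inclusive, and the child's Sequences are represented simply as their elements strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Hypothetical stand-in for sbol:Range (1-based, inclusive)."""
    start: int
    end: int

    def length(self) -> int:
        return self.end - self.start + 1


def satisfies_10807(has_locations, source_locations, child_elements):
    """Return True if the rule is satisfied or does not apply.

    has_locations / source_locations: lists of Range objects on the SubComponent.
    child_elements: elements strings of the Sequences of the instanceOf Component.
    """
    # The rule only applies with >= 1 hasLocation and zero sourceLocation.
    if not has_locations or source_locations:
        return True
    # ... and only when the child has exactly one Sequence with elements.
    if len(child_elements) != 1 or not child_elements[0]:
        return True
    total = sum(r.length() for r in has_locations)
    return total == len(child_elements[0])


# Range1 with start=1, end=3 against child_Sequence1 = "atg"
print(satisfies_10807([Range(1, 3)], [], ["atg"]))  # True
```

With a sourceLocation present the function returns True unconditionally, because that case is the business of sbol3-10806 rather than this rule.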

cjmyers commented 1 year ago

I'm also confused by your example. I thought hasSequence is used to identify which sequence the SubComponent is found in, for the case where the Component referenced by the instanceOf property has more than one sequence linked to it. In this case, pointing to parent_Sequence1 should not be allowed, since it is not one of the sequences for child.

goksel commented 1 year ago

I changed the example as shown below. Could you please check if it is correct now (including the range start and end locations)? Range.hasSequence now points to sequences from the child components.

I think the example below will now violate sbol3-11402 ("The value of the end property of a Range MUST be greater than zero and less than or equal to the length of the elements value of the Sequence referred to by its hasSequence property").

Additionally, according to sbol3-11302, Range.hasSequence should point to sequences from the parent. As a result, this example would also violate sbol3-11302 ("For every Location that is not an EntireSequence and that is the value of a hasLocation property of a Feature, the value of its hasSequence property MUST also either be a value of the hasSequence property of the parent Component or else be the value of some hasSequence property of an EntireSequence that is also a child of the same Component").

I think I misinterpret some of these validation rules, but I'm not sure which ones. Are you aware of any use cases or example diagrams I can look at? I probably need to revise all the validation rules that depend on the hasLocation and sourceLocation properties.

:parent a sbol:Component ; sbol:hasFeature <parent/SubComponent1> ; sbol:hasSequence :parent_Sequence1 .

:parent_Sequence1 a sbol:Sequence ; sbol:elements "atgcgtaaaggagaagaacttttca" .

:child a sbol:Component ; sbol:hasSequence :child_Sequence1 .

:child_Sequence1 a sbol:Sequence ; sbol:elements "atgaaa" .

<parent/SubComponent1> a sbol:SubComponent ; sbol:hasLocation <parent/SubComponent1/Range2> , <parent/SubComponent1/Range1> ; sbol:instanceOf :child .

<parent/SubComponent1/Range1> a sbol:Range ; sbol:hasSequence :child_Sequence1 ; sbol:start "1" ; sbol:end "3" .

<parent/SubComponent1/Range2> a sbol:Range ; sbol:hasSequence :child_Sequence1 ; sbol:start "7" ; sbol:end "9" .
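As a rough cross-check of the two rules quoted above, the example can be run through a small Python sketch. The function names are hypothetical and the data is copied by hand from the Turtle above. The 11302 check is deliberately simplified: it ignores the EntireSequence alternative that the rule also allows.

```python
def violates_11402(end, sequence_elements):
    """sbol3-11402: end must be > 0 and <= length of the referenced Sequence."""
    return not (0 < end <= len(sequence_elements))


def violates_11302(location_sequence, parent_sequences):
    """sbol3-11302 (simplified): the Location's hasSequence must be one of
    the parent Component's hasSequence values. The EntireSequence
    alternative in the rule text is NOT modelled here."""
    return location_sequence not in parent_sequences


# Values transcribed from the example above.
elements = {
    "parent_Sequence1": "atgcgtaaaggagaagaacttttca",
    "child_Sequence1": "atgaaa",
}
parent_sequences = {"parent_Sequence1"}
ranges = {
    "Range1": (1, 3, "child_Sequence1"),
    "Range2": (7, 9, "child_Sequence1"),
}

for name, (start, end, seq) in ranges.items():
    print(name,
          "11402 violated:", violates_11402(end, elements[seq]),
          "11302 violated:", violates_11302(seq, parent_sequences))
```

Under these assumptions Range2 fails 11402 because its end (9) exceeds the 6 bases of child_Sequence1, and both ranges fail 11302 because child_Sequence1 is not a Sequence of :parent.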

jakebeal commented 1 year ago

You are correct: your example violates both of those rules, as expected:

goksel commented 1 year ago

Chris, Jake, apologies, I'm still not understanding the rule 10807 (If a SubComponent object has at least one hasLocation and zero sourceLocation properties, and the Component linked by its instanceOf has precisely one hasSequence property whose Sequence has a value for its elements property, then the sum of the lengths of the Location objects referred to by the hasLocation properties MUST equal the length of the elements value of the Sequence.).

I'm trying to come up with a case that satisfies this rule. But, I'm still struggling to understand this rule biologically. Would you be able to please send me a biologically valid SBOL example for this validation rule?

According to the rule 10807, I assume the following (I must misunderstand something here):

jakebeal commented 1 year ago

The typical case of Rule 10807 is including a whole part.

For an example, take a look at the Anderson Promoters package in the iGEM distribution. Specifically, let's look at the plasmid insert for J23100, Anderson_Promoters_in_vector_ins_BBa_J23100.

Thus, this construction says "the sequence for BBa_J23100 goes in location 181-215 on Anderson_Promoters_in_vector_ins_BBa_J23100". This is a valid construction because it takes a 35 bp sequence and puts it in a 35 bp location.

Here are some variations that are important for understanding the operation of the rule:

gmisirli commented 1 year ago

Many thanks @jakebeal

I have put together an example with multiple locations, since this is where I was confused. I hope it is correct. I extended your promoter example to specify the RNAP binding feature using two ranges (e.g. the -35 and -10 boxes). I only show the necessary properties in the example. Each range has 5 bases, and the binding sequence is 10 bases in total.

<plasmid/SubComponent1/Range1> a sbol:Range ; sbol:end "185" ; sbol:hasSequence :plasmid_Sequence1 ; sbol:start "181" .

:BBa_J23100_RNAPbinding a sbol:Component ; sbol:hasSequence :BBa_J23100_RNAPbinding_Sequence1 .

<plasmid/SubComponent1> a sbol:SubComponent ; sbol:hasLocation <plasmid/SubComponent1/Range2> , <plasmid/SubComponent1/Range1> ; sbol:instanceOf :BBa_J23100_RNAPbinding .

<plasmid/SubComponent1/Range2> a sbol:Range ; sbol:end "214" ; sbol:hasSequence :plasmid_Sequence1 ; sbol:start "210" .

:BBa_J23100_RNAPbinding_Sequence1 a sbol:Sequence ; sbol:elements "ttgacctagc" .

:plasmid a sbol:Component ; sbol:hasFeature <plasmid/SubComponent1> ; sbol:hasSequence :plasmid_Sequence1 .

:plasmid_Sequence1 a sbol:Sequence ; sbol:elements "aacgatgatgctcactc......" .

jakebeal commented 1 year ago

Yes, what you have written here is valid by rule sbol3-10807.
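For readers who want to see the arithmetic behind this confirmation, here is a minimal Python sketch. The helper name `total_location_length` is hypothetical, the ranges are treated as 1-based and inclusive, and the values are copied by hand from the multi-location example above.

```python
def total_location_length(ranges):
    """Sum the lengths of (start, end) pairs, 1-based and inclusive."""
    return sum(end - start + 1 for start, end in ranges)


# hasLocation Ranges of plasmid/SubComponent1, on plasmid_Sequence1
ranges = [(181, 185), (210, 214)]

# elements of BBa_J23100_RNAPbinding_Sequence1 (the child's only Sequence)
child_elements = "ttgacctagc"

total = total_location_length(ranges)
print(total, len(child_elements), total == len(child_elements))  # 10 10 True
```

The same helper gives 35 for the single range (181, 215) from the whole-part case discussed earlier in the thread.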