Open srynobio opened 9 years ago
Commented by eilbeck on 2008-03-12 22:38 UTC Logged In: YES user_id=742851 Originator: YES
Here is an update to suggest possible topological relationships to be included in SO. Topological relationships are a kind of spatial relation that is preserved under transformations such as scaling, translation and rotation. These relations are taken from the Egenhofer paper mentioned below.
What I want to do here is, for each case, draw an imaginary chunk of sequence in ASCII and then define the relation. I am going to make the definitions sequence-based to start with but am open to making them more generic if needed.
1. disjoint
Features A and B are disjoint if no intersection exists between their boundaries and interiors.
If A is disjoint from B then I think B is also disjoint from A.
2. adjacent
If A is Adjacent to B, B may not be adjacent to A.
3. equal
|---A----|
|---B----|
Features A and B are equal if the intersections of the boundary and interior are not empty.
4. inside
  |-A--|
|--B-------|
Feature A is inside feature B if A and B share interior but not boundary, and if A has boundary which is interior to B and none of B's boundary coincides with A's interior.
5. contains
|--A------|
  |-B--|
Feature A contains feature B if A and B share interior sequence but none of A's boundary coincides with B's interior.
6. covers
|--A--|
|--B-----|
Feature A covers feature B if both share a common boundary and interior sequence, and B has interior which coincides with the boundary of A, and none of A's interior sequence is part of B's boundary.
7. covered_by
|--A------|
|--B--|
Feature A is covered_by feature B if A and B share interior sequence and a common boundary, and A has interior sequence which coincides with B's boundary, but none of B's interior sequences is part of B's boundary.
8. overlap
|--A--|
    |--B--|
Feature A overlaps feature B if they have common interior sequences and each has a boundary that lies in the other's interior.
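The eight cases above can be sketched as a small classifier, which may help when checking the definitions against concrete examples. This is only an illustration under stated assumptions: a feature is modelled as a contiguous interval with interbase coordinates `start < end`, its boundary is the two end points, and its interior is everything strictly between them. The names `Feature` and `relation` are hypothetical. The sketch also assumes the usual reading in which the covering or containing feature is the longer one; if the definitions above are meant the other way round, swap the `covers` and `covered_by` labels.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """A contiguous feature in interbase coordinates (assumption: start < end)."""
    start: int
    end: int


def relation(a: Feature, b: Feature) -> str:
    """Return the name of the topological relation of a to b."""
    if a.end < b.start or b.end < a.start:
        return "disjoint"      # no shared boundary or interior
    if a.end == b.start or b.end == a.start:
        return "adjacent_to"   # boundaries touch, interiors do not meet
    if a == b:
        return "equal"
    if b.start < a.start and a.end < b.end:
        return "inside"        # a strictly within b, no shared boundary
    if a.start < b.start and b.end < a.end:
        return "contains"      # b strictly within a, no shared boundary
    if b.start <= a.start and a.end <= b.end:
        return "covered_by"    # a within b, exactly one shared boundary
    if a.start <= b.start and b.end <= a.end:
        return "covers"        # b within a, exactly one shared boundary
    return "overlaps"          # interiors meet, each has an end in the other


if __name__ == "__main__":
    print(relation(Feature(0, 6), Feature(4, 10)))
```

Because the tests run in order and each returns as soon as it matches, every pair of features gets exactly one label under these assumptions.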
I can think of cases for all of these relations between features in SO. We need to work out whether we need them all and what the qualities of each relation are. Are they transitive, circular, etc.? Do we want the definitions to be sequence-specific or broad enough to cover n-dimensional space? Can we tighten the definitions more?
I look forward to some discussion.
--Karen
Commented by batchelorc on 2008-03-20 16:59 UTC Logged In: YES user_id=1473024 Originator: NO
Hello,
I hadn't noticed your followup post, sorry. Monitoring this now.
I think the equal relationship (equal_to, I suppose) is both symmetric and antisymmetric!
(This may mean that I have misunderstood antisymmetry.)
(Hope my ASCII art survives the posting process.)
             transitive  symmetric  reflexive  antisymmetric
disjoint         -           +          -            -
adjacent_to      -           +          +            -
equal            +           +          +            +
inside           +           -          -            -
contains         +           -          -            -
covers           +           -          -            -
covered_by       +           -          -            -
overlaps         -           +          ?            -
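One way to test a table like this is to enumerate every feature on a short sequence and check each property by brute force. The sketch below does that at the level of individual features (not the all-some reading of relations between types). It assumes contiguous features given as `(start, end)` pairs in interbase coordinates, with the covering feature taken to be the longer one; `relation`, `all_features` and `properties` are hypothetical names. Its answers depend on those assumptions and need not match the table: for example, whether a feature counts as adjacent to itself is a matter of definition.

```python
from itertools import product


def relation(a, b):
    """Name the relation of feature a to feature b; each is a (start, end) pair."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2 or e2 < s1:
        return "disjoint"
    if e1 == s2 or e2 == s1:
        return "adjacent_to"
    if a == b:
        return "equal"
    if s2 < s1 and e1 < e2:
        return "inside"
    if s1 < s2 and e2 < e1:
        return "contains"
    if s2 <= s1 and e1 <= e2:
        return "covered_by"
    if s1 <= s2 and e2 <= e1:
        return "covers"
    return "overlaps"


def all_features(length):
    """Every contiguous feature on a sequence with the given number of bases."""
    return [(s, e) for s in range(length) for e in range(s + 1, length + 1)]


def properties(name, feats):
    """Check four properties of the named relation over all given features."""
    def holds(x, y):
        return relation(x, y) == name

    pairs = [(x, y) for x, y in product(feats, repeat=2) if holds(x, y)]
    return {
        "transitive": all(holds(x, z) for x, y in pairs for z in feats if holds(y, z)),
        "symmetric": all(holds(y, x) for x, y in pairs),
        "reflexive": all(holds(x, x) for x in feats),
        # Holds vacuously when the relation never applies in both directions.
        "antisymmetric": all(x == y for x, y in pairs if holds(y, x)),
    }


if __name__ == "__main__":
    feats = all_features(5)
    for name in ("disjoint", "adjacent_to", "equal", "inside",
                 "contains", "covers", "covered_by", "overlaps"):
        print(name, properties(name, feats))
```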
Other observations. In 2, you say "If A is Adjacent to B, B may not be adjacent to A." Surely the boundary of A and B is the same as the boundary of B and A, hence symmetry follows?
The other thing I want to think about is non-contiguous sequences, so for disjoint_from we have these possibilities among others:
|---A----| |---B---| |----A----| |----B----| |---B----| |---B---| |----A---| |---A--|
adjacent_to
|---A---||---B---| |---A---| |---B---|
equal
|---A---| |---A---|
|---B---| |---B---|
inside (and contains, mutatis mutandis)
  |---A---|  |---A---|
|----B---------------------|
Now. covers vs. overlaps: do I take it that covers is not a special case of overlaps? This definition might need to be tightened.
Best wishes, Colin.
Commented by eilbeck on 2008-03-20 19:32 UTC Logged In: YES user_id=742851 Originator: YES
Hi Colin,
Glad you are thinking about this. With regard to adjacent_to, BS convinced me that it was not symmetrical. A adjacent_to B: all instances of A are adjacent to some instance of B. The reverse does not hold for B. We can think about it using the example of the polyA tail and the transcript. A polyA tail will always be attached to a transcript, but a transcript does not have to have a polyA tail.
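The all-some reading can be illustrated with a few lines of code. This is a sketch with made-up coordinates, not real annotation: features are `(start, end)` pairs in interbase coordinates, two features count as adjacent when an end of one coincides with a start of the other, and `adjacent` and `all_some` are hypothetical helper names.

```python
def adjacent(a, b):
    """True if the two (start, end) features share a boundary end to start."""
    return a[1] == b[0] or b[1] == a[0]


def all_some(instances_a, instances_b, related):
    """All-some reading: every instance of A is related to some instance of B."""
    return all(any(related(a, b) for b in instances_b) for a in instances_a)


# Hypothetical instances: one polyA tail attached to the first transcript only.
polya_tails = [(100, 120)]
transcripts = [(0, 100), (200, 300)]

print(all_some(polya_tails, transcripts, adjacent))  # True
print(all_some(transcripts, polya_tails, adjacent))  # False
```

The relation between the two individual features is the same in both directions; what fails in the second call is the quantifier, because the second transcript has no tail next to it.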
I just looked up antisymmetric on wiki. Yep, that looks like it. I can think of one case where I would use equal_to. That would be when a transcript has one exon, so the exon would be equal to the transcript.
I'm not sure I get what you mean about non-contiguous and disjoint_from.
Covers and covered by are different from contains and inside because they also share a boundary. Covers is not a special case of overlaps.
--K
Commented by batchelorc on 2008-05-20 13:30 UTC Logged In: YES user_id=1473024 Originator: NO
Hello,
Sorry for letting this slip.
Symmetry: of course. Yes. So what relations are symmetric?
I'm not sure whether disjoint is: If all A are disjoint from some B, then all B are disjoint from some A. But overlaps isn't, for the same reasons as adjacent_to? Likewise equals, if all transcripts A are equal to some exon B, then all exons B are not necessarily equal to some transcript A?
Happy with transitivity.
Reflexivity: convinced that overlaps is not reflexive, because A shares a boundary with itself.
However, because A shares a boundary with itself, then adjacent_to is reflexive.
Here's the modified table:
             transitive  symmetric  reflexive  antisymmetric
disjoint         -           +          -            -
adjacent_to      -           -          +            -
equal            +           -          +            +
inside           +           -          -            -
contains         +           -          -            -
covers           +           -          -            -
covered_by       +           -          -            -
overlaps         -           -          -            -
OK. Now to think about those attributes you list.
Best wishes, Colin.
Commented by batchelorc on 2008-05-20 14:22 UTC Logged In: YES user_id=1473024 Originator: NO
SO:0000068 overlapping
Not sure how to write this attribute in an xp fashion. One could always change it to:
name: overlapping_region
intersection_of: SO:0000001 ! region
intersection_of: overlaps SO:0000001 ! region
but I don't know what this buys us. In terms of the alternative splicing attributes, though:
name: overlapping_peptide_region
intersection_of: polypeptide_region
intersection_of: overlaps polypeptide_region
then:
name: gene_encoding_overlapping_peptides
intersection_of: gene
intersection_of: "encodes" overlapping_peptide_region
I put "encodes" in quote marks because it's not an all--some relationship as written there.
This is identical to encodes_overlapping_polypeptides_different_start_and_stop because of the Egenhoferian def of overlaps.
The different_start and different_stop attributes are either covers or covered_by. Are they both special cases of adjacent_to? (As far as I can see, adjacent_to merely says that the regions share a boundary so might cover this case: |---A---| |-B-| as well as the canonical |---A---|-B-|.)
encodes_1_polypeptide could then be replaced by a feature:
name: gene_encoding_one_polypeptide
intersection_of: gene
intersection_of: equals polypeptide_region
Do we need cardinality for encodes_greater_than_1_polypeptide?
> > SO:0000069 inside_intron
> > SO:0000070 inside_intron_antiparallel
> > SO:0000071 inside_intron_parallel
> > SO:0000073 five_prime_three_prime_overlap
> > SO:0000074 five_prime_five_prime_overlap
> > SO:0000075 three_prime_three_prime_overlap
> > SO:0000076 three_prime_five_prime_overlap
OK. Happy with inside_intron now---we'd have
name: region_inside_intron
intersection_of: region
intersection_of: inside intron
and the famous twintron:
name: twintron
def: [whatever twintron's def is]
intersection_of: intron
intersection_of: inside intron
Not sure how to implement the parallel and antiparallel cases.
For the five_prime_five_prime_overlap etc. cases---are these simply saying that there is an overlap at the 5' end, or is it something more specific like the 5' UTR?
I think I need to start writing this all up properly.
Best wishes, Colin.
Commented by batchelorc on 2008-05-21 06:14 UTC Logged In: YES user_id=1473024 Originator: NO
Hello,
Just a note in the cold light of morning to say that I can't tell what overlapping_peptide_region would be pairwise disjoint with, other than regions which have other Egenhoferian relationships to other regions.
Do we know how these attributes are being used in the MODs?
Colin.
Reported by eilbeck on 2008-02-14 00:21 UTC There are attributes that describe genes based on location with regards to other genes.
> > SO:0000068 overlapping
> > SO:0000069 inside_intron
> > SO:0000070 inside_intron_antiparallel
> > SO:0000071 inside_intron_parallel
> > SO:0000073 five_prime_three_prime_overlap
> > SO:0000074 five_prime_five_prime_overlap
> > SO:0000075 three_prime_three_prime_overlap
> > SO:0000076 three_prime_five_prime_overlap
I think we may require some topological relationships if we are to use these terms to make cross products with features.
For example, overlapping and inside. We already have adjacent_to. A good place to start thinking about these rels is: Max Egenhofer, "A Formal Definition of Binary Topological Relationships", Third International Conference on Foundations of Data Organization and Algorithms (FODO), Paris, France, W. Litwin and H. Schek (eds.), Lecture Notes in Computer Science, Vol. 367, Springer-Verlag, pp. 457-472.
I think that these terms, which are attributes of alternatively spliced genes, could also do with some help from topology:
encodes_1_polypeptide
encodes_greater_than_1_polypeptide
encodes_overlapping_peptides
encodes_different_polypeptides_different_stop
encodes_overlapping_peptides_different_start
encodes_overlapping_polypeptides_different_start_and_stop
This is mostly a reminder to Colin and me that we need to work on this problem, but if anyone else has any comments or suggestions, they are always welcome.
--Karen