WormBase / wormbase-pipeline

Wormbase Build Pipeline
http://www.wormbase.org

Pseudogene attribute addition: Truncated & Frameshifts #28

Closed Paul-Davis closed 8 years ago

Paul-Davis commented 8 years ago

Proposed by: Michael

Background: Brugia contains horizontally transferred genes. Because the author put quite a bit of work into adding additional information for pseudogenes, I would like to add that information as tags to the Pseudogene class. Namely:

?Pseudogene Attributes Transcribed #Evidence
                       Fragment #Evidence
                       SubType UNIQUE processed_pseudogene #Evidence
                                      unprocessed_pseudogene #Evidence
                       Truncated five_prime #Evidence
                                 three_prime #Evidence
                       Frameshifts UNIQUE Int

tuli commented 8 years ago

I support the proposal; my only comment is that, for consistency, five_prime could be 5_prime or something similar.

We have UTR_3 and UTR_5 in #Molecular_change. It is clear that 3 means 3 prime (or 3'), so it would work to have Truncated 5 #Evidence

Could we have Truncated_5 Truncated_3

??

epaule commented 8 years ago

I would like to avoid having integers as tags. If it has to be numbers, I would rather do:

Truncated 5_prime_truncated #Evidence
          3_prime_truncated #Evidence
Frameshifts UNIQUE Int

khowe commented 8 years ago

The "Truncated" parent disappears in the Datomic conversion, and the only purpose it serves (as far as I can tell) is to make the Acedb models more readable. Can we not simply do:

five_prime_truncated three_prime_truncated

?

Paul-Davis commented 8 years ago

Yes, this would simplify the model.wrm_anno conversion for Datomic imports, so simply having additional top-level attributes is nice. @epaule, is this OK with you?

Paul-Davis commented 8 years ago

?Pseudogene Attributes Transcribed #Evidence
                       Fragment #Evidence
                       SubType UNIQUE processed_pseudogene #Evidence
                                      unprocessed_pseudogene #Evidence
                       5_prime_truncated #Evidence
                       3_prime_truncated #Evidence
                       Frameshifts UNIQUE Int

khowe commented 8 years ago

@tuli's original suggestion is more consistent with other 5/3-containing tags in the model:

Truncated_5 Truncated_3

Paul-Davis commented 8 years ago

Final iteration: OK @epaule

?Pseudogene Attributes Transcribed #Evidence
                       Fragment #Evidence
                       SubType UNIQUE processed_pseudogene #Evidence
                                      unprocessed_pseudogene #Evidence
                       Truncated_5 #Evidence
                       Truncated_3 #Evidence
                       Frameshifts UNIQUE Int

epaule commented 8 years ago

Looks fine.
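
For illustration only, here is a minimal Python sketch of how a Pseudogene object using the final tag names (Truncated_5, Truncated_3, Frameshifts) might be written out as a .ace paragraph. The function name `pseudogene_to_ace` and the object name `Bm0001` are hypothetical and not part of the pipeline, and the optional #Evidence data on the truncation tags is left out for brevity.

```python
def pseudogene_to_ace(name, truncated_5=False, truncated_3=False, frameshifts=None):
    """Render one Pseudogene object as a .ace paragraph (hypothetical helper).

    Only the tags agreed on in this thread are emitted; the #Evidence
    sub-model is deliberately omitted in this sketch.
    """
    lines = [f'Pseudogene : "{name}"']
    if truncated_5:
        lines.append("Truncated_5")
    if truncated_3:
        lines.append("Truncated_3")
    if frameshifts is not None:
        # Frameshifts is declared UNIQUE Int, so a single integer is written.
        lines.append(f"Frameshifts {int(frameshifts)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "Bm0001" is a made-up object name used only for this example.
    print(pseudogene_to_ace("Bm0001", truncated_5=True, frameshifts=2))
```

Because Truncated_5 and Truncated_3 are top-level tags with no "Truncated" parent, each one is written as its own line.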