WormBase / pseudoace

Modelling the WormBase ACeDB database in datomic.

Possibly missing association between pcr-product and expr-profile schema #67

sibyl229 closed this issue 7 years ago

sibyl229 commented 7 years ago

Hi @mgrbyte,

I think there might be a schema translation issue here. According to the ACeDB schema, there is an association between ?Expr_profile and ?PCR_product. Here is the relevant part of the model:

[Screenshot of the relevant part of the ACeDB model, taken 2017-03-09; image not reproduced]

But in Datomic, I couldn't find this association in either schema. I queried for any attribute whose :pace/obj-ref points into the pcr-product or expr-profile namespace, but none of the results appear related to this particular association.

[:find ?eid ?aid ?na
 :in $ ?na
 :where
 [_ :db.install/attribute ?e]
 [?e :db/ident ?eid]
 [?e :pace/obj-ref ?a]
 [?a :db/ident ?aid]
 [(namespace ?aid) ?na]
]

This relates to WormBase/website#5070, the microarray topography map discussion.

Thank you!

khowe commented 7 years ago

@mgrbyte it looks like the Expr_profile connection to PCR_product was commented out of the annotated models for WS248. The commit message does not say why, though.

mgrbyte commented 7 years ago

@sibyl229 @khowe SMap constructs in ACeDB are not kept during the conversion process. Thomas re-implemented an intended equivalent to SMap in pseudoace, known as the "Locatables" API (See dialogue under the schema on that page for detail of the idea).

Anywhere you see SMap, look for the analogue in this Locatables API. The following function shows how to traverse from an expression profile to a gene via its CDSs. Note that a PCR product may reference many CDSs, so there may be many associated genes.

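(defn gene-from-expr-profile
  "Returns the first gene reachable from expr-profile via the CDSs that
  reference its parent PCR product. The db argument is not used by the body."
  [db expr-profile]
  (let [coding-seqs (-> expr-profile
                        ;; pcr-product via parent
                        :locatable/parent
                        ;; cds holders via reverse relationship
                        :cds/_corresponding-pcr-product)
        corr-coding-seqs (map :gene.corresponding-cds/_cds coding-seqs)
        genes (flatten
               (for [cds corr-coding-seqs]
                 (map :gene/_corresponding-cds cds)))]
    ;; only the first of possibly many associated genes is returned
    (first genes)))

mgrbyte commented 7 years ago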

@sibyl229 to get to the gene via the transcript, you'd need to replace the CDS-based look-ups in the function above with the Transcript ones, i.e.:

:transcript/_corresponding-pcr-product instead of :cds/_corresponding-pcr-product

:gene.corresponding-transcript/_transcript instead of :gene.corresponding-cds/_cds

:gene/_corresponding-transcript instead of :gene/_corresponding-cds
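
Applying those three substitutions to the function above would give roughly the following. This is only a sketch: the name `gene-from-expr-profile-via-transcript` is made up for illustration, the shape of the traversal is copied from the CDS version rather than checked against the schema, and the unused `db` argument is kept only for signature parity.

```clojure
(defn gene-from-expr-profile-via-transcript
  "Hypothetical transcript-based variant of gene-from-expr-profile.
  Returns the first gene reachable via the transcripts that reference
  the expression profile's parent PCR product."
  [_db expr-profile]
  (let [transcripts (-> expr-profile
                        ;; pcr-product via parent
                        :locatable/parent
                        ;; transcripts via reverse relationship
                        :transcript/_corresponding-pcr-product)
        ;; holder entities that point at each transcript
        corr-transcripts (map :gene.corresponding-transcript/_transcript
                              transcripts)
        genes (flatten
               (for [holders corr-transcripts]
                 (map :gene/_corresponding-transcript holders)))]
    ;; as in the CDS version, only the first gene is returned
    (first genes)))
```

As with the CDS version, a PCR product may be referenced by several transcripts, so returning `genes` instead of `(first genes)` may be preferable when all associated genes are wanted.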

sibyl229 commented 7 years ago

Thanks a lot, @mgrbyte. It works for me now. Closing!

mgrbyte commented 7 years ago

Just a note for posterity: the underscore notation for traversing relationships is only applicable when using the Datomic Entity API. In datalog queries, you can swap the position of the variables applied to the canonical attribute to change the direction of traversal, e.g.:

..
:where
[?e :pcr-product/id ?pcr-product]
[?lp :locatable/parent ?e]]
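
For comparison, a complete query built around those two clauses might look like the sketch below. The `:find` and `:in` clauses are assumptions, since the snippet above elides them, and the name `locatables-by-pcr-product-query` is made up for illustration. It is meant as the datalog counterpart of the Entity API reverse lookup `(:locatable/_parent pcr-product-entity)`.

```clojure
;; Query as plain data: find every entity whose :locatable/parent
;; is the PCR product with the given id.
(def locatables-by-pcr-product-query
  '[:find [?lp ...]
    :in $ ?pcr-product
    :where
    ;; resolve the PCR product entity from its id
    [?e :pcr-product/id ?pcr-product]
    ;; ?e sits in the value position, so this walks the
    ;; :locatable/parent reference "backwards" to its holders
    [?lp :locatable/parent ?e]])

;; Usage, assuming a Datomic peer database value `db` and
;; `datomic.api` required as `d` (the id is a placeholder):
;; (d/q locatables-by-pcr-product-query db "some-pcr-product-id")
```

Putting `?e` in the entity position instead, as in `[?e :locatable/parent ?lp]`, would traverse the same attribute in the forward direction.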