bio4j / dynamograph

GSoC 2014 project - a DynamoDB based graph DB
GNU Affero General Public License v3.0

Design and implement scala model for ncbiTaxonomy #20

Open alberskib opened 10 years ago

alberskib commented 10 years ago

Implement a Scala model for ncbiTaxonomy (similar to the model for GO).

alberskib commented 10 years ago

Are these all the relationships and properties that we would like to store for ncbiTaxonomy?

  val schema = GraphSchema("ncbiTaxonomy", 
    properties = 
      id :~:
      name :~:
      comment :~:
      scientificName :~:
      ∅, 
    vertexTypes = 
      NcbiTaxon :~: 
      Rank :~:
      ∅,
    edgeTypes = 
      Parent :~: 
      AssignedRank :~:
      Subrank :~:
      ∅
  )

from https://github.com/bio4j/scala-model/blob/master/src/main/scala/bio4j/model/module/ncbiTaxonomy.scala

alberskib commented 10 years ago

@bio4j/dynamograph Bump

laughedelic commented 10 years ago

I think that's all. @pablopareja maybe you know it better?

pablopareja commented 10 years ago

So far we were not modelling Rank as a vertex but rather as a simple property for NCBITaxon. I actually don't know if it's worth it to include it and/or whether it's reliable...? :confused: Maybe @rtobes and @marina-manrique can help us out with this?

rtobes commented 10 years ago

Rank is very important because many analyses are based on rank. Biologically, it is more expressive than the level in the taxonomy tree. One important thing is to add to each rank a number indicating its order of taxonomic specificity. This allows ordering many things by specificity, for example the assignments in metagenomics analysis. I don't know how to model the order number, but perhaps as a property?

eparejatobes commented 10 years ago

@rtobes yes, a property for the level is a nice addition. And there should be a vertex type for rank (there is one such in what @alberskib wrote).

rtobes commented 10 years ago

To clarify. This is the rank order number that I am talking about:

Order number  Rank
0             no rank
1             superkingdom
2             kingdom
3             superphylum
4             phylum
5             subphylum
6             class
7             subclass
8             order
9             suborder
10            family
11            subfamily
12            tribe
13            subtribe
14            genus
15            subgenus
16            species group
17            species subgroup
18            species
19            subspecies

After reading @alberskib's comment, I see that the Subrank edge type would allow us to infer the order number, but having this number as a property would perhaps make it easier to organize some query results.
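
The two options discussed here (storing the order number as a property versus inferring it from Subrank edges) could be compared with a minimal plain-Scala sketch. It does not use the bio4j scala-model or dynamograph APIs: `Rank`, `Ranks`, `Assignment`, `inferOrder` and `sortBySpecificity` are hypothetical names introduced only for illustration. Treating "no rank" as the root of the Subrank chain is a simplification, made so that the inferred numbers match the list above.

```scala
// Hypothetical sketch: none of these names come from bio4j or dynamograph.
final case class Rank(name: String, order: Int)

object Ranks {
  // Rank names in increasing order of specificity, as listed in the thread.
  val names: Vector[String] = Vector(
    "no rank", "superkingdom", "kingdom", "superphylum", "phylum", "subphylum",
    "class", "subclass", "order", "suborder", "family", "subfamily",
    "tribe", "subtribe", "genus", "subgenus", "species group",
    "species subgroup", "species", "subspecies"
  )

  // Option 1: store the order number as a property of each rank.
  val all: Vector[Rank] = names.zipWithIndex.map { case (n, i) => Rank(n, i) }
  val byName: Map[String, Rank] = all.map(r => r.name -> r).toMap

  // Option 2: keep only Subrank-style edges (rank -> next more specific rank)
  // and infer the order number by walking the chain from its root.
  val subrank: Map[String, String] = names.zip(names.tail).toMap

  def inferOrder(name: String): Option[Int] = {
    @annotation.tailrec
    def walk(current: String, depth: Int): Option[Int] =
      if (current == name) Some(depth)
      else subrank.get(current) match {
        case Some(next) => walk(next, depth + 1)
        case None       => None // end of chain reached: unknown rank name
      }
    walk(names.head, 0)
  }
}

// A hypothetical metagenomics assignment of a read to a taxon with a given rank.
final case class Assignment(readId: String, taxon: String, rank: Rank)

object RankOrderExample {
  // With the stored property, ordering results by specificity is a plain sort.
  def sortBySpecificity(as: Seq[Assignment]): Seq[Assignment] =
    as.sortBy(_.rank.order)
}
```

In this sketch the stored property gives a constant-time lookup and a direct sort key, while the edge-based version has to traverse the chain for every rank it resolves. That is the trade-off behind preferring the property when organizing query results.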