VirtualFlyBrain / VFB_neo4j

A python package for writing schema-compliant content to VFB neo4J DBs
Apache License 2.0

Switch schema for data_source sites to generic dbxrefs system #27

Open dosumis opened 7 years ago

dosumis commented 7 years ago

Currently:

(:Individual)-[:has_source { ID_in_source: 'fubar' }]->(:data_source { data_link_pre: ''...})

Switch to

dosumis commented 7 years ago

Some concerns:

The new schema is slightly denormalized for data_source linkouts: it requires two edges for every linked individual rather than one. I'll stick to this plan for now, but for maintenance purposes it would be better if this representation were prod only.

Make sites for each resource

MATCH (ds:data_source { name: 'Chiang2010' })
MERGE (s:Site { label : "FlyCircuit" })
SET s.link_base = ds.data_link_pre
SET s.iri = "http://flycircuit.tw"
REMOVE ds.data_link_pre
MERGE (s)<-[:has_site]-(ds)

MATCH (ds:data_source { name: 'Jenett2012' })
MERGE (s:Site { label : "FlyLight" })
SET s.link_base = ds.data_link_pre
SET s.iri = "http://flweb.janelia.org/cgi-bin/flew.cgi"
SET s.logo_url = "http://flweb.janelia.org/images/fly_light_color.png"
REMOVE ds.data_link_pre
MERGE (s)<-[:has_site]-(ds)

MATCH (ds:data_source { name: 'Knowles-Barley2010' })
MERGE (s:Site { label : "BrainTrap" })
SET s.link_base = ds.data_link_pre
REMOVE ds.data_link_pre
SET s.iri = "http://braintrap.inf.ed.ac.uk/braintrap/"
SET s.logo_url = "http://braintrap.inf.ed.ac.uk/braintrap/images/BrainTrap-bg1.gif"
MERGE (s)<-[:has_site]-(ds)

MATCH (ds:data_source { name: 'CostaJefferis2010' })
CREATE (s:Site { label : "Jefferis lab - NBLAST Neuron clusters: ",
iri: "http://flybrain.mrc-lmb.cam.ac.uk/vfb/fc/clusterv/",  // doesn't resolve
link_base: "http://flybrain.mrc-lmb.cam.ac.uk/vfb/fc/clusterv/" })
MERGE (s)<-[:has_site]-(ds)

Accession for these will be {cluster version number}/{exemplar}

Create edge from all FlyCircuit neurons for these?

CREATE (:Site { label : "NBLAST on-the-fly: Neurons",
iri: "http://flybrain.mrc-lmb.cam.ac.uk:8080/NBLAST_on-the-fly",
link_base: 'http://flybrain.mrc-lmb.cam.ac.uk:8080/NBLAST_on-the-fly/?all_query="',
link_postfix: '&all_use_mean=TRUE' })

CREATE (:Site { label : "Jefferis lab - NBLAST on-the-fly: GAL4",
iri: "http://flybrain.mrc-lmb.cam.ac.uk:8080/NBLAST_on-the-fly/?all_query=&all_use_mean=F&gal4_n=10&gal4_query=&tab=One%20against%20all",
link_base: 'http://flybrain.mrc-lmb.cam.ac.uk:8080/NBLAST_on-the-fly/?gal4_query=',
link_postfix: '"&tab=GAL4"' })
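The link_base and link_postfix properties suggest that a linkout URL is assembled by wrapping an accession between them. The issue does not spell this out, so the following is only a minimal sketch under that assumption; the function name build_linkout and the example.org values are hypothetical.

```python
from typing import Optional


def build_linkout(link_base: str, accession: str,
                  link_postfix: Optional[str] = None) -> str:
    """Assumed rule: URL = link_base + accession + optional link_postfix."""
    return link_base + accession + (link_postfix or "")


# Hypothetical site values, not taken from the VFB database.
print(build_linkout("http://example.org/entry?id=", "ABC123", "&view=full"))
```

Sites without a link_postfix simply pass None, so the same helper would cover both kinds of site.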

Move accessions to individual-site edges

MATCH (s:Site)<-[:has_site]-(ds:data_source)<-[hs:has_source]-(i:Individual)
MERGE (i)-[dbx:hasDbXref]->(s) SET dbx.accession = hs.id_in_source
REMOVE hs.id_in_source

Generic mechanism for expanding xrefs on classes in ontology to linkouts:

xref strings are {DB}:{accession} A lookup for dbs can be found ind

site.xref_db = {DB}
hasDbXref.accession = {accession}
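As a rough illustration of the mapping above, an xref string can be split on its first colon to get the value for site.xref_db and the value for hasDbXref.accession. This is a sketch only; the helper name split_xref is hypothetical, and splitting on the first colon (so that accessions containing colons stay intact) is an assumption.

```python
from typing import Tuple


def split_xref(xref: str) -> Tuple[str, str]:
    """Split '{DB}:{accession}' on the first colon only."""
    db, sep, accession = xref.partition(":")
    if not sep or not db or not accession:
        raise ValueError("not a {DB}:{accession} string: %r" % xref)
    return db, accession


db, accession = split_xref("VFB:FBbt_00007263")
print(db, accession)
```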

Cypher for generating these in prod.

OLS neo pre-parses:

annotation-database_cross_reference: [VFB:FBbt_00007263]
obo_xref: [{"database":"VFB","id":"FBbt_00007263","description":null,"url":null}]

obo_xref is a list of JSON strings - not (unfortunately) a cypher map. So processing either will require some scripting.

CREATE CONSTRAINT ON (s:Site) ASSERT s.xref_db IS UNIQUE

MATCH (c:Class) WHERE exists(c.obo_xref) RETURN c.short_form as class_sfid,  c.obo_xref as class_xrefs

MATCH (s:Site) WHERE exists(s.xref_db) RETURN s.xref_db AS xref_db, s.iri AS iri  // one row per site; build the xref_db -> iri map in the script

Iterate over class/xref rows -> iterate over each class's xrefs -> parse each as JSON, check if j['database'] is a known Site xref_db

-> roll cypher
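The scripting step outlined above could look roughly like this. It is a sketch, not the package's implementation: the function name roll_xref_cypher and the sample class short_form are hypothetical, the row keys follow the class_sfid / class_xrefs aliases from the query above, and it assumes class linkouts use the same hasDbXref edge as individuals. A real script would more likely pass query parameters than build strings.

```python
import json


def roll_xref_cypher(class_rows, known_dbs):
    """Turn obo_xref JSON strings into Cypher MERGE statements.

    class_rows: dicts with 'class_sfid' and 'class_xrefs' (list of JSON strings).
    known_dbs: set of Site xref_db values; other databases are skipped.
    """
    statements = []
    for row in class_rows:
        for raw in row["class_xrefs"]:
            j = json.loads(raw)  # each xref is a JSON string, not a Cypher map
            if j["database"] not in known_dbs:
                continue
            # json.dumps gives double-quoted, escaped string literals.
            statements.append(
                "MATCH (c:Class { short_form: %s }), (s:Site { xref_db: %s }) "
                "MERGE (c)-[:hasDbXref { accession: %s }]->(s)"
                % (json.dumps(row["class_sfid"]),
                   json.dumps(j["database"]),
                   json.dumps(j["id"]))
            )
    return statements


rows = [{
    "class_sfid": "EXAMPLE_0000001",  # hypothetical short_form
    "class_xrefs": [
        '{"database":"VFB","id":"FBbt_00007263","description":null,"url":null}'
    ],
}]
for statement in roll_xref_cypher(rows, {"VFB"}):
    print(statement)
```

Putting the accession inside the MERGE pattern, rather than in a later SET, keeps two xrefs from one class to the same site as separate edges.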

Other sites for linkouts:

MERGE (s:Site { label: 'DoOR' })
SET s.description = "Database of Odorant responses."
SET s.iri = "http://neuro.uni-konstanz.de/DoOR"
SET s.xref_db = "DoOR"
SET s.link_base = "http://neuro.uni-konstanz.de/DoOR/content/receptor.php?OR="
SET s.link_icon_url = 'http://neuro.uni-konstanz.de/DoOR/navi_pic.jpg'

MERGE (s:Site { label: "FlyBase" })
SET s.iri = "http://flybase.org"
SET s.link_base = "http://flybase.org/reports"
SET s.link_icon_url = "http://flybase.org/static_pages/images/global/fly_logo.png"
SET s.description = "A Database of Drosophila Genes & Genomes."

Need consistency with spec here: https://github.com/VirtualFlyBrain/VFB_neo4j/blob/master/src/uk/ac/ebi/vfb/neo4j/flybase2neo/import_all_pub_data.py#L47

mmc46 commented 7 years ago

FlyBrain NDB Links from Classes Drop this?? given it will be shut down soon?: it would be better to be further down the integration path before we drop the links.

DoOR Links from classes - Need to be added to Ontology.: already added as xref.

dosumis commented 7 years ago

FlyBrain NDB Links from Classes Drop this?? given it will be shut down soon?: it would be better to be further down the integration path before we drop the links.

OK. Will add to NEO.

DoOR Links from classes - Need to be added to Ontology.: already added as xref.

Sorry. Had forgotten. Working on a generic mechanism to roll these in Neo4J prod.