EBISPOT / scxa_2_cxg

Apache License 2.0

API use cases #3

Open dosumis opened 5 months ago

dosumis commented 5 months ago
  1. As a bioinformatician trying to find data relevant to a planned analysis, I want to retrieve pre-generated, CxG-standard h5ad files based on any combination of publication DOI, sample tissue, sample developmental stage, assay, cell type, etc., using ontology closure to return more results. The results should take the form of a CSV table with metadata (citation, tissue(s), stage(s), etc.) plus a link to the h5ad file for download.

Draft tech spec:

KG: we already have a queryable Neo4j graph of cell sets linked to ontology terms and dataset nodes, plus a SOLR endpoint with all nodes indexed.

Query strategy: the SOLR instance should be sufficiently denormalised that the above use case can be fulfilled with 1-3 queries. Denormalisations needed: ontology closures; dataset metadata plus file link? Precise schema TBD.
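To make the denormalisation idea concrete, here is a minimal Python sketch of what a single-query lookup might look like once closure fields exist. Since the schema is still TBD, everything specific here is an assumption: the field names (`tissue_closure`, `cell_type_closure`, `download_link`), the core name and the base URL are all hypothetical. It only builds the request URL, using standard SOLR select parameters (`q`, `fq`, `fl`, `rows`, `wt=csv`).

```python
from urllib.parse import urlencode


def solr_select_url(base_url, filters, fields, rows=1000):
    """Build a SOLR /select URL with one filter query (fq) per facet.

    `filters` maps a (hypothetical) denormalised closure field to the
    term label to match, e.g. {"tissue_closure": "lung"}.
    """
    params = [("q", "*:*")]
    for field, value in filters.items():
        # Escape backslashes and quotes so the value stays one quoted phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'{field}:"{escaped}"'))
    params.append(("fl", ",".join(fields)))  # columns to return
    params.append(("rows", str(rows)))
    params.append(("wt", "csv"))  # ask SOLR for a CSV response
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core URL and field names, for illustration only.
url = solr_select_url(
    "http://localhost:8983/solr/kg",
    {"tissue_closure": "lung", "cell_type_closure": "T cell"},
    ["title", "publication", "download_link"],
)
```

If the closure fields held the labels of all ancestor terms for each dataset, a filter on `lung` would also match datasets annotated with more specific lung parts, which is the "more results" behaviour the use case asks for.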

  2. Knowledge Graph use cases: more discussion is needed on what extended KG content will look like. Will we include an extended graph with GO? Will we fold in GO annotations? Analysis of cell set transcriptomes => predicted GO BP and CC?

dosumis commented 1 month ago

Query for datasets containing types of T cell

MATCH (c)-[:SUBCLASSOF*0..]->(d) WHERE d.label = 'T cell'
MATCH (ds)-[:has_source]-(n:Cell_cluster)-[:composed_primarily_of]->(c:Class:Cell)
RETURN distinct n.label as author_annotation, c.label as CL_annotation, ds.download_link[0], ds.title[0], ds.publication[0]

Find all datasets that use tissue from the lung

MATCH (n:Cell_cluster)-[r:tissue]->(t)-[:SUBCLASSOF|part_of*0..]->(:Class {label: 'lung'}) 
MATCH (ds)-[:has_source]-(n:Cell_cluster)-[:composed_primarily_of]->(c:Class:Cell) 
RETURN distinct n.label as author_annotation, t.label as tissue, ds.download_link[0], ds.title[0], ds.publication[0]

We could do the same for stage, disease and organism. :Class nodes also have curie & synonyms so we can support search on these too.
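One way to cover the other facets would be a single query template, sketched below in Python. This is illustrative only: it assumes the relationship types for the other facets are named `stage`, `disease` and `organism` (only `tissue` appears in the queries above), that the properties are called `curie` and `synonyms`, and that `synonyms` is stored as a list. The function just builds Cypher text; the term is passed as a `$term` query parameter.

```python
# Sketch only: builds Cypher text, it does not connect to the KG.
# Assumed, not confirmed by the KG schema: relationship types other than
# `tissue`, and the property names `curie` and `synonyms` (a list).

CLOSURE_RELS = "SUBCLASSOF|part_of"
ALLOWED_FACETS = {"tissue", "stage", "disease", "organism"}


def facet_query(facet: str) -> str:
    """Return a Cypher query for datasets whose clusters are linked via
    `facet` to the term given as $term, or to any of its descendants."""
    if facet not in ALLOWED_FACETS:
        # Relationship types cannot be Cypher parameters, so whitelist them
        # before interpolating into the query text.
        raise ValueError(f"unsupported facet: {facet}")
    return (
        f"MATCH (n:Cell_cluster)-[:{facet}]->(t)"
        f"-[:{CLOSURE_RELS}*0..]->(d:Class)\n"
        "WHERE d.label = $term OR d.curie = $term OR $term IN d.synonyms\n"
        "MATCH (ds)-[:has_source]-(n)-[:composed_primarily_of]->(c:Class:Cell)\n"
        f"RETURN DISTINCT n.label AS author_annotation, t.label AS {facet},\n"
        "       ds.download_link[0], ds.title[0], ds.publication[0]"
    )


stage_query = facet_query("stage")
```

Passing the term as a parameter rather than pasting it into the query string avoids quoting problems with labels that contain apostrophes.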

To run combinatorial queries, we can combine these patterns in a single query.

e.g.

MATCH (n:Cell_cluster)-[r:tissue]->(t)-[:SUBCLASSOF|part_of*0..]->(:Class {label: 'lung'}) 
MATCH (c)-[:SUBCLASSOF*0..]->(d) WHERE d.label = 'epithelial cell'
MATCH (ds)-[:has_source]-(n:Cell_cluster)-[:composed_primarily_of]->(c:Class:Cell) 
RETURN distinct n.label as author_annotation, c.label as CL_annotation, ds.download_link[0], ds.title[0], ds.publication[0]
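The combination step could be automated along these lines. This Python sketch adds one MATCH clause per supplied filter and serialises result rows to the CSV table described in the use case. The function names and the column aliases `download_link`, `title` and `publication` are made up for illustration, and it assumes cell type terms carry the `:Class` label; running the query against Neo4j is left out.

```python
import csv
import io

COLUMNS = ["author_annotation", "CL_annotation",
           "download_link", "title", "publication"]

RETURN_CLAUSE = (
    "RETURN DISTINCT n.label AS author_annotation, c.label AS CL_annotation,\n"
    "       ds.download_link[0] AS download_link, ds.title[0] AS title,\n"
    "       ds.publication[0] AS publication"
)


def build_query(tissue=None, cell_type=None):
    """Return (cypher, params), with one MATCH clause per supplied filter."""
    clauses, params = [], {}
    if tissue is not None:
        clauses.append(
            "MATCH (n:Cell_cluster)-[:tissue]->(t)"
            "-[:SUBCLASSOF|part_of*0..]->(:Class {label: $tissue})"
        )
        params["tissue"] = tissue
    if cell_type is not None:
        clauses.append(
            "MATCH (c)-[:SUBCLASSOF*0..]->(:Class {label: $cell_type})"
        )
        params["cell_type"] = cell_type
    # This clause ties clusters (n) and cell types (c) to their dataset (ds).
    clauses.append(
        "MATCH (ds)-[:has_source]-(n:Cell_cluster)"
        "-[:composed_primarily_of]->(c:Class:Cell)"
    )
    clauses.append(RETURN_CLAUSE)
    return "\n".join(clauses), params


def rows_to_csv(rows):
    """Serialise result rows (dicts keyed by column alias) to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


query, params = build_query(tissue="lung", cell_type="epithelial cell")
```

Because every filter constrains the same `n`, `c` and `ds` variables, adding a filter narrows the result set, as in the hand-written lung plus epithelial cell query above.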