IBM / datascienceontology

Data Science Ontology
https://www.datascienceontology.org
Creative Commons Attribution 4.0 International

Migrate to graph database #22

Open · sander opened 4 years ago

sander commented 4 years ago

@epatters wrote in #16:

BTW, for a while I've been considering migrating to a graph database, possibly Dgraph, to enable more flexible querying, but I haven't yet been able to dedicate the time.

What kind of queries are you thinking about that cannot easily be handled using CouchDB views?

For the purposes of datascienceontology-frontend, I think the database design could even be "less intelligent" and easier to maintain. A built ontology consists only of static, linked, public documents plus search and browse indices. At the current scale, an S3, IPFS, or Dat bucket holding these static documents and some static indices might suffice.
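As a rough illustration of the static-index idea, the Python sketch below writes each built document to its own JSON file and adds one search index mapping lowercase name tokens to document ids. The resulting directory could then be uploaded to a bucket as-is. The document ids, the `name` field, and the file layout are hypothetical, not the DSO's actual build output.

```python
import json
import re
from pathlib import Path

# Hypothetical built documents; real DSO documents have their own ids and fields.
DOCS = [
    {"id": "concept/k-means", "name": "k-means clustering"},
    {"id": "concept/pca", "name": "principal component analysis"},
    {"id": "annotation/sklearn/kmeans", "name": "KMeans (scikit-learn)"},
]

def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_search_index(docs: list[dict]) -> dict[str, list[str]]:
    """Map each token to the sorted ids of the documents whose name contains it."""
    index: dict[str, set[str]] = {}
    for doc in docs:
        for token in tokenize(doc["name"]):
            index.setdefault(token, set()).add(doc["id"])
    return {token: sorted(ids) for token, ids in sorted(index.items())}

def write_static_site(docs: list[dict], out_dir: Path) -> None:
    """Write one JSON file per document plus a single search index file."""
    for doc in docs:
        path = out_dir / f"{doc['id']}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(doc))
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "search-index.json").write_text(json.dumps(build_search_index(docs)))
```

A client would fetch `search-index.json` once and look tokens up locally, so no database server is involved at query time.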

I'm asking because I'm interested in creating patches to make the frontend and collaboration workflow easier to use and more engaging. I'd like to use this project to explore several use cases for collaborative ontology building across concepts and code. The direction this upstream project takes with database management affects where I should focus my effort.

epatters commented 4 years ago

You're right that the current features of the DSO web frontend are adequately served by CouchDB. Although I'm not familiar with IPFS or Dat, you're probably right that even simpler infrastructure would suffice for the current feature set.

One reason I was considering migrating to a graph DB is to enable more sophisticated interlinking. For example, it would be nice if the page for a concept listed all the annotations that use it, so that users could quickly identify the Python or R packages implementing that concept. To do that, we would need to deep-search through the function annotations' expression trees, which AFAIK is not straightforward in the document DB model. I believe it would be straightforward in a graph DB model where each node in the tree is a node in the graph DB.
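To make the graph-DB point concrete, here is a sketch in plain Python (no particular graph database) that gives every node of each annotation's expression tree its own graph node, then answers "which annotations use concept X?" by walking edges upward from the matching nodes. The expression format (a concept id as a leaf, `[operator, child, ...]` as an inner node) and all ids are hypothetical, not the DSO's actual annotation schema.

```python
from itertools import count

# Hypothetical annotation definitions (NOT the DSO schema): a leaf is a
# concept id, an inner node is a list [operator, child, ...].
ANNOTATIONS = {
    "sklearn/kmeans-fit": ["compose", "k-means", ["product", "data", "integer"]],
    "r/prcomp": "pca",
}

def to_graph(annotations):
    """Give every tree node its own graph node and record parent->child edges.

    Returns (labels, edges, roots): labels maps node id -> concept or operator,
    edges maps node id -> child node ids, roots maps root node id -> annotation id.
    """
    labels, edges, roots = {}, {}, {}
    ids = count()

    def add(expr):
        node = next(ids)
        if isinstance(expr, list):
            labels[node] = expr[0]  # inner nodes are labelled with their operator
            edges[node] = [add(child) for child in expr[1:]]
        else:
            labels[node] = expr     # leaves are labelled with a concept id
            edges[node] = []
        return node

    for name, expr in annotations.items():
        roots[add(expr)] = name
    return labels, edges, roots

def annotations_using(concept, labels, edges, roots):
    """Follow child->parent edges from every node labelled `concept` to its root."""
    parent = {child: p for p, children in edges.items() for child in children}
    found = set()
    for node, label in labels.items():
        if label == concept:
            while node in parent:
                node = parent[node]
            found.add(roots[node])
    return sorted(found)
```

In a real graph DB the upward walk would be a traversal query rather than a Python loop, but the data model is the same: tree nodes become graph nodes, so the depth of a concept inside an expression no longer matters.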

Now, since the site's content is basically static, listing the annotations that use each concept could also be done at build time by generating static indexes. I think that is the point you were making. But beyond a certain point, I imagine it becomes easier to just use a DB for the querying and slap a cache in front of it if performance becomes an issue.
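The build-time alternative could look like the following sketch, which walks every expression tree once and writes a static JSON file per concept listing the annotations that use it, so the frontend only has to fetch one file per concept page. It assumes the same hypothetical expression format (leaves are concept ids, `[operator, child, ...]` for inner nodes); the output layout is likewise an assumption.

```python
import json
from pathlib import Path

# Hypothetical annotation definitions (NOT the DSO schema): a leaf is a
# concept id, an inner node is a list [operator, child, ...].
ANNOTATIONS = {
    "sklearn/kmeans-fit": ["compose", "k-means", ["product", "data", "integer"]],
    "r/prcomp": "pca",
}

def concepts_in(expr):
    """Yield every concept id in an expression tree, skipping operators."""
    if isinstance(expr, list):
        for child in expr[1:]:  # expr[0] is the operator, not a concept
            yield from concepts_in(child)
    else:
        yield expr

def build_reverse_index(annotations: dict) -> dict[str, list[str]]:
    """Map each concept id to the sorted ids of the annotations that use it."""
    index: dict[str, set[str]] = {}
    for name, expr in annotations.items():
        for concept in concepts_in(expr):
            index.setdefault(concept, set()).add(name)
    return {concept: sorted(names) for concept, names in sorted(index.items())}

def write_reverse_index(index: dict[str, list[str]], out_dir: Path) -> None:
    """Write one static file per concept, e.g. out_dir/k-means.json."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for concept, names in index.items():
        (out_dir / f"{concept}.json").write_text(json.dumps(names))
```

The trade-off is that every new kind of query needs a new precomputed index, which is where a queryable DB plus a cache starts to look simpler.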

I am delighted that you are interested in using this project to explore collaborative ontology editing, and I would like to support this in any way that I can. It has always been my intention that the DSO be a wiki-style collaborative effort, but so far I have not had much success in getting contributions. I suspect that part of the problem (besides a general lack of visibility) is that understanding what is going on and contributing is not nearly as easy as it could be.

sander commented 4 years ago

To explore easy querying and interlinking, I've started a Clojure-based ontology-linker that creates JSON-LD representations and indices for DSO:

https://github.com/sander/ontology-linker
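For readers unfamiliar with JSON-LD, the sketch below shows the general shape of a linked document: a `@context` maps short keys to IRIs, every node has an `@id`, and a link is simply another node's `@id`. The base IRI `https://example.org/dso/`, the `Concept`/`Annotation` types, and the `concept` property are placeholders, not the vocabulary that ontology-linker actually emits.

```python
import json

# Placeholder base IRI and vocabulary (assumptions, not ontology-linker's output).
BASE = "https://example.org/dso/"
CONTEXT = {
    "@vocab": BASE + "vocab#",
    # Typing the term as "@id" tells JSON-LD processors that its string
    # value is a link to another node rather than a plain literal.
    "concept": {"@id": BASE + "vocab#concept", "@type": "@id"},
}

def concept_node(concept_id: str, name: str) -> dict:
    return {"@id": f"{BASE}concept/{concept_id}", "@type": "Concept", "name": name}

def annotation_node(annotation_id: str, concept_id: str) -> dict:
    # The link to the concept is just that concept's @id.
    return {
        "@id": f"{BASE}annotation/{annotation_id}",
        "@type": "Annotation",
        "concept": f"{BASE}concept/{concept_id}",
    }

def to_jsonld(nodes: list[dict]) -> dict:
    """Wrap nodes in a single JSON-LD document with a shared context."""
    return {"@context": CONTEXT, "@graph": nodes}

doc = to_jsonld([
    concept_node("k-means", "k-means clustering"),
    annotation_node("sklearn/kmeans", "k-means"),
])
print(json.dumps(doc, indent=2))
```

Because links are plain IRIs, the same documents can be served from any static host and still point at each other.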

An advantage of developing this in Clojure is that the language makes data processing and exploration easy. If a graph DB turns out to make it easier to write the queries that prepare the indices, DataScript might be helpful:

https://github.com/tonsky/datascript
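DataScript stores facts as datoms (entity, attribute, value, transaction) and answers Datalog queries over them. The Python sketch below mimics that idea in miniature, using triples and a naive pattern join, to show how index preparation can shrink to a short declarative query. It is an analogy, not DataScript's API; in DataScript itself the patterns would go in the `:where` clause of a `d/q` query. The entities and attribute names are hypothetical.

```python
# Facts as (entity, attribute, value) triples, loosely modelled on
# DataScript datoms; all entities and attributes here are hypothetical.
FACTS = [
    ("kmeans-fit", "annotation/uses", "k-means"),
    ("kmeans-fit", "annotation/language", "python"),
    ("prcomp", "annotation/uses", "pca"),
    ("prcomp", "annotation/language", "r"),
    ("k-means", "concept/name", "k-means clustering"),
    ("pca", "concept/name", "principal component analysis"),
]

def is_var(term) -> bool:
    """Query variables are strings starting with '?', as in Datalog."""
    return isinstance(term, str) and term.startswith("?")

def match(pattern, fact, binding):
    """Extend `binding` so that `pattern` equals `fact`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, fact):
        if is_var(term):
            if term in new and new[term] != value:
                return None  # variable already bound to something else
            new[term] = value
        elif term != value:
            return None      # constant does not match
    return new

def query(patterns, facts):
    """Return every binding that satisfies all patterns (a naive nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := match(pattern, f, b)) is not None]
    return bindings
```

For example, `query([("?a", "annotation/language", "python"), ("?a", "annotation/uses", "?c"), ("?c", "concept/name", "?n")], FACTS)` returns a single binding: annotation `kmeans-fit`, concept `k-means`, name `k-means clustering`.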

A demo workflow is configured in:

https://github.com/sander/data-science-ontology/tree/feat/json-ld

The most recent linked build is published on IPFS and is available through content delivery networks such as:

https://cloudflare-ipfs.com/ipns/data-science-ontology.sanderdijkhuis.nl

An advantage of IPFS is that the ontology publisher does not need to maintain their own infrastructure to keep the ontology available, and any editor can preview or publish their own forks without cloning any infrastructure.

A first step in making the frontend navigate and render JSON-LD is in:

https://github.com/sander/ontology-browser/tree/feat/json-ld

The latest build is published to:

https://ontology-browser.sanderdijkhuis.nl/

sander commented 4 years ago

Update: it turns out a graph DB indeed makes it easier to prepare indices. My work in progress is at:

https://github.com/sander/ontology-linker/blob/feat/datascript/src/nl/sanderdijkhuis/ontology/next.clj