EBISPOT / ols4

Version 4 of the EMBL-EBI Ontology Lookup Service (OLS)
http://www.ebi.ac.uk/ols4/
Apache License 2.0

Neo4j 5 #346

Open jamesamcl opened 1 year ago

jamesamcl commented 1 year ago

Seems like we've been doing this for long enough for there to be a new major release of Neo4j!

Need to work out what's changed and how easy the upgrade path will be, lest we end up in the situation with OLS3 where our database is 7 years out of date and can't be easily upgraded.

jamesamcl commented 1 year ago

I had a play, and it seems like neo4j-admin import (which is now neo4j-admin database import full in Neo4j 5) hangs forever when I try to use it on the cluster. Further investigation is required, but there may be some issues there.

giraygi commented 1 year ago

The biggest benefit of Neo4j 5 will be the incremental mode of neo4j-admin import (neo4j-admin database import incremental), which would make it possible to ingest ontologies one (or a few) at a time.

jamesamcl commented 1 year ago

Thanks @giraygi, I wasn't aware of that, and it's super interesting.

For future reference (mainly for @henrietteharmse): I think going back to incremental loads would be a fairly major change in OLS4. It would be trivial to hash the OWL files and consider the hashes when loading ontologies, but we would also need to consider the linker, whose outputs could become stale (new links would not be discovered and old links would remain).

(The only place incremental loads would make a difference would be in loading things into the DBs which is the current bottleneck, so there would be no real advantage to doing this earlier in the pipeline.)

We could maybe hash the files we generate from the ontologies, after data generation and prior to loading into the DBs, rather than hashing the OWL files?
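
To make the hashing idea concrete, here is a rough sketch in Python of comparing hashes of generated files between two runs. It is only an illustration under assumptions: the directory layout, the file names, and the helper names (`sha256_of_file`, `build_manifest`, `changed_files`) are hypothetical and are not taken from the OLS4 pipeline. The thought is that if the hashed files are the ones produced after linking, a change in links should show up as a changed hash, though that would need checking against the real outputs.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large generated files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(output_dir: Path) -> dict[str, str]:
    """Map each file under output_dir (by relative path) to its SHA-256 hash."""
    return {
        p.relative_to(output_dir).as_posix(): sha256_of_file(p)
        for p in sorted(output_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two manifests and report which files were added, removed, or modified."""
    shared = previous.keys() & current.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "modified": sorted(k for k in shared if previous[k] != current[k]),
    }
```

A loader could store the manifest from the previous run (for example as JSON) and only pass the "added" and "modified" files to an incremental import, handling "removed" files separately. Whether that maps cleanly onto neo4j-admin database import incremental is an open question here.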