Each installation of Graphryder API needs its own Neo4j instance, because the community edition of Neo4j can't handle more than one graph at a time. This means that we need to run multiple instances of Neo4j on the same server. That adds a lot of overhead, as Neo4j is a pretty expensive piece of software to run, especially from a memory point of view.
In light of this, I think it is a priority to rewrite the Graphryder API so that multiple instances can share the same Neo4j database. This shouldn't be incredibly difficult. What we need to do is to somehow label every node and relationship as belonging to a certain sub-graph. Another, more "graphy" way of doing it, which might be more canonical for Neo4j, would be to introduce a new node type "project" and create a "belongs_to" relationship from that project to all its associated nodes. This is a more memory-efficient way to do it in a graph database: the relationships are all direct memory pointers from the first object, so the only search operation is to get a node from a very small set of indexed project nodes.
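To make the two options concrete, here is a minimal sketch of what the Cypher queries could look like in each case. It only builds query strings and parameter dicts, so it runs without a database. The property name `dataset`, the `name` property on project nodes, the `user` label and the helper names are all assumptions for illustration, not part of the current Graphryder API.

```python
import re

# Labels cannot be passed as Cypher parameters, so they are validated
# before being interpolated into the query text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check_label(label):
    if not _IDENTIFIER.fullmatch(label):
        raise ValueError(f"unsafe label: {label!r}")
    return label


def nodes_by_property(label, dataset):
    """Option 1: every node carries a (hypothetical) `dataset` property."""
    query = f"MATCH (n:{_check_label(label)} {{dataset: $dataset}}) RETURN n"
    return query, {"dataset": dataset}


def nodes_by_project(label, project):
    """Option 2: a `project` node points at its members via `belongs_to`."""
    query = (
        "MATCH (p:project {name: $project})-[:belongs_to]->"
        f"(n:{_check_label(label)}) RETURN n"
    )
    return query, {"project": project}


if __name__ == "__main__":
    print(nodes_by_property("user", "demo"))
    print(nodes_by_project("user", "demo"))
```

Each function returns a `(query, parameters)` pair that could be handed to whatever Neo4j client the API uses. For option 1 to be fast, the `dataset` property would presumably need an index per label; for option 2, only the small set of project nodes needs one.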
So basically an equivalent to tablename prefixes in SQL. There are two ways to implement this:

1. Make Graphryder API merely aware of multiple datasets. It would know how to select the data for the dataset it was configured for, but one Graphryder API installation could still only handle one dataset. So while this is a good first solution, it's still annoying, and we had better go all the way, namely:
2. Make Graphryder API capable of handling multiple datasets. We'd then only need one Graphryder API installation for all datasets. For this, the API would have to be extended so that every request also specifies the dataset. And the Tulip file naming scheme in ./data/tlp/ would have to be extended so that Graphryder API can cache Tulip graphs from many datasets there, not just one. (There is already a difference between a public and a private scheme of naming these files, so maybe the mechanism works just fine for multiple datasets already.)
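As a rough sketch of the second variant: each request carries a dataset name, and the Tulip cache file name is derived from it so that graphs from different datasets don't collide. The `dataset` request parameter, the `<dataset>__<graph>.tlp` naming and both helper names are hypothetical; the existing public/private naming scheme is not modelled here.

```python
import re
from pathlib import PurePosixPath

TLP_DIR = PurePosixPath("data/tlp")  # the cache directory mentioned above

# Lowercase letters, digits and hyphens only: no slashes or dots (so no
# path traversal) and no underscores (so "__" is an unambiguous separator).
_SLUG = re.compile(r"[a-z0-9][a-z0-9-]*")


def dataset_from_request(params, known_datasets):
    """Read the (hypothetical) `dataset` parameter and reject unknown values."""
    dataset = params.get("dataset")
    if dataset not in known_datasets:
        raise KeyError(f"unknown dataset: {dataset!r}")
    return dataset


def tlp_cache_path(dataset, graph_name):
    """Build a per-dataset cache path such as data/tlp/<dataset>__<graph>.tlp."""
    for part in (dataset, graph_name):
        if not _SLUG.fullmatch(part):
            raise ValueError(f"invalid name: {part!r}")
    return TLP_DIR / f"{dataset}__{graph_name}.tlp"


if __name__ == "__main__":
    dataset = dataset_from_request({"dataset": "demo"}, {"demo", "other"})
    print(tlp_cache_path(dataset, "users"))
```

Checking the dataset against a known list before touching the file system keeps a bad request from creating stray cache files.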
As reported by @aerugo :+1: