Closed by @e0 4 years ago
There are multiple routes to deal with this:
a) use CALL db.indexes() YIELD .. to find the indexes to drop
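A minimal sketch of (a), assuming the Neo4j 3.x db.indexes() procedure (whose description column holds strings like INDEX ON :Metabolite(id)) and that APOC's apoc.cypher.runSchema procedure is installed; a later comment in this thread reports not getting apoc to work, so treat this as unverified:

```cypher
// Enumerate all indexes and issue a DROP for each one.
// apoc.cypher.runSchema is needed because plain Cypher cannot
// execute a schema statement assembled from a string at runtime.
CALL db.indexes() YIELD description
WITH 'DROP ' + description AS stmt
CALL apoc.cypher.runSchema(stmt, {}) YIELD value
RETURN stmt;
```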
b) use DROP DATABASE, an enterprise feature, but maybe we can get that for free
c) in the docker build stage, before starting up neo4j, clear the database folder
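A sketch of (c) as a build-stage step, assuming the default data path of the official neo4j Docker image (the path and the variable name are illustrative, not taken from this repo):

```shell
# Remove the store files before neo4j first starts, so every build
# imports into a fresh, empty database.
# /data/databases/graph.db is the neo4j 3.x image default (an assumption).
DATA_DIR="${DATA_DIR:-/data/databases/graph.db}"
rm -rf "$DATA_DIR"
```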
To me, (c) sounds like the cleanest and easiest option: it would guarantee that we always get a fresh start when doing an import.
In the build stage for production, it always starts from a fresh neo4j image, so the database would always be fresh/reset. The problem here is that a DROP INDEX... statement is invalid if no such index exists. I think the output file should prioritise production and not include any DROP INDEX... statements by default.
For the development environment, I don't think we should even include the import db step in the build step. It feels reasonable to copy and paste the cypher instructions manually into the neo4j browser.
My bad, I forgot about this problem. I would, myself, make it compatible with every situation. The import script is meant to add new data to the neo4j database, empty or not, which means indexes should be systematically (re-)created, and therefore systematically dropped first to guarantee it. I would opt for option a), or a simpler equivalent if possible (in SQL there is an easy way to solve this problem with the DROP INDEX IF EXISTS statement).
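For reference, Neo4j later gained the same idiom: in 4.x indexes are named and DROP INDEX supports IF EXISTS (a version assumption, and the index name here is hypothetical; this does not help with the 3.x-era syntax used elsewhere in this thread):

```cypher
// Neo4j 4.1+ syntax: both statements are idempotent.
CREATE INDEX metabolite_id IF NOT EXISTS FOR (m:Metabolite) ON (m.id);
DROP INDEX metabolite_id IF EXISTS;
```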
Perhaps apoc.schema.assert is a good alternative to (a).
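A sketch of that alternative: apoc.schema.assert makes the schema match exactly the given spec and drops everything else, so it is idempotent on both fresh and already-populated databases. The Metabolite(id) index is the one mentioned later in this thread; adjust to the real schema, and note a later comment reports trouble getting apoc to work:

```cypher
// First map: indexes to create/keep; second map: constraints.
// Any existing index or constraint not listed here is dropped.
CALL apoc.schema.assert(
  {Metabolite: ['id']},
  {}
);
```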
I'm really up for having the same way of importing in production and in development, even if it would increase the build time for development. From what I see, building is not required that often for development thanks to hot reloading in both frontend and api.
I think the build time to add the data import would be pretty negligible. Is there any reason why you would want to have the same setup?
I was thinking that for development we might want to experiment with different data frequently enough that it's good to encourage the usage of the neo4j browser. It feels like rebuilding the stack every time we want to import different data is unnecessarily time-consuming. Of course, one could still use the neo4j browser, but I personally would prefer enforcing a single method of importing the data.
I don't feel too strongly about this though. Just want to provide my reasoning here. Feel free to respond with a decision on this and I can make the changes accordingly.
The same setup brings simplicity (no "if" involved) and robustness (it guarantees we test this same method thoroughly by using it routinely in development, so we can pick up any problems that might otherwise surprise us during deployment). And by "same setup" I mean minimizing as much as possible the differences between development and production, pretty much as you say, @e0:
a single method of importing the data
Please take all necessary steps for simplicity and robustness.
Could not get the apoc to work, so for the time being I have removed the drop index instructions by default. They can be added with the --drop-indexes arg.
commit bbb4cab6361ac51d47d65c0461b183f4badf2a40
To clarify, the "single method" I mentioned above didn't mean the same method for development and production. I meant that by having the import in the build step, in addition to the possibility of importing using the neo4j browser, there would be two methods available for importing data in the development environment. I do like your point on the chance to pick up deployment problems, though.
Thank you all for the discussion on this topic. Good that you added the new commit now as well @pecholleyc. With this, I think we can close this issue and I can make the changes to deployment build in the other repo.
Since a statement like

DROP INDEX ON :Metabolite(id);

is invalid if the index does not already exist, the rest of the cypher query would not run. As the TODO inside index.js already says, let's add a command line option to include the DROP INDEX statements dynamically. But I think it is better that by default they are not included.
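As a rough sketch of what the generated file could look like with and without that option (illustrative only; the real output is defined by index.js):

```cypher
// Emitted only when the drop option is passed; safe only when the
// indexes are known to exist, e.g. re-importing into an existing db:
DROP INDEX ON :Metabolite(id);
// Always emitted:
CREATE INDEX ON :Metabolite(id);
```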