Import should look in the existing data to find target nodes

bjorngranvik / reudd

REgarding User Driven Development - a user centric "data in/report out" web site using Neo4j.

Apache License 2.0

Import should look in the existing data to find target nodes #29

Closed. bjorngranvik closed this issue 10 years ago

bjorngranvik commented 10 years ago

Currently the import only looks in the import data itself to find target nodes (nodes mentioned in a relationship column). This forces a single big import where all the data must be present in the same import file.

We could add a second step when looking up target nodes in the import (see the last step in UserController.importFileSubmit): if no target node can be found in the import data itself, look in the entire database.

This could be expensive but let's solve this for small to medium sized data first.

For instance, the second step could be: "rel:SOME_REL(firstname)" "joe". Look for any node that has an attribute with key "firstname" and value "joe".

Extended by type: "rel:SOME_REL(sometype.firstname)" "joe". Look for any node of type "sometype" that has an attribute with key "firstname" and value "joe". The type lookup could be index based. This would speed things up and would also mean that the attribute only has to be unique per type (and not across the entire database).
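To make the two header forms concrete, here is a minimal Python sketch of how such a header might be parsed and matched. It is only an illustration, not the reudd implementation: the regular expression, the `RelSpec` tuple, the function names and the dict layout of a node (`"type"` plus `"attributes"`) are all assumptions made for this example.

```python
import re
from typing import NamedTuple, Optional

# Assumed header grammar: rel:NAME(key) or rel:NAME(type.key)
_REL_HEADER = re.compile(r"^rel:(\w+)\((?:(\w+)\.)?(\w+)\)$")


class RelSpec(NamedTuple):
    rel_name: str             # e.g. "SOME_REL"
    node_type: Optional[str]  # e.g. "sometype", or None when no type is given
    key: str                  # attribute key, e.g. "firstname"


def parse_rel_header(header: str) -> Optional[RelSpec]:
    """Parse a relationship column header; return None if it is not one."""
    match = _REL_HEADER.match(header.strip())
    if match is None:
        return None
    return RelSpec(match.group(1), match.group(2), match.group(3))


def find_targets(nodes, spec: RelSpec, value):
    """Return nodes whose attribute spec.key equals value.

    When spec.node_type is set, only nodes of that type are considered.
    Nodes are plain dicts here (hypothetical layout), scanned linearly.
    """
    return [
        node for node in nodes
        if (spec.node_type is None or node.get("type") == spec.node_type)
        and node.get("attributes", {}).get(spec.key) == value
    ]
```

With a type in the header, two nodes of different types may share the same "firstname" value and the lookup still yields a single match, which is the per-type uniqueness described above. A real implementation would presumably replace the linear scan with an index lookup.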

bjorngranvik commented 10 years ago

@niklaslj @matsjonas Your thoughts on this? (I hope it is clear enough above)

matsjonas commented 10 years ago

This is closely connected to what we've been talking about in today's email thread, right? I don't want to start two different tracks on the same subject, but wouldn't it be better to keep the discussion here on GitHub?

bjorngranvik commented 10 years ago

I agree with having it here on GitHub. That's a definite goal going forward. This time it happened over email and in Swedish, and before we knew it we were deep into it.

I will write a summary here. /Björn

(Sent by email in reply to Jonas Andersson's message of 14 Oct 2013, 20:33.)

niklaslj commented 10 years ago

If it is important to keep the possibility of only relating nodes contained in the import file, I suggest that it be controlled by a switch. The only-internal state would then apply to all "rel:(" columns in the import file. The switch would be accessible through a checkbox in the user interface.
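A rough Python sketch of how such a switch could gate the lookup, purely as an illustration: `resolve_target`, the `only_internal` flag and the flat dict representation of a node are hypothetical names chosen for this example, not anything in the reudd code.

```python
def resolve_target(key, value, imported_nodes, database_nodes, only_internal=False):
    """Find a relationship target node by attribute key and value.

    Step 1 searches the nodes of the current import. Step 2 searches the
    rest of the database, unless only_internal is set (the checkbox state).
    Nodes are plain dicts of attributes in this sketch.
    """
    for node in imported_nodes:
        if node.get(key) == value:
            return node

    if only_internal:
        # Switch is on: never look outside the import file.
        return None

    for node in database_nodes:
        if node.get(key) == value:
            return node
    return None
```

The flag is passed once per import, so the same state applies to every relationship column in the file.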

bjorngranvik commented 10 years ago

I like your idea. As we go forward we can figure out if and how we need a configuration to control rels differently.

/Björn

(Sent by email in reply to Niklas Ljungkvist's message of 15 Oct 2013, 11:06.)

matsjonas commented 10 years ago

The import now looks through all the existing nodes when adding relationships to imported nodes. Since the nodes from the import have already been added when this step takes place, they are automatically included in the search for possible relationship endpoint candidates.
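The ordering described here can be sketched in Python as two phases. This is a simplified illustration under stated assumptions, not the actual change: the store is a plain list of attribute dicts, `import_rows` and the `"rel_target"` column name are made up for the example, and skipping a node's relationship to itself is my own assumption.

```python
def import_rows(store, rows, rel_name, target_key):
    """Import rows into store, then link them to matching existing nodes.

    store      -- list of node dicts already in the "database" (mutated)
    rows       -- list of dicts; the hypothetical "rel_target" entry holds
                  the value to look up in other nodes under target_key
    Returns a list of (source_node, rel_name, target_node) tuples.
    """
    # Phase 1: add every imported row as a node before resolving anything.
    pending = []
    for row in rows:
        node = {k: v for k, v in row.items() if k != "rel_target"}
        store.append(node)
        pending.append((node, row.get("rel_target")))

    # Phase 2: search the whole store, which now includes the new nodes.
    relationships = []
    for node, target_value in pending:
        if target_value is None:
            continue
        for candidate in store:
            # Assumption: a node is never linked to itself.
            if candidate is not node and candidate.get(target_key) == target_value:
                relationships.append((node, rel_name, candidate))
    return relationships
```

Because phase 1 completes before phase 2 starts, a row can point at a node that appears later in the same file as well as at a node from an earlier import.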

niklaslj commented 10 years ago

Perfect!

bjorngranvik commented 10 years ago

Sweet!