bjorngranvik / reudd

REgarding User Driven Development - a user centric "data in/report out" web site using Neo4j.
Apache License 2.0

Make imports trackable #32

Open niklaslj opened 10 years ago

niklaslj commented 10 years ago

To me it is important to be able to track the imported data in the database, both for traceability and to make imports undoable.

I suggest that a new base node is added: _REUDD_IMPORTS. Each import creates an import collection node (_REUDD_IMPORT), related to that base node, containing attributes for importfilename and a timestamp (anything else? A free text from the user, maybe?). All imported nodes will then relate to this import collection node. Unfortunately, it is impossible to have a relation to another relation, so to keep track of the imported relations I suggest that an attribute (e.g. _REUDD_IMPORT_NODE) is added to all imported relations, containing the node ID of the import collection node (or some other reference).
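To make the proposed structure concrete, here is a minimal in-memory sketch in Python. It is not ReUDD code and does not use Neo4j; the graph is a plain dict, and the helper names, the relation types HAS_IMPORT and IMPORTED, and the sample data are all hypothetical. Only _REUDD_IMPORTS, _REUDD_IMPORT, _REUDD_IMPORT_NODE, importfilename and the timestamp come from the suggestion above.

```python
from datetime import datetime, timezone
from itertools import count

_next_id = count(1)  # stand-in for database-assigned IDs (assumption)


def add_node(graph, **props):
    """Store a node as a plain dict and return its ID."""
    node_id = next(_next_id)
    graph["nodes"][node_id] = dict(props)
    return node_id


def add_relation(graph, start, end, rel_type, **props):
    """Store a relation with its own attributes and return its ID."""
    rel_id = next(_next_id)
    graph["relations"][rel_id] = {"start": start, "end": end, "type": rel_type, **props}
    return rel_id


def start_import(graph, filename, description=""):
    """Create a _REUDD_IMPORT node under the single _REUDD_IMPORTS base node."""
    if graph["imports_root"] is None:
        graph["imports_root"] = add_node(graph, name="_REUDD_IMPORTS")
    import_id = add_node(
        graph,
        name="_REUDD_IMPORT",
        importfilename=filename,
        timestamp=datetime.now(timezone.utc),
        description=description,
    )
    # HAS_IMPORT is a hypothetical relation type name.
    add_relation(graph, graph["imports_root"], import_id, "HAS_IMPORT")
    return import_id


def track_node(graph, import_id, node_id):
    """Relate an imported node to its import collection node."""
    # IMPORTED is a hypothetical relation type name.
    return add_relation(graph, import_id, node_id, "IMPORTED")


def track_relation(graph, import_id, rel_id):
    """A relation cannot be the target of a relation, so tag it with an attribute."""
    graph["relations"][rel_id]["_REUDD_IMPORT_NODE"] = import_id


# Sample import of two nodes and one relation (made-up data).
graph = {"nodes": {}, "relations": {}, "imports_root": None}
imp = start_import(graph, "myimport.csv", "All names in the calendar")
alice = add_node(graph, name="Alice")
bob = add_node(graph, name="Bob")
knows = add_relation(graph, alice, bob, "KNOWS")
track_node(graph, imp, alice)
track_node(graph, imp, bob)
track_relation(graph, imp, knows)
```

In this sketch the nodes are traced through real relations, while the relation is traced only through its attribute, which mirrors the asymmetry described above.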

I also suggest a checkbox in the user interface for the import, so the user can decide for each import file whether to add import traces or not.

I guess this is rather simple to add to the import function, and it would be of great use to me. Otherwise I have to handle this manually in every import by adding an import node in each file and then adding relations to it in each row of the file. But I can't see any way of tracing relations manually, or is there one?

matsjonas commented 10 years ago

I like this idea. We should probably add a sub reference node for all import nodes as well.

Difficult problem on how to connect the relationships to the import node. Will have to think about that one for a while.

niklaslj commented 10 years ago

Glad you liked the idea!

I like this idea. We should probably add a sub reference node for all import nodes as well.

You don't think that the _REUDD_IMPORTS node I suggested would work? Or are you talking about something else?

Difficult problem on how to connect the relationships to the import node. Will have to think about that one for a while.

My suggestion was to add a "hidden" attribute to the imported relation, e.g. _REUDD_IMPORT_NODE_ID, containing the import collection node's ID. Do you think that is a bad idea?

If you don't want to add anything to the relation, you could add an extra node, related to the import collection node, containing the imported relation's ID. It would be a bit clumsy perhaps, but you wouldn't touch the imported relation.

It's hard to put this in text. I think I have to draw a graph to describe the idea :)

matsjonas commented 10 years ago

You don't think that the _REUDD_IMPORTS node I suggested would work? Or are you talking about something else?

Sorry, I seem to have missed that you wrote that. It was exactly what I meant.

... add a "hidden" attribute to the imported relation, e.g. _REUDD_IMPORT_NODE_ID, containing the import collection node's ID ...

This is perhaps the best solution. Simple, direct, fast and easily understandable.

But what happens after a while, when the graph has changed? The user can rename, delete and basically do whatever s/he wants. So eventually we might end up with entirely new content in our imported nodes, but with lingering hidden attributes and relationships to import nodes that don't matter anymore.

Is this a problem, or is it just something we will have to accept? Should we update import attributes/relationships upon changes to the nodes?

niklaslj commented 10 years ago

Hm, I didn't think of that... A tricky one. You could add a complex set of functions to track changes since the import, but I don't think it is worth the cost. It would be a great thing, though, to be able to determine whether a node has been changed after the import or not. I just took a quick peek at a node in ReUDD's database and noticed that you clever guys already have the _REUDD_LAST_UPDATE attribute in the nodes. Just by comparing the import date with _REUDD_LAST_UPDATE you could determine whether the node has been changed after the import. Then, if you want, you can find out what was changed by comparing the data in the import file with the node. (It would actually be enough to compare _REUDD_LAST_UPDATE with _REUDD_CREATED in an imported node to know if it has been altered after import.) This way no extra work or info has to be added. Do you think this would work? It doesn't cover relations, as creation and changes are not recorded for relations.
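The comparison could look roughly like the sketch below. It assumes that _REUDD_CREATED and _REUDD_LAST_UPDATE hold comparable timestamps (their actual storage format in ReUDD is not stated in this thread), and the function name and sample nodes are hypothetical.

```python
from datetime import datetime


def changed_after_import(node, import_time=None):
    """Return True if the node was updated after it was imported.

    With import_time given, compare it against _REUDD_LAST_UPDATE; otherwise
    fall back to comparing _REUDD_LAST_UPDATE with _REUDD_CREATED.
    """
    reference = import_time if import_time is not None else node["_REUDD_CREATED"]
    return node["_REUDD_LAST_UPDATE"] > reference


# Made-up sample nodes; timestamps are assumed to be datetime values.
imported_at = datetime(2013, 10, 29, 14, 1, 2)
untouched = {"_REUDD_CREATED": imported_at, "_REUDD_LAST_UPDATE": imported_at}
edited = {
    "_REUDD_CREATED": imported_at,
    "_REUDD_LAST_UPDATE": datetime(2013, 11, 2, 9, 30),
}

print(changed_after_import(untouched))  # False
print(changed_after_import(edited))     # True
```

One caveat with the import date variant: if the import timestamp is taken before the nodes are written, every imported node would look changed, so comparing against _REUDD_CREATED may be the safer of the two checks.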

matsjonas commented 10 years ago

Good plan. I like it.

niklaslj commented 10 years ago

Thanks! Some additional thoughts:

May I suggest a solution for detecting whether any imported nodes/relations have been deleted? Add two attributes to the import collection nodes (_REUDD_IMPORT): imported_nodes_count and imported_relations_count. By knowing the number of nodes/relations imported, it is easy to detect (if ever needed) whether any nodes/relations have been deleted after the import. Again, if anyone at a later stage wants to know which nodes were deleted, one can analyze the import file and compare it with the current graph.
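As a rough illustration of the counting idea, the sketch below subtracts what is still traceable in the graph from the stored counts. The graph is a plain dict rather than a Neo4j database, and the function name, the IMPORTED relation type and the sample data are hypothetical; imported_nodes_count, imported_relations_count and _REUDD_IMPORT_NODE are the attributes proposed in this thread.

```python
def count_deleted(graph, import_id):
    """Compare the counts stored at import time with what is still in the graph."""
    import_node = graph["nodes"][import_id]
    relations = graph["relations"].values()
    # Imported nodes still reachable from the import collection node.
    surviving_nodes = sum(
        1
        for r in relations
        if r["type"] == "IMPORTED"
        and r["start"] == import_id
        and r["end"] in graph["nodes"]
    )
    # Imported relations still carrying the tracking attribute.
    surviving_relations = sum(
        1 for r in relations if r.get("_REUDD_IMPORT_NODE") == import_id
    )
    return {
        "nodes": import_node["imported_nodes_count"] - surviving_nodes,
        "relations": import_node["imported_relations_count"] - surviving_relations,
    }


# Made-up state: 3 nodes and 2 relations were imported, one of each is gone.
graph = {
    "nodes": {
        1: {"name": "_REUDD_IMPORT", "imported_nodes_count": 3, "imported_relations_count": 2},
        2: {"name": "Alice"},
        3: {"name": "Bob"},
    },
    "relations": {
        10: {"start": 1, "end": 2, "type": "IMPORTED"},
        11: {"start": 1, "end": 3, "type": "IMPORTED"},
        12: {"start": 2, "end": 3, "type": "KNOWS", "_REUDD_IMPORT_NODE": 1},
    },
}

deleted = count_deleted(graph, 1)
print(deleted)  # {'nodes': 1, 'relations': 1}
```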

I do think that most users don't need the traceability, apart from the possibility to undo an import action more or less immediately after the import. And I think the solutions above will cover that need quite easily.
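An undo built on these traces might work along the lines of the sketch below. This is an assumption about how it could be done, not a description of ReUDD: the graph is a plain dict, and the function name, the HAS_IMPORT and IMPORTED relation types and the sample data are hypothetical.

```python
def undo_import(graph, import_id):
    """Remove everything traced to one import, plus the import node itself."""
    nodes, relations = graph["nodes"], graph["relations"]
    imported_nodes = {
        r["end"]
        for r in relations.values()
        if r["type"] == "IMPORTED" and r["start"] == import_id
    }
    doomed_nodes = imported_nodes | {import_id}
    # A relation goes if it was imported, or if it touches a node being removed.
    doomed_relations = [
        rel_id
        for rel_id, r in relations.items()
        if r.get("_REUDD_IMPORT_NODE") == import_id
        or r["start"] in doomed_nodes
        or r["end"] in doomed_nodes
    ]
    for rel_id in doomed_relations:
        del relations[rel_id]
    for node_id in doomed_nodes:
        nodes.pop(node_id, None)


# Made-up graph: nodes 3 and 4 were imported, nodes 5 and 6 existed before.
graph = {
    "nodes": {
        1: {"name": "_REUDD_IMPORTS"},
        2: {"name": "_REUDD_IMPORT"},
        3: {"name": "Alice"},
        4: {"name": "Bob"},
        5: {"name": "Carol"},
        6: {"name": "Dave"},
    },
    "relations": {
        10: {"start": 1, "end": 2, "type": "HAS_IMPORT"},
        11: {"start": 2, "end": 3, "type": "IMPORTED"},
        12: {"start": 2, "end": 4, "type": "IMPORTED"},
        13: {"start": 3, "end": 4, "type": "KNOWS", "_REUDD_IMPORT_NODE": 2},
        14: {"start": 5, "end": 4, "type": "KNOWS", "_REUDD_IMPORT_NODE": 2},
        15: {"start": 5, "end": 6, "type": "KNOWS", "_REUDD_IMPORT_NODE": 2},
        16: {"start": 5, "end": 6, "type": "FRIEND"},
    },
}

undo_import(graph, 2)
print(sorted(graph["nodes"]))      # [1, 5, 6]
print(sorted(graph["relations"]))  # [16]
```

Note that relation 15 connects two nodes that existed before the import, so it can only be found through its attribute. That is the case the relation attribute is needed for.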

As an addition I would suggest that the following information is made available to the user somewhere in the GUI for each import action:

Importdate: 2013-10-29 14:01:02
Filename: myimport.csv
Description: All names in the calendar
Nodes imported: 318, changed after import: 18, deleted after import: 3
Relations imported: 201, changed after import: ?, deleted after import: 15
(Undo import) <- button to perform an undo of the import
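For illustration, the summary text could be produced from a handful of values like this. The function name and the dict keys are hypothetical; the figures are the example values from this comment, and a missing value is shown as "?" because changes to relations are not recorded.

```python
def format_import_summary(s):
    """Render one import action as the text block suggested for the GUI."""

    def shown(value):
        # None means "cannot be determined", e.g. changed relations.
        return "?" if value is None else str(value)

    return "\n".join(
        [
            f"Importdate: {s['import_date']}",
            f"Filename: {s['filename']}",
            f"Description: {s['description']}",
            f"Nodes imported: {shown(s['nodes_imported'])}, "
            f"changed after import: {shown(s['nodes_changed'])}, "
            f"deleted after import: {shown(s['nodes_deleted'])}",
            f"Relations imported: {shown(s['relations_imported'])}, "
            f"changed after import: {shown(s['relations_changed'])}, "
            f"deleted after import: {shown(s['relations_deleted'])}",
        ]
    )


summary = format_import_summary(
    {
        "import_date": "2013-10-29 14:01:02",
        "filename": "myimport.csv",
        "description": "All names in the calendar",
        "nodes_imported": 318,
        "nodes_changed": 18,
        "nodes_deleted": 3,
        "relations_imported": 201,
        "relations_changed": None,  # not tracked for relations
        "relations_deleted": 15,
    }
)
print(summary)
```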

It might also be of interest to add a marker/icon in the GUI for each imported node/relation, indicating one of two states: imported, or imported and changed.