InformaticsMatters / portal

lac portal
Apache License 2.0
0 stars 0 forks source link

Issue with duplicate cell IDs #25

Open tdudgeon opened 8 years ago

tdudgeon commented 8 years ago

There is issue with cell IDs that is may be significant. Currently each canvas has its own last cell ID that is used to determine the ID of a new cell that is added to that canvas. The problem is that when you create a new editable (e.g. by creating savepoints) each editable has its own last cell id that is independent of that for other editables which will result in different cells from different editables having the same ID. I don't think this currently has any major impact as variables and stored/retrieved using a combination of the editable/savepoint ID and the cell ID, but its possible that there could be some subtle quirks that result from this (e.g. as you look back up the version tree for a variable), and it seems like bad practice, so it seems like it should be avoided.

To fix it there would seem to be 2 obvious solutions:

  1. make the last cell ID global to the notebook. Whilst seeming attractive this would mean that this value would have to be persisted separately at the notebook level whenever a new cell was added and there would be complex cases when multiple users are working on the same notebook.
  2. generate the cell ID as a random number (random Long). Probably this is the best solution as it avoids the concurrency issues, avoids the need for anything to be persisted, and there is an almost zero chance of 2 cells being assigned the same ID.

Unless someone can think of a better solution ...

gustavosantucho commented 8 years ago

I personally wouldn't use a random generator for this. Any chance we could use the database? That would take care of all the issues mentioned, plus DBs have been doing this stuff for ages.

tdudgeon commented 8 years ago

We could use the database (that's option 1 that I mentioned) but it would mean that we would need to update the notebook DTO every time an cell was added (and do that as a round trip prior to the cell being actually added) and then to update the editable DTO once the cell was added. It could be done, but seems pointless as the cell ID has absolutely no purpose other than needing to be unique.

gustavosantucho commented 8 years ago

I don’t understand why we wouldn’t need to update the notebook… are you thinking about manually generating the values? Any ID generation strategy would guarantee unique IDs without additional round trips. Like you said, Random doesn’t guarantee unique-ness, but if you are comfortable enough with “almost zero chance” of duplicates, then you can use it. Note: I now remembered we don’t use JPA at the persistent layer, which would also solved this easily even without going to the DB every time. But even without JPA, something like a simple database sequence would do?

On Jun 7, 2016, at 8:48 AM, Tim Dudgeon notifications@github.com wrote:

We could use the database (that's option 1 that I mentioned) but it would mean that we would need to update the notebook DTO every time an cell was added (and do that as a round trip prior to the cell being actually added) and then to update the editable DTO once the cell was added. It could be done, but seems pointless as the cell ID has absolutely no purpose other than needing to be unique.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/InformaticsMatters/portal/issues/25#issuecomment-224256936, or mute the thread https://github.com/notifications/unsubscribe/AA2J8_ZRvpUOIHrDDObajnUCW7tjdlFKks5qJVqagaJpZM4Iv0Lr.

tdudgeon commented 8 years ago

The last cell id would now effectively be a property of the notebook, not the editable so that would be where it would logically be stored. Cells are not their own domain objects at eh persistence level - they exist as part of an editable/savepoint. It was that editable/savepoint that has been (up to now) storing the last cell id. Now we need to move it to the notebook. And generally I prefer not to use sequentially generated identifiers unless there is a specific reason why they need to be in sequence, as it adds to the security as you can't easily guess what an identifier would be.