Closed Rkyzzy closed 2 years ago
https://www.figma.com/blog/how-figmas-multiplayer-technology-works/ https://www.figma.com/blog/behind-the-feature-autosave/ Figma's CTO and engineering team published these two blog posts, which state the problem clearly, and their app's collaborative behavior is very similar to Texera's. I read both articles. They explain the problem well, with videos and illustrations, but I don't fully understand their final solution, which is described somewhat vaguely. I need to revisit them to understand the solution and find out whether it is suitable for our system.
https://en.wikipedia.org/wiki/Operational_transformation#Critique_of_OT https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type#G-Counter_(Grow-only_Counter) It seems that there are currently two possible solutions for this scenario: operational transformation (OT) and conflict-free replicated data types (CRDT). These are the two Wikipedia links on the topic. I have read part of them and am still investigating; there are many useful paper links under them.
Operational transformation (OT) is a technology for supporting a range of collaboration functionalities in advanced collaborative software systems. OT was originally invented for consistency maintenance and concurrency control in collaborative editing of plain text documents. Its capabilities have been extended and its applications expanded to include group undo, locking, conflict resolution, operation notification and compression, group-awareness, HTML/XML and tree-structured document editing, collaborative office productivity tools, application-sharing, and collaborative computer-aided media design tools. In 2009 OT was adopted as a core technique behind the collaboration features in Apache Wave and Google Docs.
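To make the idea concrete for myself, here is a minimal sketch of OT for plain text with insert operations only. The `InsertOp` shape, the `siteId` tie-breaker, and the function names are my own assumptions for illustration; they are not taken from Google Wave or any specific OT algorithm.

```typescript
// Hypothetical operation: insert `text` at index `pos`, issued by replica `siteId`.
interface InsertOp {
  pos: number;
  text: string;
  siteId: number;
}

function applyInsert(doc: string, op: InsertOp): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Rewrite `op` so that it can be applied AFTER the concurrent operation `other`.
// If `other` inserted at an earlier position (or at the same position with a
// lower siteId, used here as an arbitrary tie-breaker), shift `op` to the right.
function transformInsert(op: InsertOp, other: InsertOp): InsertOp {
  const otherGoesFirst =
    other.pos < op.pos || (other.pos === op.pos && other.siteId < op.siteId);
  return otherGoesFirst ? { ...op, pos: op.pos + other.text.length } : op;
}

// Two users edit "abc" concurrently.
const baseDoc = "abc";
const opA: InsertOp = { pos: 1, text: "X", siteId: 1 };
const opB: InsertOp = { pos: 2, text: "Y", siteId: 2 };

// Each site applies its own operation first, then the transformed remote one.
const atSiteA = applyInsert(applyInsert(baseDoc, opA), transformInsert(opB, opA));
const atSiteB = applyInsert(applyInsert(baseDoc, opB), transformInsert(opA, opB));
// Both sites end with "aXbYc".
```

A real system also needs delete operations and a way to order operations through the server, which this sketch leaves out.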
A conflict-free replicated data type (CRDT), used in distributed systems, is a data structure that is replicated across multiple computers in a network, where the replicas can be updated independently and concurrently without coordination between them, and where it is always mathematically possible to resolve inconsistencies that might come up.
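The G-Counter (grow-only counter) from the Wikipedia link above is the simplest CRDT, so here is a rough sketch of it. The `GCounter` type and the function names are hypothetical, chosen only for this illustration.

```typescript
// One slot per replica; each replica only ever increases its own slot.
type GCounter = Record<string, number>;

function bump(counter: GCounter, replicaId: string, by = 1): GCounter {
  return { ...counter, [replicaId]: (counter[replicaId] ?? 0) + by };
}

// Merging takes the per-replica maximum. The operation is commutative,
// associative and idempotent, so replicas converge regardless of the order
// or number of times they exchange state.
function mergeCounters(a: GCounter, b: GCounter): GCounter {
  const merged: GCounter = { ...a };
  for (const id of Object.keys(b)) {
    merged[id] = Math.max(merged[id] ?? 0, b[id] ?? 0);
  }
  return merged;
}

// The counter's value is the sum of all slots.
function total(counter: GCounter): number {
  return Object.keys(counter).reduce((sum, id) => sum + (counter[id] ?? 0), 0);
}

// Two replicas update independently, without coordination.
let replicaA: GCounter = {};
let replicaB: GCounter = {};
replicaA = bump(replicaA, "A");
replicaA = bump(replicaA, "A");
replicaB = bump(replicaB, "B", 3);

// After exchanging state in either order, both see the same total of 5.
const mergedAB = mergeCounters(replicaA, replicaB);
const mergedBA = mergeCounters(replicaB, replicaA);
```

A workflow is a much richer structure than a counter, so this only shows the merge principle, not something we could use directly.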
https://www.youtube.com/watch?v=3ykZYKCK7AM I watched this short talk by a Google Wave engineer. It seems that in a client-server model, solving the issue we are having requires operational transformation on the server, so I need to investigate OT and its related algorithms further.
TODO: Watch these two Google Tech Talks: 1. Issues and Experiences in Designing Real-time Collaborative Editing Systems https://www.youtube.com/watch?v=84zqbXUQIHc 2. Differential Synchronization https://www.youtube.com/watch?v=S2Hp_1jqpY8
TODO: Read Google's paper about Differential Synchronization: https://research.google.com/pubs/archive/35605.pdf
https://softwareengineering.stackexchange.com/questions/202815/how-to-save-during-real-time-collaboration A very useful Stack Exchange question regarding this issue. It suggests two ways of implementing this:
TODO: Implement single user saving strategy and restoring operation.
Libraries relevant to this issue are listed below and will be updated as the evaluation continues.
Library1 GSON (UPDATING) Link to library: https://github.com/google/gson, documentation
Provider: Google
Stars: 19.9k
Maintenance status: Actively updated and maintained
Brief introduction: It converts Java objects to JSON and back.
How it helps our task: It supports versioning through its @Since annotation, which can be applied to classes, fields and, in a future release, methods, etc.
Advantage:
Disadvantage: Its functionality for our case is not clear (e.g., we don't know whether it can handle nested JSON objects).
Demo usage and Expected output: can be found here demo usage, tutorial ......
Library2 Json-Version-Control (UPDATING) Link to library: https://github.com/datoMarjanidze/json-version-control
Provider: datoMarjanidze
Stars: 3
Maintenance status: Not updated for 3 years
Brief introduction: It is a small npm library for JSON version control. It provides a solution to two tasks: 1) storing updated information; 2) catching differences (property creation, deletion, and value modification).
How it helps our task: It provides two key functions to do the job. 1) createHistoryObject, which takes three parameters (versionNumber (Number), predecessorObject (Object), currentObject (Object)). 2) restoreHistoryObject, which takes three parameters (historyObjects (Array), currentObject (Object), options (Object)). Details can be checked in its usage section.
Operation Complexity: To be analyzed.
Advantage: It fits our need well: it does diff-based versioning, and it calculates the deep difference of (nested) JSON objects, which most libraries cannot do. For restoring (checkout), its spec also satisfies our user need.
Disadvantage: It is doubtful whether we can trust the library. It has few stars on GitHub and has not been maintained or updated for three years. It needs to be verified.
================================================================================== Testing: Finished a basic trial using it on our workflow. The data is a real workflow from a production environment. The tested changes include moving operators' positions, adding/deleting operators, changing operator properties, and changing the links between operators. I hardcoded 8 demo workflows with these gradual changes. (Same test as for library 3.)
Result: It correctly performs the operations we want: commit a version (using its createHistoryObject to perform the diff and store it), list all versions, and check out a specific version (using its restoreHistoryObject).
Drawback: 1) It has the same nested array problem as library 3. What is worse, for some unknown reason, the operatorPositions and breakpoints parts are always stored in a change, even when the modification to our workflow did not touch them (I haven't figured out why; it may be an issue in the library's implementation). 2) Its restoreHistoryObject only provides a "merge until reaching the oldest version" feature. To get a specific version, we need to manually select the portion of the changes to be merged.
Comments: In general, it can do the work, but it has the same problem as library 3 and more. Considering also the repo's low activity (3 stars and no updates), I suggest we do not use this as our versioning library.
......
Library3 node-rus-diff (UPDATING)
Link to library: https://github.com/mirek/node-rus-diff
Provider:mirek
Stars: 116
Maintaining status: Last update on Oct. 2020
Brief introduction: (R)emove-(U)pdate-(S)et JSON diff library can be used standalone to compute difference between two JSON objects.
How it helps our task: It provides a tool for comparing and diffing JSON objects, which is a key step when we store diffs. (Its functionality is roughly a subset of the above library: the diff part.) An example of this library's diff and apply operations can be checked here.
Operation Complexity: To be analyzed.
Advantage: It is a well-packaged JSON diff library, which covers the key step of our task, and it is used by many people.
Disadvantage: It has three remaining issues, as noted by the developer: 1) It does not dive into nested arrays. 2) It is hard to specify whether an array is compared as an ordered or an unordered set. 3) The code is written in dated CoffeeScript, which is not as good as a TypeScript implementation.
==============================================================================
Testing: Finished a basic trial using it on our workflow. The data is a real workflow from a production environment. The tested changes include moving operators' positions, adding/deleting operators, changing operator properties, and changing the links between operators. I hardcoded 8 demo workflows with these gradual changes.
Result: It correctly performs the operations we want: commit a version (perform the diff and store it), list all versions, and check out a specific version (apply the changes to the latest version step by step, backwards). Correctness was checked with assert.deepStrictEqual.
Some drawbacks I found after testing: 1. For a nested array in our workflow, for example the "operators" part, it stores the entire array as the diff (it treats the array as a whole, so even if a tiny part of the array changes, the whole array is stored), which is not optimal. 2. As mentioned in Disadvantage (2), it is unclear whether it compares an array as an ordered or an unordered set. I'm not sure whether the order of the nested arrays in Texera's workflow matters.
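The commit / list / checkout flow from the trial can be sketched roughly as below. This is not the test code and does not use node-rus-diff. The `WorkflowVersionStore` class, the `BackwardStep` shape, and the sample workflow fields are hypothetical, and the diff is a naive top-level comparison that, like the drawback above, stores a changed array in full.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Everything needed to turn version n+1 back into version n.
interface BackwardStep {
  restore: JsonObject; // top-level keys whose older value differs
  remove: string[];    // top-level keys that did not exist in the older version
}

function deepCopy<T extends Json>(value: T): T {
  return JSON.parse(JSON.stringify(value));
}

function backwardStep(newer: JsonObject, older: JsonObject): BackwardStep {
  const step: BackwardStep = { restore: {}, remove: [] };
  for (const key of Object.keys(older)) {
    // Whole-value comparison: a changed array is stored in full.
    if (JSON.stringify(older[key]) !== JSON.stringify(newer[key])) {
      step.restore[key] = deepCopy(older[key]);
    }
  }
  for (const key of Object.keys(newer)) {
    if (!(key in older)) step.remove.push(key);
  }
  return step;
}

function applyStep(doc: JsonObject, step: BackwardStep): JsonObject {
  const older: JsonObject = { ...doc, ...deepCopy(step.restore) };
  for (const key of step.remove) delete older[key];
  return older;
}

class WorkflowVersionStore {
  private latest: JsonObject | null = null;
  private steps: BackwardStep[] = []; // steps[i] turns version i+2 into version i+1

  // Store only the latest snapshot plus one backward step per older version.
  commit(next: JsonObject): number {
    const snapshot = deepCopy(next);
    if (this.latest !== null) this.steps.push(backwardStep(snapshot, this.latest));
    this.latest = snapshot;
    return this.steps.length + 1;
  }

  listVersions(): number[] {
    if (this.latest === null) return [];
    return this.steps.map((_, i) => i + 1).concat(this.steps.length + 1);
  }

  // Start from the latest version and apply backward steps until `version`.
  checkout(version: number): JsonObject {
    if (this.latest === null || version < 1 || version > this.steps.length + 1) {
      throw new Error(`unknown version ${version}`);
    }
    let doc = deepCopy(this.latest);
    for (const step of this.steps.slice(version - 1).reverse()) {
      doc = applyStep(doc, step);
    }
    return doc;
  }
}

const versionStore = new WorkflowVersionStore();
const wfV1: JsonObject = { operators: [{ id: "Limit-1", limit: 2 }], links: [] };
const wfV2: JsonObject = { operators: [{ id: "Limit-1", limit: 3 }], links: [] };
const wfV3: JsonObject = { ...wfV2, breakpoints: ["Limit-1"] };
versionStore.commit(wfV1);
versionStore.commit(wfV2);
versionStore.commit(wfV3);
```

Keeping the latest version as a full snapshot makes the common case (opening the current workflow) cheap, at the cost of a longer walk when checking out old versions.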
...
Library4 jsondiffpatch
Link to library: https://github.com/benjamine/jsondiffpatch
Provider: benjamine
Stars: 3.8k
Maintenance status: Actively maintained
Brief introduction: A JavaScript library that performs diff/patch/unpatch operations on JSON.
How it helps our task: Like the libraries above, it can diff JSON to produce the delta of a workflow change for saving, and it can patch and unpatch to restore and check out a certain version.
==============================================================================
Testing: Finished a trial using it on our workflow. The data is a real workflow from a production environment. The tested changes include moving operators' positions, adding/deleting operators, changing operator properties, and changing the links between operators. I hardcoded 8 demo workflows with these gradual changes.
Result: For correctness, it performs well. It can diff two workflows in a reasonable way, and it has both patch() and unpatch() to apply the delta either forward or backward. For performance, its way of storing changes outperforms the above libraries (json-version-control and node-rus-diff) because it handles nested arrays well and does not store unnecessary information (for example, in the same case of changing a workflow operator's property, it stores only the changed operator instead of the whole list of operators), and the format of its changes is clear to me. It also provides some utility functions, such as deep-cloning a JSON object.
The format of changes it stores: This is the delta produced after modification of the 'limit' property of the limit operator from 2 to 3.
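To illustrate what such a delta looks like and how it is applied forward and backward, here is a small self-contained sketch. It is not jsondiffpatch's code or API. It only mimics the leaf conventions from jsondiffpatch's delta format ([newValue] for added, [oldValue, newValue] for modified, [oldValue, 0, 0] for deleted) on nested plain objects. Arrays are compared as whole values here, which the real library does not do, and the workflow shape (operators keyed by ID, a `properties` field) is an assumption made for the example.

```typescript
type JsonValue = null | boolean | number | string | JsonValue[] | { [k: string]: JsonValue };
type JsonMap = { [k: string]: JsonValue };
// A leaf is an array: [new] added, [old, new] modified, [old, 0, 0] deleted.
// Anything that is not an array is a nested delta.
type Delta = { [k: string]: JsonValue[] | Delta };

function isMap(v: JsonValue | undefined): v is JsonMap {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function diffMaps(left: JsonMap, right: JsonMap): Delta | undefined {
  const delta: Delta = {};
  for (const key of Object.keys(left)) {
    const l = left[key];
    const r = right[key];
    if (!(key in right)) {
      delta[key] = [l, 0, 0];
    } else if (isMap(l) && isMap(r)) {
      const nested = diffMaps(l, r);
      if (nested) delta[key] = nested;
    } else if (JSON.stringify(l) !== JSON.stringify(r)) {
      delta[key] = [l, r];
    }
  }
  for (const key of Object.keys(right)) {
    if (!(key in left)) delta[key] = [right[key]];
  }
  return Object.keys(delta).length > 0 ? delta : undefined;
}

// reverse = false applies the delta forward (patch);
// reverse = true applies it backward (unpatch).
function applyDelta(doc: JsonMap, delta: Delta, reverse = false): JsonMap {
  const out: JsonMap = { ...doc };
  for (const key of Object.keys(delta)) {
    const change = delta[key];
    if (!Array.isArray(change)) {
      out[key] = applyDelta(out[key] as JsonMap, change, reverse);
    } else if (change.length === 1) {        // added
      if (reverse) delete out[key]; else out[key] = change[0];
    } else if (change.length === 2) {        // modified
      out[key] = reverse ? change[0] : change[1];
    } else {                                 // deleted
      if (reverse) out[key] = change[0]; else delete out[key];
    }
  }
  return out;
}

const limitBefore: JsonMap = {
  operators: { "Limit-1": { operatorType: "Limit", properties: { limit: 2 } } },
};
const limitAfter: JsonMap = {
  operators: { "Limit-1": { operatorType: "Limit", properties: { limit: 3 } } },
};

// Only the changed leaf is recorded:
// { operators: { "Limit-1": { properties: { limit: [2, 3] } } } }
const limitDelta = diffMaps(limitBefore, limitAfter);
const patchedForward = limitDelta ? applyDelta(limitBefore, limitDelta) : limitBefore;
const patchedBackward = limitDelta ? applyDelta(limitAfter, limitDelta, true) : limitAfter;
```

Because each leaf keeps both the old and the new value, the same stored delta can move a workflow in either direction, which is what makes checkout to an arbitrary version possible.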
For further trials, it has a live demo online that you can try here.
Short comment about this library: In my opinion this library is worth trying. It can fulfill our requirements with both correctness and good performance, it is supported, and it is MIT-licensed.
Recent work regarding comparison of different storing strategy, basic implementation design, and library evaluation and its performance test: versioning_design_and_testing.pptx
There are some frontend changes for the versioning part.
Discussion 1/13: Closed, as we implemented multiple versions for the single-user case.
Currently the undo/redo service works only on the frontend and is cached only in the browser, which means that after closing or refreshing the browser the previously saved versions of the workflow disappear. The backend uses the PersistWorkflow endpoint to update the workflow. For both auto-save and saving by the user, it only replaces the stored workflow, so we lose the previously saved versions and cannot restore them. We want to change this so that multiple previous versions of a workflow are stored, with some kind of version control for merging these workflow histories in an async multi-user scenario.
The TODO is to first explore the easier case, which is a system without a user system (one user, sequential). Then research should be conducted on similar cases for the harder async multi-user scenario. Similar implementations, such as Google Docs, can be investigated for this purpose. Progress will be posted under this issue.
[Figure: versioned autosave workflow]
Autosaving a workflow while maintaining the previous versions means this setting
The next step is to follow the 5 steps above to do the version control, similar to Google's Autosave and Chrome's Autosave. Since our workflow is internally represented as a JSON object, tools to compute the difference and apply patches are listed below.