ArgLab / ArgLab_writing_observer

Writing Observer and Learning Observer: A system for monitoring learning process data, with an initial focus on writing process data from Google Docs.
GNU Affero General Public License v3.0

Merkle Tree Storage Update. #14

Open DrLynch opened 2 years ago

DrLynch commented 2 years ago

Update the implemented code with the Merkle tree implementation. See the document "Write Yourself a Git." Then build out a documented design plan for implementing the data structure in the platform by early next week, and we will discuss the approach.

DrLynch commented 2 years ago

[Figure: MerkelFigure]

BenBeehler commented 2 years ago

Dr. Mitros and I had a good conversation about two weeks ago, which contributed to my research report and its corresponding high-level implementation:

A high-level implementation of a Merkle DAG in Writing Observer (WO) ought to consider the following object types within the system: events, sessions, reducers, aggregators, students, and classrooms. The design should also recognize that the primary difference between Git and WO is volume. A Git repository can certainly see heavy activity, since every developer working in it adds to its volume. Conceptually, however, WO should be designed under the assumption of a constant, very high volume of requests generated indirectly by students through websocket sessions and events. Creating a blob-style object for every incoming student event (each keystroke) would therefore carry a significant performance penalty.

Instead, the system should create a cached session object using a tool such as Redis and append a hashed event object to it as each event arrives. When the session expires, the system should store the session object on disk, just as Git stores a blob for each file in the repository. At that point the system should also give the session object a header section that records basic student information, a timestamp, the parent hash of all event hashes, cryptographic salt information, and the hash of the same student's previous session. This record of the previous hash builds a linked list per student within the tree, which allows any particular session to be validated if needed.

Moving up the tree, a parent hash of all session hashes can be used to construct a general student hash, which changes whenever a new session blob object is generated along with its corresponding student session data. Finally, these individual student hashes can be used to construct individual classroom objects, which also contain parent hashes. With this structure fully elucidated, one can finally recognize how problems (a) and (b) are conceptually solved.
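To make the session flow concrete, here is a minimal Python sketch of buffering event hashes and sealing a session object when it expires. It is only an illustration under stated assumptions: a plain dictionary stands in for Redis, the "parent hash" is taken to be a single SHA-256 digest over the ordered event hashes, and every name in it (`SessionCache`, `append_event`, `seal`, `combine`, the header field names) is hypothetical rather than part of Writing Observer.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical(obj) -> bytes:
    # Sorted keys and fixed separators so equal objects hash identically.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def combine(hashes) -> str:
    # Assumed "parent hash": one digest over the ordered child hashes.
    return sha256_hex("".join(hashes).encode("ascii"))


class SessionCache:
    """In-memory stand-in for a Redis-style cache, keyed by session id."""

    def __init__(self):
        self._open = {}

    def append_event(self, session_id, event):
        # Only the event's hash is appended; no per-event object is written.
        event_hash = sha256_hex(canonical(event))
        self._open.setdefault(session_id, []).append(event_hash)
        return event_hash

    def seal(self, session_id, student, timestamp, salt, previous_session):
        """Build the session object to be stored once the session expires."""
        event_hashes = self._open.pop(session_id)
        session = {
            "header": {
                "student": student,
                "timestamp": timestamp,
                "salt": salt,
                "events_parent": combine(event_hashes),
                # Hash of the same student's previous session, or None.
                "previous_session": previous_session,
            },
            "events": event_hashes,
        }
        return sha256_hex(canonical(session)), session
```

Passing the hash returned by one `seal` call as `previous_session` to the next call for the same student is what would chain sessions into the per-student linked list described above.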

My proposed implementation would follow these lines and integrate modularly with the Learning Observer module as it currently stands. The module would have functions that construct session blob objects from a list of their log objects and header information (salts, student names, etc.), commit-esque student objects that refer to the session linked lists, and classroom objects that reference the students in any particular classroom. All hashes would use the SHA-256 algorithm. Separate paths would exist for each object type (sessions, students, classrooms).
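A rough sketch of how such a module might look is given below. It is an assumption-laden illustration, not the proposed implementation itself: objects are serialized as canonical JSON, each object type gets its own directory under a storage root, session objects are assumed to carry a `header` with a `previous_session` field, and all function and field names (`store`, `load`, `session_chain`, `store_student`, `store_classroom`, `head_session`) are hypothetical.

```python
import hashlib
import json
from pathlib import Path

KINDS = ("sessions", "students", "classrooms")


def store(root: Path, kind: str, obj: dict) -> str:
    """Write obj under root/kind/<sha256> and return its hash."""
    if kind not in KINDS:
        raise ValueError(f"unknown object kind: {kind}")
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    directory = root / kind
    directory.mkdir(parents=True, exist_ok=True)
    (directory / digest).write_bytes(data)
    return digest


def load(root: Path, kind: str, digest: str) -> dict:
    """Read an object back, checking that its content still matches its hash."""
    data = (root / kind / digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError(f"{kind} object {digest} failed its hash check")
    return json.loads(data)


def session_chain(root: Path, head):
    """Walk one student's session linked list from newest to oldest."""
    chain = []
    while head is not None:
        chain.append(head)
        head = load(root, "sessions", head)["header"]["previous_session"]
    return chain


def store_student(root: Path, name: str, head_session: str) -> str:
    # Commit-like object: points at the newest session in the student's list.
    return store(root, "students", {"student": name, "head_session": head_session})


def store_classroom(root: Path, name: str, student_hashes) -> str:
    # Sorted so the classroom hash does not depend on enrollment order.
    return store(root, "classrooms",
                 {"classroom": name, "students": sorted(student_hashes)})
```

In this sketch a new session changes the student hash, which in turn changes the classroom hash, and walking `session_chain` re-verifies every session it loads.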