ETS-Next-Gen / writing_observer

Writing Observer and Learning Observer: A system for monitoring learning process data, with an initial focus on writing process data from Google Docs.
https://learning-observer.readthedocs.io/en/latest/
GNU Affero General Public License v3.0
11 stars 9 forks source link

Possible Data Corruption Bug. #16

Open DrLynch opened 3 years ago

DrLynch commented 3 years ago

I noticed this evening when testing with the latest extension that when I switch from one document to another the log information retains text from the prior documents. For example, when working with a test account I first opened up a blank doc and typed 'Hippidy skip' I then opened up the shared assignment doc in the google classroom and typed some other random gibberish, and then turned to a third doc with new data. Yet the running log contents appear to show all of the file text being added together in the same record:

2021-08-19T03:30:39.680398: pipeline/handler/incoming_websocket_handler Processing PubSub message google_docs_save from org.mitros.writing-analytics [<function pipeline..process at 0x7fa45dac9ca0>] {'text': 'Testing eiyh TeThis is pe\n\n\n\nA new thaing baby big fun baby\n\n\nHippidy \n\nsdfaskd\n\nWe atrdoing \n\nA thing', 'position': 16, 'edit_metadata': {'cursor': [9, 10, 11, 18, 46, 47, 49, 50, 51, 53, 55, 57, 59, 60, 61, 63, 65, 67, 68, 2, 3, 4, 5, 7, 9, 11, 12, 7, 10, 13, 14, 16, 18, 20, 21, 22, 24, 25, 28, 30, 32, 34, 36, 37, 38, 39, 3, 8, 11, 14, 15, 16], 'length': [8, 9, 10, 17, 18, 19, 21, 22, 23, 25, 27, 29, 31, 32, 33, 35, 37, 39, 40, 41, 42, 43, 44, 46, 48, 50, 51, 47, 50, 52, 53, 55, 57, 59, 60, 61, 63, 64, 67, 69, 71, 73, 75, 76, 77, 78, 80, 85, 88, 101, 102, 103]}}

I have yet to investigate whether this affects other log data or the pub-sub but it is showing up in the log.

DrLynch commented 3 years ago

Update: Restarting the server and opening a new doc leaves the same issues.

{'headers': {}, 'source': 'org.mitros.writing-analytics', 'auth': {'sec': 'unauthenticated', 'user_id': 'gc-115680958658335122169', 'safe_user_id': 'gc-115680958658335122169', 'providence': 'gcu'}} 2021-08-19T03:33:01.337731: incoming_websocket_handler/auth_middleware/factory Dispatch incoming ws event: keystroke 2021-08-19T03:33:01.343308: handler/incoming_websocket_handler/auth_middleware Compiling event for reducer: keystroke 2021-08-19T03:33:01.349002: pipeline/handler/incoming_websocket_handler Processing PubSub message keystroke from org.mitros.writing-analytics [<function pipeline..process at 0x7f3dcca50e50>] {'text': 'jjTesting eiyh TesThis is pe\n\n\n\nA new thaing baby big fun baby\n\n\nHippidy \n\nsdfaskd\n\nWe atrdoing \n\nA thing', 'position': 3, 'edit_metadata': {'cursor': [9, 10, 11, 18, 46, 47, 49, 50, 51, 53, 55, 57, 59, 60, 61, 63, 65, 67, 68, 2, 3, 4, 5, 7, 9, 11, 12, 7, 10, 13, 14, 16, 18, 20, 21, 22, 24, 25, 28, 30, 32, 34, 36, 37, 38, 39, 3, 8, 11, 14, 15, 16, 17, 3], 'length': [8, 9, 10, 17, 18, 19, 21, 22, 23, 25, 27, 29, 31, 32, 33, 35, 37, 39, 40, 41, 42, 43, 44, 46, 48, 50, 51, 47, 50, 52, 53, 55, 57, 59, 60, 61, 63, 64, 67, 69, 71, 73, 75, 76, 77, 78, 80, 85, 88, 101, 102, 103, 104, 106]}} {'headers': {}, 'source': 'org.mitros.writing-analytics', 'auth': {'sec': 'unauthenticated', 'user_id': 'gc-115680958658335122169', 'safe_user_id': 'gc-115680958658335122169', 'providence': 'gcu'}} 2021-08-19T03:33:01.337731: incoming_websocket_handler/auth_middleware/factory Dispatch incoming ws event: keystroke 2021-08-19T03:33:01.343308: handler/incoming_websocket_handler/auth_middleware Compiling event for reducer: keystroke 2021-08-19T03:33:01.349002: pipeline/handler/incoming_websocket_handler Processing PubSub message keystroke from org.mitros.writing-analytics [<function pipeline..process at 0x7f3dcca50e50>] {'text': 'jjTesting eiyh TesThis is pe\n\n\n\nA new thaing baby big fun baby\n\n\nHippidy \n\nsdfaskd\n\nWe atrdoing \n\nA thing', 'position': 3, 'edit_metadata': {'cursor': [9, 10, 11, 18, 46, 47, 49, 50, 51, 53, 55, 57, 59, 60, 61, 63, 65, 67, 68, 2, 3, 4, 5, 7, 9, 11, 12, 7, 10, 13, 14, 16, 18, 20, 21, 22, 24, 25, 28, 30, 32, 34, 36, 37, 38, 39, 3, 8, 11, 14, 15, 16, 17, 3], 'length': [8, 9, 10, 17, 18, 19, 21, 22, 23, 25, 27, 29, 31, 32, 33, 35, 37, 39, 40, 41, 42, 43, 44, 46, 48, 50, 51, 47, 50, 52, 53, 55, 57, 59, 60, 61, 63, 64, 67, 69, 71, 73, 75, 76, 77, 78, 80, 85, 88, 101, 102, 103, 104, 106]}}

DrLynch commented 3 years ago

Confirmed. After a purge of initial data and a reset to clear the pubsubdb it is clear that the reconstruction data stored in the redis set does not appear to distinguish between document sources. As a consequence the reconstruction does not separate when students switch docs.

pmitros commented 3 years ago

This is not a bug, but an unfinished implementation. The system ought to support having reducers across a broad set of keys, e.g.

  1. Per student
  2. Per course
  3. Per item
  4. Per student / item
  5. Per course / item
  6. Etc.

XBlocks field scopes are an example of a similar design. Right now, we generate KVS keys just per-student. We need to support all of these. It's an interesting engineering project.