Yikes! Potential data corruption?

a-type commented 1 year ago

I wish I knew more about this. Since it's in my local development environment it's possible I was just resetting and screwing around with things too much.

But there were definitely snapshots being written which appeared invalid: they were missing parts of their objects, and there were no operations in the history that would add the missing parts. This was causing corruption on clients, which couldn't write those objects to storage because they were missing primary keys.

Very not good if this happens in the wild.

Possible next steps since I don't have reproduction:

[ ] Add a 'dev mode' for the server which preserves operation history. If this happens again, that history would make it possible to look backward and check whether operations were skipped during rebase.
[x] Add a fuzz test suite to try to brute-force a reproduction. It should include: a. Just doing random ops b. Disconnecting and reconnecting clients c. Joining new clients with existing data for the library but no sync history*

*: this is the most likely cause that I can think of! Happens a lot in my local environment because of resetting databases.

a-type commented 1 year ago

[x] Another step: on the client, handle and log the invalid write error and delete the relevant row if it exists. This will treat any corrupted data as deleted for querying's sake, which should cause a re-initialization in theory. Not good hygiene, but it is a much better user experience than sync failing.
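
For the record, here is roughly the shape of that client-side fallback. This is a minimal sketch, not the actual Verdant code: `MemoryStore`, `InvalidWriteError`, and `writeOrDiscard` are hypothetical names, and an in-memory map stands in for real client storage.

```typescript
// Hypothetical stand-ins for illustration only; not Verdant's real storage API.
type Snapshot = Record<string, unknown>;

class InvalidWriteError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "InvalidWriteError";
  }
}

class MemoryStore {
  readonly rows = new Map<string, Snapshot>();
  constructor(private readonly primaryKey: string) {}

  // Rejects snapshots that lack the primary key, like the corrupted ones described above.
  put(snapshot: Snapshot): void {
    const key = snapshot[this.primaryKey];
    if (typeof key !== "string") {
      throw new InvalidWriteError(`snapshot is missing primary key "${this.primaryKey}"`);
    }
    this.rows.set(key, snapshot);
  }

  delete(key: string): void {
    this.rows.delete(key);
  }
}

// Tries the write. On an invalid write, logs the error and deletes any existing
// row for that id, so queries treat the corrupted document as deleted.
function writeOrDiscard(
  store: MemoryStore,
  id: string,
  snapshot: Snapshot,
  log: (message: string) => void = console.warn,
): boolean {
  try {
    store.put(snapshot);
    return true;
  } catch (err) {
    // Only swallow the invalid-write case; anything else is still a real failure.
    if (!(err instanceof Error) || err.name !== "InvalidWriteError") throw err;
    log(`discarding corrupted snapshot for ${id}: ${err.message}`);
    store.delete(id);
    return false;
  }
}
```

The point of returning a boolean instead of rethrowing is that one bad snapshot no longer aborts the whole sync.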

a-type commented 1 year ago

Fuzz test strategy: just toss in one collection with one field whose type is any. Traverse it, pick an arbitrary key, and assign arbitrary data (either a primitive, an object, or a list). Traverse again, pick an arbitrary key, and mutate what's there (if it's a list, randomly do a list op too).

At intervals, take snapshots from all clients and compare for deep equality.
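
A minimal sketch of that loop, with heavy assumptions: the "clients" here are plain JSON replicas that all receive the same op, so this only shows the traversal, random assignment, and periodic deep-equality check. In the real suite each client would make its own changes and sync through the server. All names (`runFuzz`, `randomOp`, and so on) are hypothetical, and list-specific ops are left out.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type Container = Json[] | { [key: string]: Json };
type Key = string | number;
type Op = { path: Key[]; key: Key; value: Json };

// Small seeded PRNG (mulberry32-style) so a failing run can be replayed from its seed.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function isContainer(value: Json): value is Container {
  return typeof value === "object" && value !== null;
}

function getChild(node: Container, key: Key): Json | undefined {
  return Array.isArray(node) ? node[Number(key)] : node[String(key)];
}

// Arbitrary data: a primitive, an empty object, or an empty list.
function randomValue(rand: () => number): Json {
  const roll = Math.floor(rand() * 3);
  if (roll === 0) return Math.floor(rand() * 100);
  return roll === 1 ? {} : [];
}

// Traverse to a random container, then pick an arbitrary key to assign.
function randomOp(root: Container, rand: () => number): Op {
  const path: Key[] = [];
  let node: Container = root;
  for (;;) {
    const keys: Key[] = Array.isArray(node) ? node.map((_, i) => i) : Object.keys(node);
    const nested = keys.filter((k) => isContainer(getChild(node, k) ?? null));
    if (nested.length === 0 || rand() < 0.5) break;
    const next = nested[Math.floor(rand() * nested.length)];
    path.push(next);
    node = getChild(node, next) as Container;
  }
  // Lists get an index in [0, length] so no holes appear; objects get one of a few keys.
  const key: Key = Array.isArray(node)
    ? Math.floor(rand() * (node.length + 1))
    : "k" + Math.floor(rand() * 4);
  return { path, key, value: randomValue(rand) };
}

function applyOp(root: Container, op: Op): void {
  let node: Container = root;
  for (const step of op.path) node = getChild(node, step) as Container;
  const value = JSON.parse(JSON.stringify(op.value)) as Json; // clone: replicas must not share references
  if (Array.isArray(node)) node[Number(op.key)] = value;
  else node[String(op.key)] = value;
}

function deepEqual(a: Json, b: Json): boolean {
  if (a === b) return true;
  if (!isContainer(a) || !isContainer(b)) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false; // a list never equals an object
  const keysA = Object.keys(a);
  if (keysA.length !== Object.keys(b).length) return false;
  return keysA.every(
    (k) =>
      Object.prototype.hasOwnProperty.call(b, k) &&
      deepEqual(getChild(a, k) ?? null, getChild(b, k) ?? null),
  );
}

// Applies random ops to every replica and compares snapshots every `checkEvery` ops.
function runFuzz(seed: number, steps: number, clientCount: number, checkEvery: number): Container[] {
  const rand = seededRandom(seed);
  const clients: Container[] = Array.from({ length: clientCount }, (): Container => ({}));
  for (let i = 1; i <= steps; i++) {
    const op = randomOp(clients[0], rand);
    for (const client of clients) applyOp(client, op);
    if (i % checkEvery === 0 && !clients.every((c) => deepEqual(clients[0], c))) {
      throw new Error(`seed ${seed}: clients diverged after ${i} ops`);
    }
  }
  return clients;
}
```

Something like `runFuzz(42, 500, 3, 25)` would run 500 ops across three replicas and compare every 25 ops. Seeding the randomness matters more than anything else here, since a divergence that can't be replayed is not much better than the original bug report.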

a-type commented 1 year ago

Fuzz testing is working nicely to clean up a lot of corner cases so far.

[x] One thing I want to document so I don't forget: entities which should be lists are being turned into objects. I suspect this is due to concurrent set conditions?

I.e. if

A sets foo as a list
B sets foo as an object
A sets 0 on foo
Changes sync

Then B's initialize wins, but A still treats it like a list! Yikes!

This only applies to fields typed as any, but it is still dangerous and I need to figure it out...
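
To make the sequence above concrete, here is a toy replay of it. It assumes operations are simply applied in timestamp order and that a later initialize replaces the whole value; the `Op` shape and `replay` function are hypothetical and much simpler than the real operation format.

```typescript
// Toy model: hypothetical op shapes, applied in timestamp order.
type Value = unknown[] | Record<string, unknown>;
type Op =
  | { kind: "initialize"; time: number; value: Value }
  | { kind: "set"; time: number; key: string | number; value: unknown };

function replay(ops: Op[]): Value | undefined {
  let state: Value | undefined;
  for (const op of [...ops].sort((a, b) => a.time - b.time)) {
    if (op.kind === "initialize") {
      // A later initialize replaces the whole value, list or object.
      state = Array.isArray(op.value) ? [...op.value] : { ...op.value };
    } else if (state !== undefined) {
      // A set is applied to whatever shape is there at that point.
      (state as Record<string, unknown>)[String(op.key)] = op.value;
    }
  }
  return state;
}

const opsFromA: Op[] = [
  { kind: "initialize", time: 1, value: [] }, // A sets foo as a list
  { kind: "set", time: 3, key: 0, value: "first" }, // A sets 0 on foo
];
const opsFromB: Op[] = [
  { kind: "initialize", time: 2, value: {} }, // B sets foo as an object
];

// What A sees before sync: a list, so A picks a list-shaped entity.
const localOnA = replay(opsFromA);
const kindChosenOnA = Array.isArray(localOnA) ? "list" : "object"; // "list"

// What the merged history actually produces: an object with a "0" key.
const merged = replay([...opsFromA, ...opsFromB]);
const kindAfterSync = Array.isArray(merged) ? "list" : "object"; // "object"
```

The data itself converges fine; the mismatch is between `kindChosenOnA` and `kindAfterSync`, i.e. the wrapper A already created no longer matches the shape of the data underneath it.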

a-type commented 1 year ago

Ok, so I think the real problem here is not so much applying different data patches, but the actual class instantiation of ListEntity vs. ObjectEntity.

In retrospect, having separate classes for these feels too inflexible to support any as a field type. The Entity would have to detect its transmogrification and re-instantiate itself as the other class!

[x] Since typings are all generated anyway... why not have one Entity class and just use type definitions to narrow it for the end user?
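
Rough idea of what that could look like. This is only a sketch under my own assumptions: the `Entity` below is a toy, and `ListEntityView`, `ObjectEntityView`, `asList`, and `asObject` are hypothetical names standing in for whatever the generated typings would expose.

```typescript
type EntityData = unknown[] | Record<string, unknown>;

// One runtime class for both shapes. It checks the current data on every call,
// so it keeps working if a sync swaps a list for an object or vice versa.
class Entity {
  constructor(private data: EntityData) {}

  get isList(): boolean {
    return Array.isArray(this.data);
  }

  get(key: string | number): unknown {
    return (this.data as Record<string, unknown>)[String(key)];
  }

  set(key: string | number, value: unknown): void {
    (this.data as Record<string, unknown>)[String(key)] = value;
  }

  push(value: unknown): void {
    if (!Array.isArray(this.data)) throw new Error("push is only valid while the entity is a list");
    this.data.push(value);
  }

  // Stand-in for a sync replacing the underlying value, possibly with a different shape.
  replace(data: EntityData): void {
    this.data = data;
  }
}

// Type-only views: these narrow what the end user sees without adding runtime classes.
interface ListEntityView<T> {
  readonly isList: boolean;
  get(index: number): T;
  set(index: number, value: T): void;
  push(value: T): void;
}

interface ObjectEntityView<T extends Record<string, unknown>> {
  readonly isList: boolean;
  get<K extends keyof T & string>(key: K): T[K];
  set<K extends keyof T & string>(key: K, value: T[K]): void;
}

// The casts are where generated typings would take over; nothing is re-instantiated.
function asList<T>(entity: Entity): ListEntityView<T> {
  return entity as unknown as ListEntityView<T>;
}

function asObject<T extends Record<string, unknown>>(entity: Entity): ObjectEntityView<T> {
  return entity as unknown as ObjectEntityView<T>;
}
```

The trade-off is that the narrowed views are a compile-time promise only, so for any-typed fields the runtime checks (like the one in `push`) still have to exist somewhere.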

a-type commented 1 year ago

Ok, I believe I've knocked out the root cause here.

I could take the fuzz testing even further, start randomly deleting databases, etc. But for now I'm happy.

a-type / verdant

Yikes! Potential data corruption? #101