poltak opened 5 years ago
Seems to be a timing issue or something like that. I've checked in the Dexie docs whether we can return a promise, but that doesn't seem to be the case. One thing I've noticed, though, is that the hooks get a `transaction` passed in:
http://memex.link/BEf1lhEHQ/dexie.org/docs/Table/Table.hook(%27updating%27)
Maybe if we make a Dexie-specific option to the CRUD methods in the Storex backend called `dexie.transaction`, and we use that one to write to the change log, that changes something?
The weird thing, apparently, is that delaying the write (a `console.log` introduced a significant delay) solves the issue. This may mean that we need to queue the backup changes internally. Maybe this also removes the need to add the `backupChanges` table to every transaction. However, it'd be awesome if we could include the backup change write in the transaction, so we're absolutely sure every modification also gets logged.
Maybe one thing that messes things up is the async stuff. If Dexie continues with its normal life after the synchronous execution of the hook is completed (while something in the hook still has a pending promise), that could introduce the timing issue. That may be the source of the problem.
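To make the suspected race concrete, here's a hypothetical sketch (plain promises, not Dexie's actual internals): the hook kicks off an async change-log write but returns synchronously, so the surrounding "transaction" can finish before the write lands. All names here are illustrative.

```javascript
// Hypothetical sketch of the suspected race -- not Dexie's real internals.
const log = [];

// Simulated async change-log write; its promise is ignored by the caller.
function changeLogWrite(entry) {
  return Promise.resolve().then(() => log.push(entry));
}

// A Dexie-style hook: fires the write but returns synchronously.
function creatingHook(obj) {
  changeLogWrite({ operation: 'create', object: obj }); // fire-and-forget
}

// "Transaction" completion is decided without awaiting the hooks'
// side-effect promises, so none of the writes are visible yet here.
function runTransaction(objects) {
  objects.forEach((obj) => creatingHook(obj));
  return log.length;
}

const pendingAtCommit = runTransaction([{ name: 'a' }, { name: 'b' }]);
console.log(pendingAtCommit); // 0 -- no change-log writes visible at "commit"
setTimeout(() => console.log(log.length), 0); // 2 -- the writes land later
```

The point of the sketch: nothing rejects and every hook fires, yet at the moment the transaction is considered done the change log is still empty, which matches the "all ops resolve but entries go missing" symptom.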
Thanks for looking into this. You got further than me.
> Maybe if we make a Dexie-specific option to the CRUD methods in the Storex backend called `dexie.transaction`, and we use that one to write to the change log

What would that option change in the storex backend code?
It definitely feels like a timing issue. I saw that Promises returned from the hook subscriber aren't handled by Dexie, although I don't really understand how that would mean some of them don't get DB access (from what I saw debugging: all subscribers are triggered, and all the DB ops run by the subscribers resolve rather than rejecting).
I tried playing around with the `transaction` arg in the subscriber, but there doesn't seem to be much we can actually do with it apart from lock, unlock, and abort. Locking it, running our additional op, then unlocking doesn't seem to change anything. Do you know what the docs mean by "A subscriber may use the given transaction object to do additional operations on the database"?
> This may mean that we need to queue the backup changes internally. Maybe this also removes the need to add the `backupChanges` table to every transaction.
I thought about this too before, although I tried to avoid it in favour of finding a simpler solution that doesn't involve adding new stuff. Thinking about it more now, though, it could be pretty easy to implement. Our old prio queue that we used for all DB ops a while back was just a minor extension of this async queue package: https://www.npmjs.com/package/queue That would work if we still can't find a way to get past this.
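For reference, the core of what that kind of queue gives us can be hand-rolled in a few lines. This is a minimal illustrative sketch, not the actual `queue` package (which adds concurrency limits, events, timeouts, etc.):

```javascript
// Minimal serial async queue sketch: tasks run strictly one after another,
// so backup-change writes pushed from a hook keep their order even though
// they execute outside the Dexie transaction.
class SerialQueue {
  constructor() {
    this.tail = Promise.resolve();
  }

  // Enqueue an async task; returns a promise for that task's completion.
  push(task) {
    const result = this.tail.then(task);
    // Keep the chain alive even if a task rejects.
    this.tail = result.catch(() => {});
    return result;
  }
}

// Usage sketch: the hook would push change-log writes here instead of
// firing them directly inside the transaction.
const q = new SerialQueue();
const done = [];
q.push(async () => done.push('change 1'));
q.push(async () => done.push('change 2'));
q.push(async () => done.push('change 3'));
```

The trade-off is the one mentioned below: the writes happen after the transaction, so a crash between commit and flush could still lose entries.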
I thought one would be able to pass the `transaction` object to different modification operations, but it seems I'm mistaken (probably confused it with Sequelize). That's what I meant by passing in the transaction as an option.
It kinda sucks that the API just stinks: async ops within transactions are something we've been doing for at least 20 years in the SQL world. The only things I can think of now:
1) Making sure things happen in order, avoiding `await`s. Promise executors run synchronously, so it should be possible to have the write started before we return from the hook. This means you'd place a `console.log` statement at the beginning and end of the hook, and inside the createObject function, and see if they indeed execute in the right order.
2) Queuing these things up and doing them outside of the transaction, in the hope that no data gets lost.
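The ordering probe from option 1 can be sketched without Dexie at all (hypothetical names): the part of a promise chain before the first `await`/`.then` runs synchronously, but anything after it only runs once the current call stack unwinds, which is exactly what the log statements would reveal.

```javascript
// Sketch of the option-1 ordering probe, with illustrative names.
const order = [];

function createObject() {
  order.push('createObject: sync part'); // runs before the hook returns
  return Promise.resolve().then(() => order.push('createObject: async part'));
}

function hook() {
  order.push('hook: start');
  createObject(); // not awaited -- only its synchronous part runs now
  order.push('hook: end');
}

hook();
console.log(order.join(' -> '));
// hook: start -> createObject: sync part -> hook: end
```

So even without `await`s, only the synchronous prefix of the write can be guaranteed to happen before the hook returns; the actual (inherently async) DB write still completes later, which is why option 2's queue may be needed anyway.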
If reporting a bug:

Originally explained in this Slack thread: https://worldbrain.slack.com/archives/CBFJ8V490/p1546572809002800

It effectively means that our backups may not be an accurate representation of the data stored in the DB.

Steps to reproduce (in the bg script console):

1. `backup.recordingChanges = true`
2. Define the adding-tags method: `addTags = (url, tags) => storageMan.createObjects('tags', tags.map(name => ({ name, url })))`
3. `addTags(MY_URL, ['a','b','c','d','e'])`
4. Check the `backupChanges` collection contents

Replace `MY_URL` in step 3 with the URL of a page that you have indexed. In step 4, you should see 5 entries with `{..., operation: 'create', collection: 'tags'}`, one for each tag you added in step 3. However, a lot of the time only 2, 3, or 4 entries are created. This is the issue.

Expected behaviour: all entries get added for each change.
Error stack (from extension crash page)
Other comments
Other ops that modify multiple docs produce similar results where not all changes are recorded, like a multi-delete.
I have spent a bit of time looking into this, but have yet to find a reason 🤕
One very weird thing I've noticed is that if you add a `console.log` call somewhere in the triggered code (like here: https://github.com/WorldBrain/Memex/blob/master/src/backup/background/storage.ts#L40-L45), then all the changes get recorded :S

If, instead of `console.log`ing, you create some class props in `BackupStorage` and mutate them each time `registerChange` is called and successfully runs its `putObject` call (eg: count the number of calls, or keep track of each call's args), they get set as expected. It's just the data in the DB that ends up in an unexpected state. I originally thought that maybe the Dexie table hooks aren't always getting triggered, but this indicates to me that they are.
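The instrumentation described above looks roughly like this hypothetical sketch (the real `BackupStorage` lives in the Memex repo; the flaky store here just simulates the observed symptom, it is not the actual cause):

```javascript
// Hypothetical instrumentation mirroring the debugging approach described:
// count registerChange calls and record their args, independently of
// whether the underlying DB write actually lands.
class InstrumentedBackupStorage {
  constructor(putObject) {
    this.putObject = putObject; // injected (here: simulated) async DB write
    this.callCount = 0;
    this.callArgs = [];
  }

  async registerChange(change) {
    this.callCount += 1;          // mutated on every call...
    this.callArgs.push(change);   // ...and args kept for inspection,
    await this.putObject(change); // ...even when this write goes missing
  }
}

// Simulate 5 tag creations where the backing store "loses" some writes,
// matching the 2-4-out-of-5 symptom from the repro steps.
const stored = [];
let calls = 0;
const flakyPutObject = async (change) => {
  if (calls++ % 2 === 0) stored.push(change); // drop every other write
};

const storage = new InstrumentedBackupStorage(flakyPutObject);
for (const name of ['a', 'b', 'c', 'd', 'e']) {
  storage.registerChange({ operation: 'create', collection: 'tags', name });
}
console.log(storage.callCount, stored.length); // 5 3
```

This reproduces the shape of the evidence: the in-memory counters say every hook fired and every `registerChange` ran, yet the "DB" ends up with fewer entries, pointing the suspicion at the write path rather than the hooks.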