PrivateStorageio / ZKAPAuthorizer

a Tahoe-LAFS storage-system plugin which authorizes storage operations based on privacy-respecting tokens

Storage-time not fully redeeming to ~50 GB-months on a new voucher #432

Open · elric-wamugu opened this issue 2 years ago

elric-wamugu commented 2 years ago

Summary

Each voucher should give a user ~50 GB-months of storage-time once redeemed. However, in some instances the storage-time balance shown in the GUI client (Gridsync) does not fully update to the expected ~50 GB-months.

Restarting the GUI client sometimes triggers the update, after which the available storage-time keeps increasing until it reaches the expected ~50 GB-months that each redeemed voucher grants.

Screencast showing storage-time not fully redeeming to ~50 GB-months on a new voucher: Storage-time not fully redeeming.webm

Debug log: Gridsync debug information - Storage-time not fully redeeming.txt

Screenshot showing storage-time updating after the GUI client relaunch: Screenshot from 2022-09-12 14-55-33

However, relaunching the GUI client does not always trigger the storage-time balance to update: Screencast from 09-12-2022 02:35:52 PM.webm

Test Environment

Virtualization: VirtualBox 6.1.34 (Qt 5.15.3)
Guest OS: macOS 12.5 & 11.6.8

crwood commented 2 years ago

Thanks for filing this, @elric-wamugu. The relevant error here appears to be this one (written to Tahoe-LAFS' stdout):

    2022-09-12T06:43:56-0700 [_zkapauthorizer.controller.PaymentController#critical] Redeeming random tokens for a voucher (b'Xt-8_DIczIyXWuUT3wn6d9Sm21GBgn3q7KwsKteZx6MD') encountered error.
        Traceback (most recent call last):
          File "twisted/internet/defer.py", line 1750, in gotResult
          File "twisted/internet/defer.py", line 1737, in _inlineCallbacks
          File "twisted/internet/defer.py", line 661, in callback
          File "twisted/internet/defer.py", line 763, in _startRunCallbacks
        --- <exception caught here> ---
          File "twisted/internet/defer.py", line 857, in _runCallbacks
          File "_zkapauthorizer/controller.py", line 933, in _redeem_success
          File "_zkapauthorizer/model.py", line 223, in with_cursor
          File "_zkapauthorizer/model.py", line 574, in insert_unblinded_tokens_for_voucher
          File "_zkapauthorizer/model.py", line 581, in _delete_corresponding_tokens
          File "_zkapauthorizer/replicate.py", line 521, in execute
        sqlite3.DatabaseError: database disk image is malformed
    2022-09-12T06:43:56-0700 [_zkapauthorizer.controller.PaymentController#info] Temporarily suspending redemption of b'Xt-8_DIczIyXWuUT3wn6d9Sm21GBgn3q7KwsKteZx6MD' after non-success result.

Did you happen to be able to reproduce this under any other environment(s)?
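
(For anyone who hits this again: a quick way to confirm whether the on-disk database really is corrupted is SQLite's built-in integrity check. A minimal sketch in Python, assuming you point it at the ZKAPAuthorizer database file in your node directory; the path below is only an illustrative placeholder.)

    import sqlite3

    # Placeholder path; substitute the actual ZKAPAuthorizer database file
    # from the Tahoe-LAFS client node directory.
    DB_PATH = "/path/to/tahoe-node/zkapauthorizer.sqlite3"

    with sqlite3.connect(DB_PATH) as conn:
        # integrity_check walks the whole file and yields a single "ok" row,
        # or one row per problem it finds.
        for (result,) in conn.execute("PRAGMA integrity_check;"):
            print(result)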

exarkun commented 2 years ago

Is this environment virtualized? If so, what VM manager is being used? And, is the ZKAPAuthorizer database on a shared filesystem of some sort (shared between the VM and the host, in particular - but any kind of sharing is interesting)?

elric-wamugu commented 2 years ago

@exarkun I have updated the issue description to include the testing environment.

The VM has its own dynamically allocated virtual disk image (VDI) that is exclusive to that VM, so it is not shared with the host or any other file system.

exarkun commented 2 years ago

@meejah, @crwood, and I tried and failed to reproduce this on macOS in VirtualBox. We don't have a plausible theory for what's going on here other than the runtime environment corrupting data somewhere between the virtual disk and the SQLite3 library.

One thing we noted is that database read operations immediately following the failed write operation succeeded. Either those operations don't stumble over the problem area the write encounters (I don't know if this is plausible given SQLite3's mode of operation) or the corruption is transient (which would lend credence to the "flaky VM" explanation, I think).
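
(Not something ZKAPAuthorizer does today, but a rough sketch of how one could tell those two cases apart the next time the failure happens: catch the write error and immediately run a cheap structural check on the same connection. An "ok" result would suggest the corruption is transient or the check isn't reaching the damaged pages; anything else means the on-disk file itself is damaged and worth preserving.)

    import sqlite3

    def diagnose_after_write_failure(conn: sqlite3.Connection) -> list[str]:
        # Run right after catching sqlite3.DatabaseError from a write.
        # quick_check is a cheaper variant of integrity_check that still
        # detects most structural corruption.
        return [row[0] for row in conn.execute("PRAGMA quick_check;")]

    # Hypothetical usage around the failing statement:
    # try:
    #     cursor.execute(...)
    # except sqlite3.DatabaseError:
    #     print(diagnose_after_write_failure(connection))
    #     raise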

It would be great to be able to look at the database file immediately after this happens, or even better to be able to look at the whole VM image.

@elric-wamugu Can you keep an eye out for this happening again and if it does, shut down GridSync and share both the debug logs and the whole Tahoe-LAFS client node directory?
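
(If it does recur, here is a sketch of one way to bundle up the whole node directory for sharing after GridSync is shut down; NODE_DIR is a placeholder, not GridSync's actual layout.)

    import shutil
    import time

    # Placeholder; point this at the Tahoe-LAFS client node directory that
    # GridSync manages for the affected grid.
    NODE_DIR = "/path/to/tahoe-lafs/client/node"

    # Creates e.g. zkapauthorizer-432-20220912-143500.tar.gz in the current
    # working directory, containing the full node directory.
    archive = shutil.make_archive(
        "zkapauthorizer-432-" + time.strftime("%Y%m%d-%H%M%S"),
        "gztar",
        root_dir=NODE_DIR,
    )
    print("created", archive)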