Stanzilla / WoWUIBugs

World of Warcraft UI Bug Tracker
169 stars 7 forks source link

Suggestion to improve SavedVariables writing process #241

Closed mrbuds closed 1 year ago

mrbuds commented 2 years ago

Issue


It was reported that some addon developers frequently receive reports by their users of addon "savedvariables" data loss.

I don't have a reproducible test case as it seems to depend on factors like cpu or hard drive speed, size of data, and how they exit World of Warcraft (Alt+F4, hard shutdown, power cable, crash, normal exit?)

What seems to happen is that the "savedvariables" .lua file is either empty or partially written.

The .bak file does not help restore data.

Proposal for solution


I don't know how the process of serializing & writing to disk works exactly at low level, that said, if each .lua file was saved with an other name, then renamed after file is closed instead of writing directly to the .lua file, that would at worse roll back data instead of wiping it.

Duugu commented 2 years ago

Had a couple of users with that too. And at least in those cases the limit for the savedvariable lua file turned out to be at ~50-60 mb. Never had a user with smaller files experiencing this. No info on their system, tough. But the 50-60 mb limit was quite constant in all cases.

Just one addition: All of my users are blind. Thus almost all of them are exiting the game with ALT + F4. Mostly without even logging out to the char selection.

Meorawr commented 2 years ago

But the 50-60 mb limit was quite constant in all cases

The game doesn't have a hard limit on file size for saved variables; it'll have no problem loading scripts containing hundreds of megabytes of data if you generate such.

The error you're likely seeing is instead a VM limit being reached. In most cases this is a "constant table overflow" error which is triggered when you've got over 262,144 unique literal values (strings, numbers, booleans, and nil) within the saved variables file.

InfusOnWoW commented 2 years ago

So from WeakAuras perspective, it's clear that WoW does a poor job of ensuring the integrity of addon data. Whatever the problem is, we get multiple persons per month that have lost all their data. And we have passed on that feedback multiple times.

We usually don't even investigate what happened, we have a bot command to point them to the usually fruitless ways to recover their data. Also because it's so common, we added automatic backup functionality to the companion app we wrote.

That's how bad it is.

For example here is a .bak file that shows a clear problem in WoWs handling of addon data. WeakAuras.lua.bak.txt

It ends in roughly 950k NUL bytes. Clearly there's no way that our addon could have made that file.

So as far as we can tell the following happened. 1) WoW failed to save .lua file for whatever reasons and writes 950k NUL bytes 2) The user logins again, sees their data missing. 3) They log out again and WoW copies the totally broken .lua file to the .bak file, thus removing the backup.

So I have several suggestions on how to improve that, even if it isn't clear what happens in step 1. In Step 2, show a error message on encountering such a .lua file and automatically fall back to reading from the .bak file Or In Step 3, on detecting that there was a problem with the loading, rename the .bak file before overwriting it.

Meorawr commented 2 years ago

The only issue with automatically recovering from the .bak file is cases where saved variables fail to load due to constant table overflows -or- other parser related errors (for example, reading a serialized NaN value).

As the .bak file would effectively be a snapshot of the saved variables from the prior session, the contents of the file itself may be close to an error-like state already. This could lead to a bit of a loop where saved variables keep getting rolled back, for example:

  1. User logs in with "FooAuras" enabled.
  2. Game loads variables for FooAuras; these trigger a constant table overflow error. The game loads the .bak file instead.
  3. FooAuras writes some new data to its saved variables table that will - when loaded - push it back over the constant table limit.
  4. User logs out. Game saves updated saved variables.
  5. User logs back in.
  6. Game loads variables for FooAuras; these trigger another constant table overflow. The game loads the .bak file again.

This isn't a showstopper, but will still lead to confusing situations of data loss - and so I think you'll still get a few bug reports now and then about the core problem.

Personally, I'd like a new event to be added that'll notify addons of errors when loading their saved variables in addition to restoring the .bak. Hypothetically this could be called SAVED_VARIABLES_ERROR and it'd just need a payload consisting of addonName and errorString. With this event any addon can be made aware of when their SVs have failed to load, which would allow for corrective action (eg. dumping any persistent caches) or notifying the user in a more friendly way that doesn't require them to have bugsack installed.

That said - I think priority should be main issue of the ticket which is not writing SVs in a way that's prone to data loss to begin with.

InfusOnWoW commented 2 years ago

Today we had another person in discord, where we could ask for the .bak file.

WeakAuras.lua.bak-2.txt

That person also remarked: "if you can recover it, it's worth money to me; let me know how to donate".

InfusOnWoW commented 2 years ago

WeakAuras.lua.bak3.txt

And another case. Same issue as always: the file ends in 3MB of NUL bytes, while in the middle of writing a simple string:

            ["wagoID"] = "dcaJOvkxa",
            ["pa\0\0\0
InfusOnWoW commented 2 years ago

And another case, this time in Blizzard_Console

Blizzard_Console.lua.txt

The file is empty except for the 93.859 NUL bytes.

Meorawr commented 1 year ago

The reliability of the client and its writing of saved variables files should have improved as of 10.0.5. There's likely to still be occurrences where things go wrong, but they should hopefully be far less frequent now.

InfusOnWoW commented 1 year ago

Great. Is there any more information on what the changes are?

Stanzilla commented 1 year ago

Great. Is there any more information on what the changes are?

Additional precautions were added to ensure that savedvariables files were flushed to disk after being written. This was a source of some data being corrupted. The exceptions have gone down considerably since that change, but there are probably still outstanding files that need to be resaved before all the exceptions disappear (ideally).

Meorawr commented 1 year ago

As this one was mostly solved I'll close this ticket for now - would probably be a good idea to set up a separate one for suggestions on handling .bak files.

InfusOnWoW commented 1 year ago

We got another report of a corrupted file saving today: https://github.com/WeakAuras/WeakAuras2/issues/4695

Also the issue that the game on reading corrupt data overwrites the automatic backup still exists and still is a questionable choice.