New persistence strategy for better cross-version compatibility

Elesbaan70 commented 2 years ago

Note - This issue originally called for the use of Json, which was determined be unfeasible due to the lack of a good full-featured Json serializer that is compatible with Unity.

Describe your idea

In TMPE's current persistence strategy, the introduction of a new type into the persistent data will break backward compatibility. Although old versions are not supported, players reverting from alpha to stable should not lose savegame data needlessly.

The same issue also affects forward compatibility in ways that restrict future development choices. Ideally, the save data would not be bound to specific types. A move away from strongly typed serialization would give everyone more flexibility moving forward.

Requirements

This is an opt-in change. All existing serialization will remain untouched until there is a reason to do it the new way, as determined feature by feature.
A manager-based persistence mechanism in which SerializableDataExtension enumerates a collection of managers and calls LoadData() and SaveData() methods basically like the ones we have today.
SerializableDataExtension will save the data from each manager separately, to isolate them and minimize the risks associated with changes to persistent data.
A manager will have a means of identifying other managers upon which its data depends. This will determine the enumeration order during load and save.
Linq to XML will be used for persistence.

TrafficLightSimulationManager will be the first converted to the new persistence strategy, since displaced lane support was the breaking change that prompted this issue. Its conversion to the new persistence strategy will be delivered separately from and prior to displaced lane support.

Tasks

[x] Persistence framework
[x] Add a DOM to the serialized data and make a mechanism for opt-in transition away from the old serialization strategy (SerializableDataExtension)
[x] Update timed traffic lights to be saved into the DOM (half-baked persistable model)
[x] Update junction restrictions to be saved into the DOM (fully persistable model)
[x] Compress XML data, with fallback to uncompressed if it fails (especially on some Linux distros)
[x] Test versioning in a relevant feature branch such as https://github.com/Elesbaan70/TMPE/tree/lane-grouping
[x] Final code cleanup
[x] Wiki articles for future development
[x] Mark #1559 ready for review

kianzarrin commented 2 years ago

is json better than xml?

kianzarrin commented 2 years ago

binary serialization is possible in theory but is clunky in practice specially with this old version of mono.

Elesbaan70 commented 2 years ago

is json better than xml?

Json's simplicity tends to mean less setup and tweaking to get the results you need, which I think makes it better for small applications like TMPE. For large and complex data models I tend to lean toward XML, but in smaller applications like this, XML done right could end up being ~one of the most complex parts of the code~ - perhaps an overstatement, but it could take on complexity that seems out of proportion to the amount of data we're working with.

I also prefer XML for files intended to be manually edited. Json can be manually edited, but in my opinion it is a slightly less friendly format for that.

(If you're wondering, I think XML is a better fit for AN, both because your long-term plans involve much more complex data, and because it's intended to be manually editable.)

binary serialization is possible in theory but is clunky in practice specially with this old version of mono.

There's actually a binary version of Json, unsurprisingly called Bson, and the version of mono is a non-issue as far as that goes. But text serialization is easier to troubleshoot, and I don't think we're talking about enough data for file size to be a concern.

Elesbaan70 commented 2 years ago

is json better than xml?

Also, System.Xml.Serialization has a lot of weird limitations, like you can't use it to serialize dictionaries. So we'd have to either live with the limitations, or else find a third-party XML serializer. Newtonsoft.Json is technically third-party, but it's a very pervasive standard, especially on older .NET/Mono versions.

kianzarrin commented 2 years ago

Json sounds great. We already use XML in TMPE to store global config. i hope using several different standards does not cause confusion.

kianzarrin commented 2 years ago

in AN ppl manually modify pops so I think XML is good

krzychu124 commented 2 years ago

XML allows saving more complex data structures which will be way more readable for the user if he wants to edit something. The main problem with the current implementation. I'll remind that we are talking about serialization of all TM:PE data in the savegame not just simple configs. The problem now is we store everything into one single binary which cannot be deserialized if you don't have fully compatible classes. Just like @Elesbaan70 mentioned, literally any change in the data model makes it unloadable when you try to use previous version of the mod. Example: when you add boolean field to one of serialized classes (you don't need to even use it anywhere) the new data after serialization will be 100% incompatible with previous version of the mod because of how binary serialization work - chunk of bytes, requires a class which work as a form of template describing how data should be read.

Having a JSON or even XML will make it at least editable -> you can dump the binary data into (JSON/XML) string and modify it at worst case or prepare small migration function which could automatically fix/transform things in case we make a mistake somewhere. With binary you can't even see what's inside. Also one bonus of JSON/XML or any string based serialization -> you can easily write data migration code and most importantly unit/integration tests for that (some time ago I tried loading TMPE binary data and deserialize in unit test - impossible).

We are limited with size of binary array which can be saved by SerializableDataExtension (~16MB), so splitting data by managers will solve that problem.

Anyways I asked @Elesbaan70 about other ways for serialization since I also want to introduce changes to TMPE savegame data. I need to store "airplane size restrictions" somehow, so just an addition to what we have without changing anything else. When I push that change to my draft PR it will break compatibility because non of previous version will be able to load data once saved with new version. Since it would be additional thing I don't see why it couldn't be completely skipped when loading while running previous version of the mod (data itself has no use if there is no code which could interpret it and do something later).

I hope migration out of binary serialization will solve that problem at least partially. Obviously the current (de)serialization code needs to stay where it is to allow loading data from older savegames.

Elesbaan70 commented 2 years ago

We already use XML in TMPE to store global config.

Completely different because it's a standalone file and theoretically editable. Although in practice it's not a good implementation for manual editing--you can't even tell which switches are which, except by counting from the top.

XML should be considered for any future standalone TMPE files.

Elesbaan70 commented 2 years ago

the new data after serialization will be 100% incompatible with previous version of the mod because of how binary serialization work - chunk of bytes, requires a class which work as a form of template describing how data should be read.

It's worth noting, in case it's relevant to the choices we're making, that binary serialization doesn't have to be this way. Bson doesn't behave this way. It's a question of how the binary serialization is accomplished.

System.Runtime.Serialization's basic serialization is strongly typed and wants to deserialize and figure out what to do with every element, and so any change breaks backward compatibility. This can probably be overcome with custom serialization, but there's no good reason to make it so complicated when alternatives are available.

Elesbaan70 commented 2 years ago

@aubergine10 The "Settings" label is incorrect for this. It should be labeled "serialization" and "lifecycle". I would fix it, but I don't have any permissions here. :-)

krzychu124 commented 2 years ago

It's worth noting, in case it's relevant to the choices we're making, that binary serialization doesn't have to be this way. Bson doesn't behave this way. It's a question of how the binary serialization is accomplished.

Yes I know, but I meant BInaryFormatter we currently use for (de)serialization. I think it won't make much of a difference if we convert JSON string as binary array or do similar with Bson. Whatever we select it need to end up as binary array.

kianzarrin commented 2 years ago

you can't even tell which switches are which, except by counting from the top.

tell me about it! I do that all the time! I was thinking about overriding the Serializer but I think its easier to just add something in the maintenance tab.

Elesbaan70 commented 2 years ago

I think it won't make much of a difference if we convert JSON string as binary array or do similar with Bson. Whatever we select it need to end up as binary array.

As I'm currently implementing it (a bit different from the demonstration code I showed you before), manager data is saved in a Dictionary<string, string> where the key is the full name of the saved data type (not of the manager type), and the value is the Json-serialized data. (The key is not strictly type-bound, it's only a naming convention.) I chose text serialization for easier troubleshooting.

The dictionary itself is also serialized as Json, which is converted to a byte array using StreamWriter. It could be serialized as Bson instead, but there wouldn't be much savings since the data is text anyway, and staying with Json saves us from having to bring in a second external dependency for the Bson library.

Elesbaan70 commented 2 years ago

After all this discussion, it turns out that we will have to go with XML anyway. The Unity port of Newtonsoft.Json doesn't really work, and it hasn't been maintained in several years. *sigh*

kianzarrin commented 2 years ago

Lets use a port of XML that is less problematic. I suspect this old version of mono does not have very good XML. I see macsurgey's mods have provided their own System.Xml.Linq.dll

Elesbaan70 commented 2 years ago

Linq to XML is a DOM, not a serializer. I haven't look at it in a while, so I'll take a fresh look, but working with a DOM basically means you're assembling and interpreting the XML elements manually, so it ends up being a lot of work.

Maybe I'm just being naïve, but I'm not very worried about Mono's System.Xml.Serialization. It's just that it's a pain to work with, but still probably better than doing it all ourselves.

kianzarrin commented 2 years ago

It's just that it's a pain to work with

If I recall properly lists are a pain to work with.

but still probably better than doing it all ourselves.

doing it ourselves? why not to just use 3rd party library?

Elesbaan70 commented 2 years ago

If I recall properly lists are a pain to work with.

I guess I don't have context for that, because I'm not sure what you're referring to.

doing it ourselves? why not to just use 3rd party library?

I meant "doing it ourselves" in the sense of processing all the elements manually, like MacSergey is apparently doing since he uses Linq to XML.

If you know of third-party serialization that will work under Unity and has a solid track record and active development, I'm all for it. But the Unity port of Newtonsoft.Json being nonfunctional and abandoned is an example of where this can eventually lead if we're not careful.

kianzarrin commented 2 years ago

I don't know I have to do some tests. i think I had to convert list to array for it to work.

kianzarrin commented 2 years ago

I was struggling with xml here: https://github.com/Quboid/CS-MoveIt/blob/da6d67d770ff013162f931f01b3e88949ee7b3cc/MoveIt/Moveable/Instance.cs#L120

I don't know if there was an issue with dictionary or list or both but I had to use array.

Elesbaan70 commented 2 years ago

Yes, that's one of the annoying limitations I mentioned in System.Xml.Serialization is that dictionaries are not supported. Another is that it's hard to make enums backward-compatible.

After sleeping on it, I'm starting to warm up to the idea of using Linq to XML like MacSergey appears to be doing (I'm assuming this based on the presence of System.Xml.Linq.dll.) I'm going to try going down that path for a little bit, and see how it plays out. (Since TMPE already converts to/from separate objects for the serializable model, and I was continuing that practice, I'm not sure that processing the elements directly is really that much more work.)

Elesbaan70 commented 2 years ago

If anyone wants to follow the progress in code, it is here: https://github.com/Elesbaan70/TMPE/tree/xml-persistence

Elesbaan70 commented 2 years ago

As I've mentioned before (maybe elsewhere, since I don't see it here), the ideal persistable architecture involves having a set of semi-stable data objects that are maintained by the application, which can be serialized, then later deserialized and submitted to the application. These data objects then act as a kind of persistence API, integral to the implementation yet having a discreet definition that gives them some independence from the program logic.

Currently, TMPE builds an entirely separate object model for persistence, using a combination of public and private APIs to convert to and from this object model. This might be a good approach if the persistent model were independently defined, but as things stand now, it puts too much work on the persistence logic.

There's no clear path out of this, since moving to distinctly defined data objects at the implementation level would be a huge amount of work. But for now, I'm attacking this with a hybrid approach. Here's how it works:

I define a "model" of sorts with interfaces, which are in turn implemented internally. See https://github.com/Elesbaan70/TMPE/tree/xml-persistence/TLM/TLM/TrafficLight/Model

This model accomplishes two things:

It gives the persistence code a clearly defined reference point for understanding what data it needs to save and how it should name things.
It gives the program logic a way to clearly and consistently define persistable data.

What I will not attempt to do at this time is to make this object model work both ways. Loading will work as it always has, except that the new object model interfaces will be used as a reference point for knowing what the saved data looks like.

Elesbaan70 commented 2 years ago

I am removing the compression task since this apparently causes issues on some Linux distros. But we really should come back to it later, because the size of the DOM will balloon on bigger cities. We'll need to pay attention to file size when testing this on larger cities, to make sure the hit is acceptable.

And I am adding a task to test the versioning in the displaced lanes project prior to release. We wouldn't want to deploy this and then find out that it doesn't work as advertised!

originalfoo commented 2 years ago

We will certainly get issues because even with the existing system some users have so much traffic config that it's too big. XML will vastly exacerbate that because now we'll be using, I dunno, 30 bytes where previously we'd use 1? We need the compression.

If it's a problem for Linux, then we should disable compression on Linux only rather than forcing the majority of users to live without it for no reason.

Elesbaan70 commented 2 years ago

A possible (ugly) solution: Attempt to save with compression, and if there is an error, fall back to no compression before finally giving up.

originalfoo commented 2 years ago

Yup, that would be OK imo, so long as it doesn't make code utterly heinous.

What's the issue with zipping stuff on linux btw? Is it just some distros don't include required libs or have non-standard libs?

Elesbaan70 commented 2 years ago

I'm not sure. @krzychu124 could probably provide more details. He referenced MacSergey/NodeMarkup#96, which I skimmed but didn't study closely enough to really understand what was going wrong.

originalfoo commented 2 years ago

This comment has workaround: https://github.com/MacSergey/NodeMarkup/issues/96#issuecomment-670970107

EDIT: If zipping fails, fallback to uncompressed but log error in game & tmpe logs stating that mono is required?

Elesbaan70 commented 2 years ago

Yup, but do we want to require a manual install of Mono? I think Linux users would generally be more comfortable with this kind of requirement than on other platforms, but it still seems less than ideal.

kianzarrin commented 2 years ago

whats the point of using xml if we are going to compress it? its no longer human readable.

Elesbaan70 commented 2 years ago

The primary purposes of this project are a persistence strategy less prone to breaking when enhanced, and smoother versioning when a breaking change is necessary. To the extent that there is human readability, it is an added bonus.
It is human-readable in logging
The compression can be disabled for debugging
Human readability in the final savegame is only theoretical anyway, since it's a binary format. I mean, you can see the XML in there, but it gets weird reading it that way.
XML and other text-based formats are easier to develop and maintain because they represent data as a hierarchy of key-value pairs.* It's more than just human readability.

* XML does more than that, but that remains the essence of what XML serialization is doing.

kianzarrin commented 2 years ago

@Elesbaan70 we don't need to compress the global config though.

Elesbaan70 commented 2 years ago

I haven't even touched the global config yet.

Elesbaan70 commented 2 years ago

I mean, we're talking about global config, so it's not part of this anyway, right?

kianzarrin commented 2 years ago

yeah

krzychu124 commented 2 years ago

@Elesbaan70 we don't need to compress the global config though.

I think you didn't notice why compression might be required. The reason is simple. We still save data as byte[] in the savegame and there is a limit (~16MB IIRC) for a value. Global config is stored on disk and has known size, there is no dynamic data which can vary between versions unlike TM:PE data where almost all data is stored as collections (mostly limited to things we changed skipping default values to save memory)

Elesbaan70 commented 2 years ago

JunctionRestrictionsManager has been updated to use the new persistence framework. @krzychu124, I think you will find this to be a much simpler and more straightforward case than traffic lights! 😅

https://github.com/Elesbaan70/TMPE/blob/xml-persistence/TLM/TLM/Manager/Impl/JunctionRestrictionsManager.Persistence.cs

It uses a super-lightweight serializer I built on top of Linq to XML, that lets you serialize stable types but doesn't assume you've built a serializable model of your entire dataset. It's not intended to do that kind of heavy lifting, so it's stupid-simple and fits well in how Linq to XML does things.

Elesbaan70 commented 2 years ago

Keeping the compression is definitely the right move. Looks like for 2 timed traffic lights and junction restrictions on 4 nodes, compression saves us about 8k. That's enough to make one's head spin.

krzychu124 commented 2 years ago

@Elesbaan70 in case you don't know, the problem with compression on Linux won't just fail or throw an exception... It will crash the game if you try to use it and there is no mono (required for some distros to work properly). 😬

With regards to size, I think we will/could follow the idea of splitting "managers" into separate containers, so theoretically the limit of 16MB would be per manager, not whole data set.

Anyways code for Junction Restriction is much more readable and understandable for me, compared to ugly from the Traffic Lights 😅

Elesbaan70 commented 2 years ago

compression on Linux won't just fail or throw an exception... It will crash the game if you try to use it and there is no mono (required for some distros to work properly). 😬

Yikes. So....

How do I detect whether mono is installed?
How do I detect whether it is a distro for which mono is required?

I think we will/could follow the idea of splitting "managers" into separate containers

Okay, that's an easy enough change.

Anyways code for Junction Restriction is much more readable and understandable for me, compared to ugly from the Traffic Lights 😅

Yes! And yet, I suspect you will find it easiest to use the half-baked model approach I used in TTL. A full model like I created in the junction restrictions might require too much change to the manager's implementation. I chose Junction Restrictions for the full model example because I identified it as the easiest place to make a change of that kind.

When I say "half-baked model," I mean that TTL implements a set of interfaces for its model, and the save code writes XML from those interfaces. But from the TTL implementation's point of view, that is a one-way trip. You can't hand a model to the TTL to set up traffic lights; the load code has to convert the incoming XML data into method calls that TTL understands for setting up the traffic lights.

Elesbaan70 commented 2 years ago

I think we will/could follow the idea of splitting "managers" into separate containers

This change is now complete.

kianzarrin commented 2 years ago

@Elesbaan70 what does that mean? did you move lane arrow flags inside lane arrow manager?

kianzarrin commented 2 years ago

to detect if you are in linux you can check UnityEngine.Application.platform == LinuxPlayer

Elesbaan70 commented 2 years ago

@Elesbaan70 what does that mean? did you move lane arrow flags inside lane arrow manager?

Huh? This issue is about how we store things in a savegame.

to detect if you are in linux you can check UnityEngine.Application.platform == LinuxPlayer

Cool, now how do I know if mono is installed? Or better yet, how do I detect the condition that causes the game crash?

I will not knowingly merge code that causes someone's game to crash. A workaround that can be implemented in code is absolutely necessary.

kianzarrin commented 2 years ago

@Elesbaan70

I think we will/could follow the idea of splitting "managers" into separate containers This change is now complete.

what change is complete? what does it mean to separate containers?

kianzarrin commented 2 years ago

Cool, now how do I know if mono is installed? Or better yet, how do I detect the condition that causes the game crash?

you can look at original harmonys code. it has checks for mono version and stuff. which version of mono crashes? which version of mono works? are we talking bout dnSpy mono?

Elesbaan70 commented 2 years ago

@Elesbaan70

I think we will/could follow the idea of splitting "managers" into separate containers This change is now complete.

what change is complete? what does it mean to separate containers?

The change history is on the draft PR associated with this case.

The code for this issue isolates the logic for each manager in such a way that failing to load one manager's data does not prevent the others from loading. (That's one of the problems with the current code.) From my point of view, this was just a logic problem, but @krzychu124 understood it to mean that I was actually putting each manager's data in a separate container under SerializableData. But that was not initially the case. They were just separate elements in a single XML document.

But then @krzychu124 pointed out that it may be useful to use separate containers, since the container has a size limit. So now I've changed it to work the way he expected.

Elesbaan70 commented 2 years ago

Cool, now how do I know if mono is installed? Or better yet, how do I detect the condition that causes the game crash?

you can look at original harmonys code. it has checks for mono version and stuff. which version of mono crashes? which version of mono works? are we talking bout dnSpy mono?

On some Linux distros, the Mono bundled with the game does a hard crash if you try to use compression. Installing Mono independently of the game resolves this.

krzychu124 commented 2 years ago

Exactly, more info about crashing mono and solution you can find here: https://github.com/MacSergey/NodeMarkup/issues/96 (already mentioned in this thread) Honestly I have no idea how to detect if it's installed or not. It's not C# library but external one (let's name it: "native").

Since we have limited size to be stored, @Elesbaan70 have you measure worst case scenario, how much data we really need to store - assuming we use container per manager? We usually use arrays of known size so we could calculate if generated xml will fit without compression or not

CitiesSkylinesMods / TMPE