AsherGlick / Burrito

An overlay tool for Guild Wars 2 that works on linux
GNU General Public License v2.0

Binary Burrito Files #11

Open AsherGlick opened 2 years ago

AsherGlick commented 2 years ago

Burrito files are currently stored in .json format. Most of the data in them is either structural data (`{}`, `[]`) or floating point numbers, both of which are traditionally poorly encoded in the JSON format.
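
To make the encoding overhead concrete, here is a minimal Python sketch that compares a few coordinates stored as JSON text against the same values packed as float32. The sample positions and the float32 layout are assumptions for illustration, not the actual Burrito file contents.

```python
import json
import struct

# Hypothetical marker positions: (x, y, z) triples, values made up for illustration.
positions = [
    (-123.456789, 45.678901, 789.012345),
    (1024.5, -2048.25, 0.125),
    (3.14159265, 2.71828182, 1.41421356),
]

# JSON stores every number as decimal text, plus brackets and commas.
json_bytes = json.dumps(positions, separators=(",", ":")).encode("utf-8")

# A packed layout stores each coordinate as a 4-byte little-endian float32.
flat = [c for point in positions for c in point]
packed_bytes = struct.pack(f"<{len(flat)}f", *flat)

print(len(json_bytes), "bytes as JSON")
print(len(packed_bytes), "bytes packed")
```

Note that float32 keeps fewer significant digits than the JSON text does, so the smaller size comes with a precision trade-off.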

Using protocol buffers is a good compromise between a strict custom packed binary file and JSON formatting.

Another option instead of protocol buffers is to use Godot's binary serialization. The upside over protocol buffers is that it is built into the Godot codebase, so no additional protobuf code would need to be written. A downside is that it is not schema-based, which would make future-proofing the file harder. Using the built-in serializer would also make it more difficult for other programs to interact with the file. This may still be an option as a short-term solution.

coderedart commented 2 years ago

I definitely think the marker format should move to a binary format, since with Burrito it's no longer necessary to manually edit XML files. However, Windows support needs to come before this, as most people use Windows exclusively.

  1. It would be more useful to decide this together with other overlay developers.
  2. It must be a format with support for multiple languages, like Protocol Buffers, FlatBuffers, or even FlexBuffers.

We will still need to provide an XML exporter or converter, though, if we want people to use Burrito to create markers. People definitely want compatibility with TacO if they can get it.

AsherGlick commented 2 years ago

Windows support is tracked in #9.

dlamkins commented 2 years ago

> We will still need to provide an XML exporter or converter, though, if we want people to use Burrito to create markers. People definitely want compatibility with TacO if they can get it.

This is an incredibly important point. Please be sure to work with other overlays to help maintain compatibility in one way or another. We want to continue to give marker pack makers powerful tools and features without splintering their userbases.

Blish HUD can handle alternative format readers (our reader is handled by a library called TmfLib we've written to keep the spec mostly straightforward), but we have had no interest in introducing other formats at the moment because we've not yet found any reasonable benefit to it.

We (Blish HUD) keep an attribute implementation list if you wish to match the implementations of TacO & Blish HUD, and we can note your compatibility there.

AsherGlick commented 2 years ago

Interoperability is definitely a good thing. Right now all there is is this script to convert from XML to the Burrito format; I plan to update it when a binary file format is implemented. A similar converter could be created to convert back.
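
As a rough idea of what such a conversion involves, the Python sketch below reads a small TacO-style XML pack and groups the POIs by map. The output layout and the sample values are hypothetical: this is neither the script mentioned above nor the actual Burrito file format.

```python
import json
import xml.etree.ElementTree as ET

# A tiny TacO-style marker pack; the values are made up for illustration.
XML_PACK = """
<OverlayData>
  <MarkerCategory name="example" DisplayName="Example"/>
  <POIs>
    <POI MapID="15" xpos="10.5" ypos="2.0" zpos="-30.25" type="example"/>
    <POI MapID="15" xpos="11.0" ypos="2.5" zpos="-31.0" type="example"/>
    <POI MapID="18" xpos="0.0" ypos="0.0" zpos="0.0" type="example"/>
  </POIs>
</OverlayData>
"""

def xml_to_markers(xml_text):
    """Group POI positions by map id (hypothetical output layout)."""
    root = ET.fromstring(xml_text.strip())
    by_map = {}
    for poi in root.iter("POI"):
        entry = {
            "position": [float(poi.get(axis)) for axis in ("xpos", "ypos", "zpos")],
            "category": poi.get("type"),
        }
        by_map.setdefault(poi.get("MapID"), []).append(entry)
    return by_map

markers = xml_to_markers(XML_PACK)
print(json.dumps(markers, indent=2))
```

A converter in the opposite direction would walk the same structure and emit one `POI` element per entry.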

However, the set of data stored in these formats would possibly be disjoint depending on the outcome of these issues:

#2 is a replacement of the behavior, autotrigger, and resetLength attributes.

#3 might end up looking the same as the XML format, but could turn out differently. Any insights from how Blish HUD is used would be very much welcome.

#10 is a completely new type intended to assist #2. It can be lossily represented as a Blish HUD trail, but converting back would be difficult.

Skimming through the other attributes they seem like they can all eventually be implemented.

dlamkins commented 2 years ago

For your second item, Blish HUD represents categories as a nested set of menus, much like TacO does. We've taken great care to keep this list clutter-free, though, so users can focus on the items relevant to them on that map, quite unlike how TacO handles this.

I would definitely be interested in discussing the third item more.

Certainly we could handle that easily with a new pathing type in the XML, or, more simply, just as a trail with an additional attribute which indicates that it's an area and not just a trail (which would then close off the area by connecting the first and last point).
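
The "trail with an extra attribute" idea could be sketched in Python as follows. The `is_area` name and the `Trail` type are hypothetical and not part of any published spec; a reader that ignores the attribute would simply draw the open trail.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]

@dataclass
class Trail:
    points: List[Point]
    is_area: bool = False  # hypothetical attribute name, not part of any published spec

def render_points(trail: Trail) -> List[Point]:
    """Return the points to draw; areas are closed by repeating the first point."""
    pts = list(trail.points)
    if trail.is_area and len(pts) >= 3 and pts[0] != pts[-1]:
        pts.append(pts[0])
    return pts

square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
print(len(render_points(Trail(square))))                # open trail
print(len(render_points(Trail(square, is_area=True))))  # closed area
```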

We could introduce an "area" element as well, but that would remove backwards compatibility, and at least with the above, TacO would still show something for it.

coderedart commented 2 years ago

I know I'm going against my past self here, but binary formats are not really as good as I expected them to be. At this point, every overlay is using its own format anyway (except for TacO/Blish HUD), so it doesn't really matter that much if Burrito chooses to use a binary format. It also depends on how deep you want to go with a binary format: replacing just the XML files, or also including the png/.trl bytes inside the binary file; zero-copy deserialization or full deserialization; and other considerations.

But I thought I should at least point out some merits/demerits of a binary format:

Merits:

  1. Forces the marker format to be leaner and better defined, e.g. sized number types for attributes. Bools can be just a byte of bitflags to save space. Heap-allocated attributes (unsized types like strings/vectors, e.g. type, iconFile, trailData, info, Tip, copy, etc.) can be removed and instead referred to by sized ids that index into global resource tables or similar. (I do this with Jokolay's JSON format anyway, but binary formats naturally force you into this paradigm. Sized types are always faster.)
  2. Very tiny memory footprint, and with FlatBuffers' zero-copy deserialization even slower/bloated languages like Python/JS can use them with just a single mmap call.
  3. Very low wire transfer size (especially as capnproto/protobuf/zeromq/bincode have some sort of RPC framework attached), which could lead to live editing/display of markers between different players and other useful developments.
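
A minimal Python sketch of the first merit, with booleans packed into one flags byte and strings replaced by ids into a shared table. The flag names, the record layout, and the icon paths are all assumptions for illustration, not Jokolay's or Burrito's actual format.

```python
import struct
from enum import IntFlag

class MarkerFlags(IntFlag):
    # Hypothetical flag names chosen for illustration.
    AUTO_TRIGGER = 1
    IN_GAME_VISIBLE = 2
    MAP_VISIBLE = 4
    MINIMAP_VISIBLE = 8

# Global resource table: strings are stored once and referenced by index.
icon_table = ["icons/chest.png", "icons/node.png"]

# Fixed-size record: three float32 coordinates, a uint16 icon id, one flags byte.
RECORD = struct.Struct("<fffHB")

def pack_marker(x, y, z, icon, flags):
    return RECORD.pack(x, y, z, icon_table.index(icon), int(flags))

def unpack_marker(data):
    x, y, z, icon_id, flags = RECORD.unpack(data)
    return (x, y, z), icon_table[icon_id], MarkerFlags(flags)

blob = pack_marker(1.5, 2.0, -3.25, "icons/node.png",
                   MarkerFlags.AUTO_TRIGGER | MarkerFlags.MAP_VISIBLE)
print(RECORD.size, "bytes per marker")
print(unpack_marker(blob))
```

Because every record has the same size, the n-th marker can be located by offset without parsing the ones before it.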

Demerits:

  1. Much more of a black box: it is much harder to look into a binary file and know if there's any issue with it.
  2. Lack of data versioning like git: binary files are harder to diff visually compared to text formats.
  3. Much harder to split into multiple files naturally. Lack of tooling.
  4. Much harder to select which one to actually use: the bindings are very inconsistent across languages for different formats.

My main consideration with regard to marker packs is generally how git-friendly they are, in terms of diffing, compression, and size.

diffing => text formats win here. For pack maintainers, someone should be able to look at a PR and say "that looks okay to be merged". Maybe this could be solved by GitHub bots that show a proper diff?
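
One way such a diff could be produced is to dump each binary record as a stable line of text and diff the dumps. The Python sketch below assumes a hypothetical fixed-size record layout (map id plus a float32 position); it is not tied to any existing bot or format.

```python
import difflib
import struct

RECORD = struct.Struct("<Hfff")  # hypothetical layout: map id + float32 position

def dump(blob):
    """Render each fixed-size record as one stable, human-readable line."""
    return [
        "map={} pos=({:.3f}, {:.3f}, {:.3f})".format(*RECORD.unpack_from(blob, off))
        for off in range(0, len(blob), RECORD.size)
    ]

old = RECORD.pack(15, 10.5, 2.0, -30.25) + RECORD.pack(18, 0.0, 0.0, 0.0)
new = RECORD.pack(15, 10.5, 2.0, -30.25) + RECORD.pack(18, 0.0, 1.0, 0.0)

diff = list(difflib.unified_diff(dump(old), dump(new), "old", "new", lineterm=""))
print("\n".join(diff))
```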

compression => in my testing with JSON and a Rust-specific binary format called bincode, uncompressed JSON was huge, but once you compress them (by putting them in a git repo or zip file, or with raw compression like zstd) both were very close, within 3-4% of each other. Text compression is much easier than binary compression; for binary, any compression needs to happen within the format itself (like using varint in protobuf).
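
A comparison of this kind can be set up with a few lines of Python. This sketch uses synthetic random markers and `zlib` from the standard library as a stand-in for zstd, and a plain `struct` layout as a stand-in for bincode, so it does not reproduce the measurement described above and the ratios will differ on real packs.

```python
import json
import random
import struct
import zlib

random.seed(0)  # deterministic synthetic data, not a real marker pack

# Hypothetical markers: a map id plus a position rounded to three decimals.
markers = [
    {"map_id": random.choice([15, 18, 50]),
     "position": [round(random.uniform(-1000, 1000), 3) for _ in range(3)]}
    for _ in range(5000)
]

as_json = json.dumps(markers, separators=(",", ":")).encode("utf-8")
as_binary = b"".join(
    struct.pack("<Hfff", m["map_id"], *m["position"]) for m in markers
)

for label, raw in (("json", as_json), ("binary", as_binary)):
    compressed = zlib.compress(raw, 9)
    print(f"{label:>6}: {len(raw):>7} raw, {len(compressed):>7} compressed")
```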

size => as I said above, this only matters uncompressed. You can go test popular packs like tekkit: for a pack of 45 MB, the data folder (png + trl files) takes up 35 MB and all the XML files take up less than 10 MB. So whatever space we save with binary formats (let's say 50% space savings) will only be about 10% of a pack (5 MB).
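
The estimate can be checked with the figures quoted above, taking 10 MB as the upper bound for the XML share and 50% as the assumed saving:

```python
pack_mb = 45.0       # total pack size quoted above
xml_mb = 10.0        # upper bound for the XML share quoted above
savings_ratio = 0.5  # the assumed 50% saving from a binary format

saved_mb = xml_mb * savings_ratio
print(f"{saved_mb:.1f} MB saved, {saved_mb / pack_mb:.0%} of the pack")
# prints "5.0 MB saved, 11% of the pack"
```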

In the end, I was considering moving to a database schema as a better option than binary formats, as a database gives much more mature tooling, and live marker packs could just be a database connection to a server where the pack maker edits them live. And of course, any respectable language will have a database driver.
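
As a sketch of what a database-backed pack might look like, the Python example below uses an in-memory SQLite database as a local stand-in for the server connection described above. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE marker (
    id          INTEGER PRIMARY KEY,
    map_id      INTEGER NOT NULL,
    category_id INTEGER NOT NULL REFERENCES category(id),
    x REAL NOT NULL, y REAL NOT NULL, z REAL NOT NULL
);
CREATE INDEX marker_by_map ON marker(map_id);
""")

conn.execute("INSERT INTO category (id, name) VALUES (1, 'example')")
conn.executemany(
    "INSERT INTO marker (map_id, category_id, x, y, z) VALUES (?, ?, ?, ?, ?)",
    [(15, 1, 10.5, 2.0, -30.25), (15, 1, 11.0, 2.5, -31.0), (18, 1, 0.0, 0.0, 0.0)],
)

# Load only the markers for the map the player is currently on.
rows = conn.execute(
    "SELECT x, y, z FROM marker WHERE map_id = ? ORDER BY id", (15,)
).fetchall()
print(rows)
```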

Disclaimer:

Btw, I am purely putting this info here for Burrito's consideration, as the majority of what I said above is from https://gitlab.com/coderedart/jokolay/-/issues/19 where I was designing a marker format for Jokolay (half of my contributions on this repo are more or less sharing Jokolay's experiences, primarily with X11).

This is not meant to discuss a common binary marker format between all overlays. I do not feel that binary formats are really worth it over a textual format. I don't know about others, but I have no intention of bothering with a new marker format; I will just import XML packs to convert them into JSON packs, and vice versa, to keep compatibility with other overlays.

AsherGlick commented 2 years ago

Burrito needs some sort of internal file type because I am going to split markers into their respective maps and categories so that only markers that will be visible need to be loaded at any one time. The de facto XML standard could serve that purpose, but any other format could do so just as easily. The primary reason for using proto3 instead of XML or JSON is the strict structure and type requirements that are inseparable from the format. Any additional benefits that come with using a binary format are secondary considerations. I don't expect the XML interchange format to disappear anytime soon.
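
The per-map, per-category split could look roughly like the Python sketch below. It writes JSON files only to stay self-contained (the comment above proposes proto3 for the real format), and the directory layout and field names are assumptions.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical flat marker list; field names are illustrative only.
markers = [
    {"map_id": 15, "category": "chests", "position": [10.5, 2.0, -30.25]},
    {"map_id": 15, "category": "nodes", "position": [11.0, 2.5, -31.0]},
    {"map_id": 18, "category": "chests", "position": [0.0, 0.0, 0.0]},
]

def split_by_map_and_category(markers, out_dir):
    """Write one file per (map, category) pair so each can be loaded alone."""
    buckets = defaultdict(list)
    for m in markers:
        buckets[(m["map_id"], m["category"])].append(m["position"])
    for (map_id, category), positions in buckets.items():
        path = Path(out_dir) / str(map_id) / f"{category}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(positions))

def load_visible(out_dir, map_id, enabled_categories):
    """Read only the files for the current map and the enabled categories."""
    loaded = {}
    for category in enabled_categories:
        path = Path(out_dir) / str(map_id) / f"{category}.json"
        if path.exists():
            loaded[category] = json.loads(path.read_text())
    return loaded

with tempfile.TemporaryDirectory() as tmp:
    split_by_map_and_category(markers, tmp)
    visible = load_visible(tmp, 15, ["chests"])
    print(visible)
```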