AsherGlick / Burrito

An overlay tool for Guild Wars 2 that works on linux
GNU General Public License v2.0

[XML Markers] Output Protobins Using Internal Category and Map Hierarchy #75

Closed AsherGlick closed 1 year ago

AsherGlick commented 2 years ago

Once the Protobuf classes have been populated we need a way to split them up and write them to the internal file structure. Each category will be split up by map id so that the user only needs to load markers for the particular map they are on. Additionally, we want to allow for easy construction of the hierarchy of which categories are in each map. Therefore the internal map hierarchy will look like this:

<map_id>/
    base_category/
        sub_category_1/
            sub_category_1_1.bin
            sub_category_1_2.bin
        sub_category_2/
            sub_category_2_1.bin
        sub_category_2.bin

This will allow Burrito to simply list the files in a directory in order to figure out which categories exist in that directory. If we add the ability to show all categories when a user wants to see them, we can then just create a single file that has a map of all the categories, instead of having Burrito parse every folder.
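A minimal sketch of that directory query, assuming the layout above with `.bin` leaf files. The function name `categories_on_map` and the dotted-path output are illustrative assumptions, not part of Burrito.

```python
from pathlib import Path


def categories_on_map(root: Path, map_id: int) -> list[str]:
    """Return dotted category paths that have data under <root>/<map_id>/.

    Hypothetical helper: a category exists on a map if a folder or a
    .bin file for it exists below that map's directory.
    """
    map_dir = root / str(map_id)
    if not map_dir.is_dir():
        return []  # no directory means no markers on this map

    found = set()
    for path in map_dir.rglob("*"):
        relative = path.relative_to(map_dir)
        if path.is_dir():
            found.add(".".join(relative.parts))
        elif path.suffix == ".bin":
            # Drop the .bin extension so the file name is the category name.
            found.add(".".join(relative.with_suffix("").parts))

    # Sorted only for reproducible output; a directory listing itself
    # carries no meaningful order.
    return sorted(found)
```

Note that the result can only be ordered alphabetically (or arbitrarily), since the filesystem stores no menu order.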

coderedart commented 2 years ago

o/ long time no see.

I almost went with a similar approach at first, but this layout will create SO MANY FILES.

tekkit for example has over 8k categories https://mp-repo.blishhud.com/tw_ALL_IN_ONE.taco.html and some of those categories will be duplicated across maps, so it's even more files within a single marker pack, plus an equivalent number of directories.

at the same time, having your category menu tree spread out across map folders means that the category menu can differ from map to map. also, there's no defined "order" of category menu items, so for example in a queensdale map

15/ 
    guild_missions/
    map_completion/
    achievements/

how does someone define that the categories should appear in the menu in that order with just (sub-)directories? editing a category menu will take forever too, copying / pasting dirs in every map folder.

there are fewer than a thousand maps https://api.guildwars2.com/v2/maps and most of them are never going to have markers. tekkit only has markers on 187 maps (refer to the link above), so 200 ish files for the largest pack doesn't sound bad.

AsherGlick commented 2 years ago

The internal binary format is not intended for use by humans. Users will only need to care about the "full" category list, and not individual map categories. However, only categories that are present in the map you are in will show up by default in the category selection, so as to cut down on menu size. With this design Burrito can just query the directory and subdirectories to find which categories are on the map, because if a map does not have any markers for a given category, no folder or file should exist.

Category ordering though is a very good point. Directories do not have a set ordering, so that data could not be properly preserved under this design. @coderedart Do you have a suggestion as to a different method of storing the files internally?

coderedart commented 2 years ago

Even if it's internal, creating thousands of files / directories for a single marker pack doesn't feel warranted (especially when there's no compelling reason to do so). burrito will be reading all of those thousands of directories from disk every time a user changes map, and that cannot be good for performance. editing categories (moving / deleting / creating them) is also a filesystem operation now.

If we store all the categories in a single file (tree structure), then it makes a few things easy. all maps use the same category menu and its easier to edit the categories like moving parent.child.subchild to parent.{child, subchild} and not having to change the category structure in ALL maps (like the present directory based idea).

The problem is referring to them from markers / trails.

  1. Refer to them via string like xml markers "tekkit.achievements.something". this means if something is moved to become a child of another node, tekkit.mesmer.something, we will have to edit all markers in all maps too to reflect that change, or they will be referring to a non-existent category (tekkit.achievements.something -> tekkit.mesmer.something).
  2. Give unique ids to the categories and make markers refer to them using the ids. if user edits and moves a category, it will still have the unique id, so markers are still referring to the same category. this works for burrito because nobody else messes with the raw bin files.

Now the category schema could just be simple recursive tree structure and each node has a unique ID.
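A minimal sketch of option 2, assuming integer ids and a plain in-memory tree. The `Category` and `Marker` classes and the `find_path` / `move` helpers are hypothetical names for illustration, not Burrito's actual schema. It shows that a marker's reference survives a move because it stores the id rather than the dotted path.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Category:
    id: int  # unique id, never changes when the node is moved
    name: str
    children: list["Category"] = field(default_factory=list)


@dataclass
class Marker:
    category_id: int  # refers to Category.id, not to a dotted path


def find_node(root: Category, category_id: int) -> Optional[Category]:
    if root.id == category_id:
        return root
    for child in root.children:
        hit = find_node(child, category_id)
        if hit is not None:
            return hit
    return None


def find_path(root: Category, category_id: int, prefix: str = "") -> Optional[str]:
    """Rebuild the dotted path of a category from its current position."""
    path = f"{prefix}.{root.name}" if prefix else root.name
    if root.id == category_id:
        return path
    for child in root.children:
        hit = find_path(child, category_id, path)
        if hit is not None:
            return hit
    return None


def detach(root: Category, category_id: int) -> Optional[Category]:
    for i, child in enumerate(root.children):
        if child.id == category_id:
            return root.children.pop(i)
        hit = detach(child, category_id)
        if hit is not None:
            return hit
    return None


def move(root: Category, category_id: int, new_parent_id: int) -> None:
    """Re-parent a category; markers need no edits because ids are stable."""
    node = find_node(root, category_id)
    new_parent = find_node(root, new_parent_id)
    if node is None or new_parent is None:
        raise KeyError("unknown category id")
    if find_node(node, new_parent_id) is not None:
        raise ValueError("cannot move a category into itself or its descendants")
    detach(root, category_id)
    new_parent.children.append(node)
```

With a tree `tekkit -> achievements -> something` plus `tekkit -> mesmer`, calling `move(tree, something_id, mesmer_id)` changes what `find_path` returns for that id while every `Marker.category_id` stays untouched.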

With markers, burrito can just store them all in a single file per map. in 99% of the cases, where a user is using a markerpack rather than editing it, burrito just loads the category.bin once at start up of the program, and only loads the relevant map_id.bin when the user loads into a map.

As for showing only categories that are relevant to the current map, just read the map_id.bin and collect all the category references (unique ids or "parent.child" style paths) into a current_map_category_set (of type unordered_set). then just create / refresh the category menu UI node at every new map load to only display the current map categories by filtering out the categories that are not in the set.
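A rough sketch of that filtering step, assuming markers reference categories by integer id and using a Python `set` in place of `unordered_set`. The `Category` class and both function names are hypothetical. One added assumption: ancestors of a visible category are kept so the menu path down to it stays intact.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Category:
    id: int
    name: str
    children: list["Category"] = field(default_factory=list)


def collect_category_ids(markers: Iterable[dict]) -> set[int]:
    """Build current_map_category_set from the markers of one map file."""
    return {marker["category_id"] for marker in markers}


def filter_tree(node: Category, visible: set[int]) -> Optional[Category]:
    """Return a pruned copy of the tree containing only categories in
    `visible` plus the ancestors needed to reach them (assumption)."""
    kept = []
    for child in node.children:
        pruned = filter_tree(child, visible)
        if pruned is not None:
            kept.append(pruned)
    if node.id in visible or kept:
        return Category(node.id, node.name, kept)
    return None  # nothing on this map under this node
```

The full tree is left unmodified, so the same `category.bin` data can be re-filtered on every map load.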

that's what i would do, but if you want to stick to the existing format of separate categories per map, just add a field called weight to the category.bin. and when creating the UI menu, use the weight to sort the categories at a certain level (and also have a rule that with same weight, you will use alphabetical sort to sort them). you already have props that belong to categories like default_toggle or is_separator or displayName, so this would be just one more property like them.
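The weight rule can be sketched as a two-part sort key. `MenuEntry` and `sort_menu` are hypothetical names, and treating the alphabetical tie-break as case-insensitive on the display name is an assumption.

```python
from typing import NamedTuple


class MenuEntry(NamedTuple):
    display_name: str
    weight: int


def sort_menu(entries: list[MenuEntry]) -> list[MenuEntry]:
    """Sort one menu level by weight, then alphabetically on ties."""
    return sorted(entries, key=lambda e: (e.weight, e.display_name.casefold()))


# Reproduces the queensdale ordering from earlier in the thread,
# which a plain alphabetical directory listing could not express.
level = [
    MenuEntry("achievements", 2),
    MenuEntry("guild_missions", 0),
    MenuEntry("map_completion", 1),
]
ordered = [entry.display_name for entry in sort_menu(level)]
```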

AsherGlick commented 2 years ago

I think I am in agreement about not using nested folders, and instead using some sort of metadata file containing all of the category hierarchical data.

The thing I would be concerned with would be the loading times of trying to load all the markers on a particular map each time you entered it. Right now I load all the markers a category has, but only one category at a time, and it takes a few seconds for all that to be read from disk and parsed. With the markers per category already loaded, it only takes a few frames to build all the marker geometry for a newly entered map, and I kinda want to keep that as low as possible. But this is really looking like an issue of "good enough". No need to prematurely optimize.

Some experimentation is clearly needed. I think the best way forward is to start with the simplest unoptimized solution first and then optimize as seen fit.

1) All of the marker data from all categories and maps will be in a single protobin file. This will be very easy to do, and will mean that the data is all loaded at startup and therefore no loads are needed between maps.
2) If that is too slow or uses too much memory, then the markers will be split up into individual map files. Similar to the first iteration but now isolated per map. This will shift the load time off of startup and onto each map load instead, decreasing startup time.
3) If that is too slow or uses too much memory, then the markers will be split up further by category, and each map id will have a categories.bin file which will hold all of the hierarchical marker category metadata, and will map to files in an adjacent markers/ folder that contains a series of flat files such as 5ytAf4AtbJv7SGCC.bin, named by either pseudorandom numbers or a hash of the full marker category string representation. These files will be identifiable from the categories.bin file.
4) If that is too many inodes per folder, then a middle layer of folders named after the first 1-3 characters of the filename will be added to split up the files.
5) If the overall inode quantity per byte is too high, then step 4 may be reverted, and instead files will be packed together into a single packaged file with a simple preamble that lists where in the file the original marker data can be found, so the reader program can seek to that location and attempt to read only the number of bits required.
6) If the cosmic quantum deities deem this level of file compartmentalization too blasphemous, we will print out all of the marker data on QR codes and have the users scan the QR codes in via either their webcam or phone to load the data.
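To make the packed-file idea from step 5 concrete, here is a minimal sketch of one possible preamble layout. The layout itself (an entry count, then name / offset / length records, then the raw blobs) and all function names are assumptions for illustration; nothing in this thread fixes a format.

```python
import io
import struct
from typing import BinaryIO

HEADER = struct.Struct("<I")    # number of index entries
ENTRY = struct.Struct("<HQQ")   # name length, absolute offset, blob length


def write_pack(out: BinaryIO, blobs: dict[str, bytes]) -> None:
    """Write the index preamble followed by the blobs (assumes `out` starts at 0)."""
    encoded = {name: name.encode("utf-8") for name in blobs}
    index_size = HEADER.size + sum(ENTRY.size + len(n) for n in encoded.values())
    out.write(HEADER.pack(len(blobs)))
    offset = index_size  # first blob starts right after the preamble
    for name, data in blobs.items():
        out.write(ENTRY.pack(len(encoded[name]), offset, len(data)))
        out.write(encoded[name])
        offset += len(data)
    for data in blobs.values():
        out.write(data)


def read_index(src: BinaryIO) -> dict[str, tuple[int, int]]:
    """Read only the preamble: category name -> (offset, length)."""
    src.seek(0)
    (count,) = HEADER.unpack(src.read(HEADER.size))
    index = {}
    for _ in range(count):
        name_len, offset, length = ENTRY.unpack(src.read(ENTRY.size))
        index[src.read(name_len).decode("utf-8")] = (offset, length)
    return index


def read_blob(src: BinaryIO, index: dict[str, tuple[int, int]], name: str) -> bytes:
    """Seek straight to one category's data without reading the rest."""
    offset, length = index[name]
    src.seek(offset)
    return src.read(length)


# Usage with an in-memory buffer standing in for a file on disk.
buffer = io.BytesIO()
write_pack(buffer, {"a.b": b"hello", "a.c": b"world!!"})
pack_index = read_index(buffer)
```

The reader pays for the preamble once and can then fetch any single category with one seek and one read, which is the property step 5 is after.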

Jokes aside, I think this is a good battle plan to achieve the correct balance between speed, files, and development time.

coderedart commented 2 years ago

everything in a single file is definitely the direction i thought you were going with until i saw this issue :).

the main disadvantage would be that every time the user saves the marker pack while editing, i will need to serialize / save the whole pack to the disk which is a lot of work (cpu usage / disk writes). but still much much better than the thousands of directories method.