Slimefun / Slimefun4

Slimefun 4 - A unique Spigot/Paper plugin that looks and feels like a modpack. We've been giving you backpacks, jetpacks, reactors and much more since 2013.
GNU General Public License v3.0
966 stars 545 forks source link

[WIP] Binary storage for player data #4126

Open WalshyDev opened 8 months ago

WalshyDev commented 8 months ago

PR Checklist


What changed

Storage choice

Ok so, as you can see from the PR, the storage is done with NBT. The question I'm sure you're asking is why NBT over xyz? Well, there's a few reasons;

  1. Familiarity a. Lots of server owners and developers are familiar with NBT. This makes adoption internally and externally much easier (especially with how much MC has leaned into it over the years)
  2. Solid storage a. NBT is about as small as it can get. Some people think it's inflated, the format isn't. Usage however is. That isn't something any format can really prevent.
  3. Existing tooling a. Lots of existing tooling exists for server owners to read these files (a bit more on this later as well)

[!NOTE] This is a simple implementation which would have to have the whole file rewritten each time. I'll explore ways to avoid this but without limiting file size, there isn't a great way to do it.

Usage example

(backpack contains 3 Slimefun items; Energized Energy Capacitor, Energized Solar Generator and a Multi Tool VII)

Legacy file with all researches

-rw-r--r--   1 walshy  staff  3511 11 Feb 23:41 52e0fc97-c914-4def-8899-8dbdf0c486c0.yml

Total: 3.5 KB

Legacy file with all researches and a waypoint

player file;

-rw-r--r--  1 walshy  staff  3511 12 Feb 00:42 data-storage/Slimefun/Players/52e0fc97-c914-4def-8899-8dbdf0c486c0.yml

3.5 KB

waypoint file:

-rw-r--r--  1 walshy  staff  110 12 Feb 00:42 data-storage/Slimefun/waypoints/52e0fc97-c914-4def-8899-8dbdf0c486c0.yml

110 bytes

Total: 3.6 KB

Legacy file with all researches, a backpack and a waypoint

player file:

-rw-r--r--  1 walshy  staff  8526 12 Feb 00:00 data-storage/Slimefun/Players/52e0fc97-c914-4def-8899-8dbdf0c486c0.yml

8.5 KB

waypoint player file:

-rw-r--r--  1 walshy  staff  110 12 Feb 00:00 data-storage/Slimefun/waypoints/52e0fc97-c914-4def-8899-8dbdf0c486c0.yml

110 bytes

Total: 8.6 KB

Binary file with all researches (no compression)

-rw-r--r--  1 walshy  staff  6875 12 Feb 00:08 data-storage/Slimefun/Players/52e0fc97-c914-4def-8899-8dbdf0c486c0.dat

Total: 6.8 KB

(this is so big because of using namespaces instead of int IDs, these take up way more space per research)

Binary file with all researches (gzip compression)

-rw-r--r--  1 walshy  staff  2308 12 Feb 00:06 data-storage/Slimefun/Players/52e0fc97-c914-4def-8899-8dbdf0c486c0.dat

Total: 2.3 KB

Binary file with all researches and a waypoint (gzip compression)

-rw-r--r--  1 walshy  staff  2457 12 Feb 00:37 data-storage/Slimefun/Players/52e0fc97-c914-4def-8899-8dbdf0c486c0.dat

Total: 2.4 KB

1,164 bytes smaller than legacy, 32.1% reduction in size (reminder that most of this space will be taken by the namespace researches -- with int IDs this would be much smaller)

Tooling

I'm sure we've all used NBT programs in the past to view an NBT file, NBTExplorer is the biggest one, there's also NBT Studio. The goal is for these programs (and others) to be able to view the files. I haven't tested this yet but I think this would be ideal.

However, these only support the compression algorithms that MC support (gzip & zlib). I'd love to use lz4 or zstd. Luckily, MC recently announced lz4 in snapshot 24w04a which will be coming in 1.20.5. This hopefully means one or both of these tools update to support it. That still doesn't mean they'd support zstd though so maybe we go lz4?

So as we stand today, we can't really go better than gzip/zlib but hopefully soon we can do lz4. Is this ok? How much should we care about existing tooling?

My goal is to avoid building our own šŸ˜…

For the reviewers

Open questions

[!IMPORTANT] As we stand today (12th Feb), looking for a "yay" or "nay" on the PR approach and answers to the above questions rather than a "this code and data structure is perfect".

Testing

TODO

After this PR

Census

Yay - 3 @J3fftw1 @ProfElements @Sfiguz7

Nay - 0

github-actions[bot] commented 8 months ago

Your Pull Request was automatically labelled as: "šŸŽˆ Feature" Thank you for contributing to this project! ā¤ļø

J3fftw1 commented 8 months ago
  • Are we good with NBT? Any other format people really want?
  • If another format, why? What advantage does it give?

I think NBT is the right way, we already discussed loads about this and about other storage options. but the gist of our discussion was NBT is good since most of the MC community is known with this. Even non admins/devs know about NBT this is widely used even by players using singeplayer.

  • Do we want to save player data elsewhere?
  • We've been saving to data-storage/Slimefun/Players and data-storage/Slimefun/waypoints for a long time. If we want to change that, this is the time
  • One change at least I'd like to make is moving waypoints into the player files. I need to figure out exactly how the shared waypoints work though before I do this, they seem to not do anything different but then confused how they're shared.

My vote would be to move this to ~/plugins/Slimefun/data-storage. We dont get reports of this often but we do occasionally that people got rid of the data by moving servers. Saving in ~/plugins/Slimefun/data-storage would make more sense to me.

My questions

How do we get people to use this, we couldn't answer this in our DMs either. It's an opt-in option people barely read configs anyways in this space so I doubt people will opt-in. I would love to see stats on this as well how many opt-in how many opt-out. Do we maybe want to throw it straight in dev its development anyways but other point is we also tell servers to run dev and advertise RC for plugin use only. soo /shrug

Is there a big difference between gzip, zlib and lz4. Can we move later on to lz4 if not lets just use what ever is supported now. on the other hand people wont move to different standards if no one supports it. Same discussion we had on to use postgres even though no one supports it. Should we care about existing tooling, yes but no idc much. People need to adapt anyways but idk i will leave that to the others.

Note:

We should run these tests again on a bigger scale bigger server more player data.

Edit: My answer would be yay for this btw.

WalshyDev commented 8 months ago

How do we get people to use this, we couldn't answer this in our DMs either. It's an opt-in option people barely read configs anyways in this space so I doubt people will opt-in. I would love to see stats on this as well how many opt-in how many opt-out. Do we maybe want to throw it straight in dev its development anyways but other point is we also tell servers to run dev and advertise RC for plugin use only. soo /shrug

The age old question of how to make people use the thing. Not sure there's really an easy way sadly. We can enable by default for new servers but are we likely to find bugs... or just push people away? New users aren't usually the ones to report stuff. However, this is how we rolled out item IDs back in the day. Compat mode was disabled by default for new servers and it slowly moved over to be everyone.

We'll definitely log to bStats too... though I can't make a new panel so gonna have to throw it somewhere else

Is there a big difference between gzip, zlib and lz4. Can we move later on to lz4 if not lets just use what ever is supported now.

We can do whatever we want :p As for difference, mostly comes down to compression ratio and speed. Lz4 doesn't compress as good as others but it is very fast. Zstd has the best compression ratio but it's slower. I'd say these days we value speed over ratio. Disk space is much easier to get than time.

We should run these tests again on a bigger scale bigger server more player data.

Agreed, I need the biggest player file we can find and a bunch of blockstorage later too.

ProfElements commented 8 months ago

I think NBT is a good format.

I think that binary storage in general is a good idea.

I think moving away from data-storage might be a misstep if taken due to data-storage being easily accessible instead of a multi-folder dive into plugins or something similar.

Moving way-points into players is probably going to be a big win unless it constantly duplicates data.

I think we should support any compression format Minecraft supports. Zlib/Gzip is a fine alternative until its better for us to break it for a faster or more efficient compression like lz4

Edit: This is a yay for me aswell. I will probably move to binary as soon as its available on dev

Sfiguz7 commented 8 months ago

I agree with most of what has been said above.

NBT sounds like a good format to use due to the accessibility - researches taking a bit more space is honestly not a problem.

As long as we compress I don't mind which compression method is used, it's going to be quite the jump compared to what we have had for years no matter what, and if anything speed is probably something that is more important when we have tens of thousands of blocks to take care of, so the fact it is already available just makes it even more welcome of a choice.

I agree with Jeff that moving the data folder inside Slimefun's folder would make sense, no other plugin is ever going to use that folder ever and as a user I had always found it weird how it is not a subfolder: /plugins/Slimefun is the first place I would check when looking for Slimefun related anything.

Overall, yay for me

Seggan commented 6 months ago

Has the NBT code for dough been PR'd yet? I don't see it so I assume not.