Snowiiii / Pumpkin

Empowering everyone to host fast and efficient Minecraft servers.
MIT License

[Discussion] World persistence over Database #92

Open Asurar0 opened 5 days ago

Asurar0 commented 5 days ago

Introduction

Minecraft employs a dedicated file format, known as the Anvil file format, to store chunk regions on disk for subsequent retrieval. This file storage approach is consistently utilized across all Minecraft Java Edition server implementations.

This issue discusses the advantages of using a B+Tree database (notably LMDB) over a file I/O approach for storing the world. It also proposes tools for converting between the two storage formats, specifically exporting .MCA files from a database and importing .MCA files into a database.

Assumptions

The following assumptions underlie the reasoning of this proposal:

File I/O

The File I/O approach, utilized by the standard Minecraft Java Edition server implementation, involves storing data in a dedicated folder containing .MCA files, each named according to the location of the 32x32 chunk region it represents:

Name of the .MCA file:
r.{X region}.{Z region}.mca

Pumpkin can derive the region coordinate from a specific chunk location as follows:

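// Option 1: floor division (div_euclid floors for a positive divisor, so negative chunks map correctly)
let region_chunk_x: i32 = player_chunk_x.div_euclid(32);
let region_chunk_z: i32 = player_chunk_z.div_euclid(32);

// Option 2: arithmetic right shift by 5 (32 = 2^5), which gives the same result for i32
let region_chunk_x: i32 = player_chunk_x >> 5;
let region_chunk_z: i32 = player_chunk_z >> 5;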
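
To make the mapping from chunk to region file concrete, here is a minimal sketch. The function names (`region_of_chunk`, `local_chunk`, `region_file_name`) are hypothetical helpers for illustration, not existing Pumpkin APIs.

```rust
/// Number of chunks along one side of a region (32x32 chunks per region file).
const REGION_SIZE: i32 = 32;

/// Region coordinates that contain the given chunk.
/// `div_euclid` floors for a positive divisor, so negative chunks map correctly.
fn region_of_chunk(chunk_x: i32, chunk_z: i32) -> (i32, i32) {
    (chunk_x.div_euclid(REGION_SIZE), chunk_z.div_euclid(REGION_SIZE))
}

/// Position of the chunk inside its region; each component is in 0..32.
fn local_chunk(chunk_x: i32, chunk_z: i32) -> (i32, i32) {
    (chunk_x.rem_euclid(REGION_SIZE), chunk_z.rem_euclid(REGION_SIZE))
}

/// File name of the region that contains the chunk, e.g. `r.-1.1.mca`.
fn region_file_name(chunk_x: i32, chunk_z: i32) -> String {
    let (rx, rz) = region_of_chunk(chunk_x, chunk_z);
    format!("r.{rx}.{rz}.mca")
}
```

For example, chunk (-1, 33) falls in region (-1, 1), at local position (31, 1) inside that region.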

The logic for interacting with chunks involves the following steps:

Certain steps can be optimized. For instance, if the distance between the player and the edge of the current region is less than the render distance, the server can proactively fetch the neighbouring region file together with the current one, reducing the need for subsequent region file loads.
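
One way to sketch this prefetch check is to compute every region overlapped by the square of chunks around the player. `regions_in_view` is a hypothetical helper, and measuring the render distance in chunks as a square radius is an assumption made for illustration.

```rust
/// Regions overlapped by the square of chunks within `render_distance`
/// (in chunks) of the player's chunk. Hypothetical helper, not Pumpkin code.
fn regions_in_view(chunk_x: i32, chunk_z: i32, render_distance: i32) -> Vec<(i32, i32)> {
    // Shifting by 5 converts a chunk coordinate to a region coordinate.
    let min_rx = (chunk_x - render_distance) >> 5;
    let max_rx = (chunk_x + render_distance) >> 5;
    let min_rz = (chunk_z - render_distance) >> 5;
    let max_rz = (chunk_z + render_distance) >> 5;

    let mut regions = Vec::new();
    for rx in min_rx..=max_rx {
        for rz in min_rz..=max_rz {
            regions.push((rx, rz));
        }
    }
    regions
}
```

If the result holds more than one region, the player is close enough to a region edge that all of the listed files can be opened in one go.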

Pros:

Cons:

Database Approach

The database approach uses LMDB, a B+Tree-based database, to store chunk information in tables, enabling logarithmic-time retrieval and updates. This is the approach used by the FerrumC project.
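
As a rough illustration of keyed chunk storage, the sketch below uses the standard library's `BTreeMap` as an in-memory stand-in for an ordered key-value table. It is not LMDB, and the key layout (`chunk_key`) and the `ChunkTable` type are assumptions for illustration, not FerrumC's or Pumpkin's actual schema.

```rust
use std::collections::BTreeMap;

/// Encode chunk coordinates as an 8-byte key whose byte-wise order matches
/// numeric (x, z) order. Flipping the sign bit maps i32 onto u32 monotonically;
/// big-endian puts the most significant byte first.
fn chunk_key(chunk_x: i32, chunk_z: i32) -> [u8; 8] {
    let x = (chunk_x as u32) ^ 0x8000_0000;
    let z = (chunk_z as u32) ^ 0x8000_0000;
    let mut key = [0u8; 8];
    key[..4].copy_from_slice(&x.to_be_bytes());
    key[4..].copy_from_slice(&z.to_be_bytes());
    key
}

/// In-memory stand-in for an ordered key-value table (NOT LMDB itself).
#[derive(Default)]
struct ChunkTable {
    entries: BTreeMap<[u8; 8], Vec<u8>>,
}

impl ChunkTable {
    /// Insert or overwrite the serialized chunk at the given position.
    fn put(&mut self, chunk_x: i32, chunk_z: i32, data: Vec<u8>) {
        self.entries.insert(chunk_key(chunk_x, chunk_z), data);
    }

    /// Look up the serialized chunk, if present.
    fn get(&self, chunk_x: i32, chunk_z: i32) -> Option<&[u8]> {
        self.entries.get(&chunk_key(chunk_x, chunk_z)).map(Vec::as_slice)
    }
}
```

An order-preserving key keeps neighbouring chunks close together in the tree, which should help range scans over an area. Whether that matters in practice would need to be measured.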

The logic for interacting with chunks involves the following steps:

The performance of this approach also depends on how the database is integrated, in particular on parallelizing reads and funneling all writes through a single writer running on a dedicated thread pool.
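
The read/write split described above could look roughly like the following sketch, which uses only the standard library. The `Store` alias, `WriteChunk`, `spawn_writer`, and `read_chunk` are hypothetical names. The `RwLock` here is a coarse stand-in: in LMDB, readers are not blocked by the single write transaction.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, RwLock};
use std::thread::{self, JoinHandle};

/// Shared in-memory store standing in for the database (assumption).
type Store = Arc<RwLock<BTreeMap<(i32, i32), Vec<u8>>>>;

/// A write request sent to the single writer thread.
struct WriteChunk {
    pos: (i32, i32),
    data: Vec<u8>,
}

/// Spawn the one thread that is allowed to mutate the store.
/// It exits once every `Sender` has been dropped.
fn spawn_writer(store: Store) -> (Sender<WriteChunk>, JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<WriteChunk>();
    let handle = thread::spawn(move || {
        for req in rx {
            store.write().unwrap().insert(req.pos, req.data);
        }
    });
    (tx, handle)
}

/// Readers take the shared lock, so any number of them can run concurrently.
fn read_chunk(store: &Store, pos: (i32, i32)) -> Option<Vec<u8>> {
    store.read().unwrap().get(&pos).cloned()
}
```

Any thread can clone the `Sender` to queue writes, while reads go straight to the store without passing through the writer.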

Why LMDB?

LMDB is considered a gold standard due to its exceptional performance and reliability, surpassing that of other options in the ecosystem. MDBX is an improvement over LMDB, but its Rust crate support is unfortunately declining.

Pros:

Cons:

Context

This proposal follows my previous experience assisting FerrumC in migrating from RocksDB (LSM structure) to LMDB (B+Tree structure), and aims to address and clarify any misconceptions surrounding the original proposal I made on Discord.

Conclusion

The MCA region file approach is a proven and suitable solution for small to medium-sized servers, running on consumer or dedicated hardware, respectively. However, Pumpkin aims to achieve performance and efficiency that surpasses even the largest single-map servers, such as Folia and MultiPaper, which have successfully handled around 1000 simultaneous players. Pumpkin's goal is to exceed this benchmark, targeting player counts of over 1500 on a single dedicated hardware setup, a scale that would push the limits of file-based approaches. It is well understood that file-based approaches have inherent limitations when it comes to handling highly parallel workloads.

This issue proposes three options for Pumpkin:

Bryntet commented 5 days ago

> Offers more flexible data modeling, enabling more efficient information retrieval (e.g., retrieving only chunk information without entities).

This is not something that is only achievable via a DB. I feel it is very important that we don't conflate "using file I/O" with "using Minecraft's Anvil format".

I am still not convinced of the fact that we need a DB. But I think discussion around this is good.

I also think that supporting the original Anvil file format is a must if we want to actually see adoption. But this should be togglable in the config, so that Pumpkin's own format, whichever we choose to make it, can be used instead.

Supporting saving files in the Anvil file format should not be a development priority though; our own world-saving method should be the gold standard in our code base.

Asurar0 commented 4 days ago

> This is not something that is only achievable via a DB. I feel it very necessary that we don't conflate "using file I/O" with "using Minecraft's Anvil format".

My assumption that we would be using MCA region files came from @Snowiiii's words on Discord:

> ofc but i would prefer using mca not databases
>
> that way people can easly download their worlds also way would you do if someone drops in an vanilla world?

I agree that we could create our own binary format, but that would mean reinventing the wheel. That being said, I'm happy to hear any ideas you have for improving the Pumpkin format.

I think we're still missing the bigger issue here, which is handling extreme parallel workloads with thousands of players. One key benefit of a single-file mmap database is not just I/O efficiency, but also that the file is opened and mapped once, so the OS keeps the file descriptor and its memory region around, making memory mapping essentially 'free'. If we go with a file-based approach and try to use memory mapping, we'd still incur the cost of filesystem lookups and page faults for every region file, which will slow things down.

Snowiiii commented 4 days ago

Sounds like a cool idea. I'm not against using a database approach as long as users can export their world files. I'm fine if you want to start implementing DB storage. I'm interested to see real performance benchmarks between databases and plain OS file I/O (e.g. on Windows and Linux).

Bryntet commented 15 hours ago

> I agree that we could create our own binary format, but that would mean reinventing the wheel

Is the Anvil ".mca" format compressed at all?

If not, then our wheel could be significantly better.

Asurar0 commented 15 hours ago

> > I agree that we could create our own binary format, but that would mean reinventing the wheel
>
> Is the Anvil ".mca" format compressed at all?
>
> If not, then our wheel could be significantly better.

Yes, it is. https://minecraft.wiki/w/Region_file_format#Payload

Snowiiii commented 13 hours ago

> > I agree that we could create our own binary format, but that would mean reinventing the wheel
>
> Is the Anvil ".mca" format compressed at all?
>
> If not, then our wheel could be significantly better.

It can be, but it can also be stored uncompressed.
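
For reference, the region file format page linked above describes each chunk payload as starting with a 4-byte big-endian length followed by a one-byte compression scheme. A minimal sketch of reading that header follows. The `Compression` enum and `parse_chunk_header` are hypothetical names, and the scheme IDs are taken from that wiki page.

```rust
/// Compression scheme byte stored in front of each chunk payload.
#[derive(Debug, PartialEq)]
enum Compression {
    Gzip,         // 1
    Zlib,         // 2
    Uncompressed, // 3
    Lz4,          // 4
    Other(u8),    // anything else, e.g. a custom scheme
}

/// Parse the 5-byte chunk header: a big-endian u32 length, then the scheme byte.
/// Returns the scheme and the number of data bytes that follow the scheme byte.
fn parse_chunk_header(bytes: &[u8]) -> Option<(Compression, usize)> {
    let header = bytes.get(..5)?;
    let length = u32::from_be_bytes([header[0], header[1], header[2], header[3]]) as usize;
    let scheme = match header[4] {
        1 => Compression::Gzip,
        2 => Compression::Zlib,
        3 => Compression::Uncompressed,
        4 => Compression::Lz4,
        other => Compression::Other(other),
    };
    // The length field counts the scheme byte as well, so subtract it.
    Some((scheme, length.checked_sub(1)?))
}
```

A reader would then decompress (or copy, for scheme 3) the returned number of bytes according to the scheme.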