boskee / Minecraft

Simple Minecraft-inspired program using Python and Pyglet
MIT License

Saving the World #42

Closed. Nebual closed this issue 11 years ago.

Nebual commented 11 years ago

Forking this off from the points raised in #40: I want to write a better format for saving blocks.

Currently, because no block yet needs any form of memory (as doors, chests or signs would), we could get away with storing three int(16) values for position, then a uint(8) for the id, plus another uint(8) for the part after the decimal (used for wood subtypes). But how long will that remain all the information that needs to be saved? Mojang uses a uint(4) called "damage" across a great number of blocks, which is a pretty efficient (if somewhat confusing) approach. Using the part after the decimal in .id makes sense in Python for subtypes (like wood type or sign rotation), but it makes saving a hassle, because we would have to make sure the decimal part stays exact: floats can lose precision, whereas a separate int doesn't.
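A minimal sketch of the fixed-width record described above, keeping the subtype as its own integer instead of the fractional part of a float id. The field order, the little-endian byte order and the `pack_block`/`unpack_block` names are assumptions for illustration, not the project's actual save format.

```python
import struct

# Assumed layout (hypothetical): x, y, z as signed 16-bit ints, then the
# block id and its subtype as unsigned 8-bit ints, little-endian, no padding.
BLOCK_RECORD = struct.Struct("<3hBB")

def pack_block(x: int, y: int, z: int, block_id: int, subtype: int) -> bytes:
    """Pack one block into an 8-byte record; the subtype is a separate int."""
    return BLOCK_RECORD.pack(x, y, z, block_id, subtype)

def unpack_block(data: bytes) -> tuple:
    """Return (x, y, z, block_id, subtype) from an 8-byte record."""
    return BLOCK_RECORD.unpack(data)

record = pack_block(-12, 64, 300, 17, 2)  # e.g. a wood block with subtype 2
print(len(record))           # 8
print(unpack_block(record))  # (-12, 64, 300, 17, 2)
```

Because the subtype is stored as a whole number, it round-trips exactly; there is no decimal part whose precision has to be preserved.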

Design decisions:

Question 1: Do we store a position for every block, or only for each sector? Assume a sector size of 8x8x8 (512 blocks).

Option 1) Every block has a position.
a. Sector full of dirt: (16+16+16 position) + (8 id + 4 damage) = 60 bits * 512 = 30720 bits = 3840 bytes.
b. Only half the sector has blocks (say, on the surface, with air in the top half): 60 * 256 = 15360 bits = 1920 bytes.
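To make the 60-bit figure concrete, here is a hypothetical packing of one Option 1 record into a single integer: 16 bits each for x, y and z (two's complement, so coordinates are assumed to fit in a signed 16-bit range), 8 bits for id and 4 for damage. The helper names are made up for this sketch. The byte totals above assume records are written back to back as a bitstream; padding each record to a whole 8 bytes would cost 64 bits per block instead.

```python
def pack60(x, y, z, block_id, damage):
    """Pack one block (hypothetical Option 1 layout) into a 60-bit integer."""
    if not (0 <= block_id < 256 and 0 <= damage < 16):
        raise ValueError("block_id must fit in 8 bits and damage in 4 bits")
    value = 0
    for coord in (x, y, z):
        # Keep the low 16 bits, i.e. the two's-complement form of the coordinate.
        value = (value << 16) | (coord & 0xFFFF)
    value = (value << 8) | block_id
    return (value << 4) | damage

def unpack60(value):
    """Inverse of pack60: returns (x, y, z, block_id, damage)."""
    damage = value & 0xF
    value >>= 4
    block_id = value & 0xFF
    value >>= 8
    coords = []
    for _ in range(3):  # z comes out first because it was packed last
        raw = value & 0xFFFF
        value >>= 16
        coords.append(raw - 0x10000 if raw & 0x8000 else raw)
    z, y, x = coords
    return x, y, z, block_id, damage

def option1_bytes(block_count, bits_per_block=60):
    """Size of Option 1 if records are written back to back as a bitstream."""
    return block_count * bits_per_block // 8

print(option1_bytes(512), option1_bytes(256))  # 3840 1920
print(unpack60(pack60(-12, 64, 300, 17, 2)))   # (-12, 64, 300, 17, 2)
```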

Option 2) Every sector has a position, and then all 512 blocks in the sector are stored.
a. Full of dirt: (0 position) + (8 id + 4 damage) = 12 bits * 512 = 6144 bits = 768 bytes.
b. Half full of dirt: 12 * 512 = 6144 bits = 768 bytes. There's no space advantage to emptiness here.
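Option 2 can be sketched by packing two 12-bit (id, damage) pairs into every 3 bytes, so a 512-block sector always takes 768 bytes whether it holds dirt or air. The block ordering, the use of id 0 for air, the example dirt id and the function names are all assumptions for this sketch.

```python
SECTOR_BLOCKS = 8 * 8 * 8  # 512 blocks per 8x8x8 sector

def pack_sector(blocks):
    """blocks: 512 (block_id, damage) pairs in a fixed order; returns 768 bytes."""
    if len(blocks) != SECTOR_BLOCKS:
        raise ValueError("expected one entry per block in the sector")
    out = bytearray()
    for i in range(0, SECTOR_BLOCKS, 2):
        (id_a, dmg_a), (id_b, dmg_b) = blocks[i], blocks[i + 1]
        a = (id_a << 4) | dmg_a  # 8-bit id + 4-bit damage = 12 bits
        b = (id_b << 4) | dmg_b
        out += ((a << 12) | b).to_bytes(3, "big")  # two blocks in 24 bits
    return bytes(out)

def unpack_sector(data):
    """Inverse of pack_sector."""
    blocks = []
    for i in range(0, len(data), 3):
        combined = int.from_bytes(data[i:i + 3], "big")
        for value in (combined >> 12, combined & 0xFFF):
            blocks.append((value >> 4, value & 0xF))
    return blocks

AIR, DIRT = (0, 0), (3, 0)  # hypothetical ids, for illustration only
full = pack_sector([DIRT] * SECTOR_BLOCKS)
half = pack_sector([DIRT] * 256 + [AIR] * 256)
print(len(full), len(half))  # 768 768
```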

Assuming there are 12 bits of unique data for each block, Option 2 is the more efficient approach as long as at least 20% of a sector isn't air (because 768/3840 = 0.2). If there's more info for each block (say, also a dataid attribute that points to a second table of varchars, so blocks such as books can hold data of any length), the break-even ratio gets worse, but Option 2 still sounds like the only option.
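The break-even point generalizes: with e extra bits per block, Option 1 costs (60 + e) bits per non-air block while Option 2 always costs 512 * (12 + e) bits, so Option 2 wins once the non-air fraction exceeds (12 + e) / (60 + e). A small sketch of that arithmetic; `extra_bits` stands in for something like the dataid mentioned above, and its width is an assumption.

```python
POSITION_BITS = 3 * 16  # x, y, z as int(16)
BLOCK_BITS = 8 + 4      # uint(8) id + uint(4) damage
SECTOR_BLOCKS = 512     # 8x8x8 sector

def option1_bits(non_air_blocks, extra_bits=0):
    """Option 1: only non-air blocks are written, each with its own position."""
    return non_air_blocks * (POSITION_BITS + BLOCK_BITS + extra_bits)

def option2_bits(extra_bits=0):
    """Option 2: every block in the sector is written, with no per-block position."""
    return SECTOR_BLOCKS * (BLOCK_BITS + extra_bits)

def break_even_fraction(extra_bits=0):
    """Non-air fraction above which Option 2 takes less space than Option 1."""
    return option2_bits(extra_bits) / option1_bits(SECTOR_BLOCKS, extra_bits)

print(break_even_fraction())               # 0.2
print(round(break_even_fraction(16), 3))   # 0.368 (a hypothetical 16-bit dataid)
```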

Question 2: Do we store the world in separate files, like Mojang's regions?
Pros: less chance of the whole world being corrupted if a save is interrupted; lower memory use.
Cons: performance? I'm not sure whether various OSes prefer accessing hundreds of small files, dozens of larger ones, or a single file. Either way, we'll only load nearby sectors. With a single file that we seek through, we'll need a table of contents near the beginning giving the offset of each sector, whereas with multiple files we can just look at the filenames.
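For the single-file variant, the table of contents could look roughly like this: a header with the sector count, then one entry per sector holding its coordinates, byte offset and length, then the sector payloads. A reader loads the table once and seeks straight to the sectors it needs. The layout and names are hypothetical, and `io.BytesIO` stands in for a real file.

```python
import io
import struct

# Hypothetical layout:
#   header: uint32 sector count
#   TOC:    per sector: int32 sx, sy, sz, uint32 offset, uint32 length
#   data:   sector payloads back to back
HEADER = struct.Struct("<I")
TOC_ENTRY = struct.Struct("<3iII")

def write_world(stream, sectors):
    """sectors: dict mapping (sx, sy, sz) -> bytes payload."""
    keys = sorted(sectors)
    offset = HEADER.size + TOC_ENTRY.size * len(keys)  # first payload byte
    stream.write(HEADER.pack(len(keys)))
    for key in keys:
        stream.write(TOC_ENTRY.pack(*key, offset, len(sectors[key])))
        offset += len(sectors[key])
    for key in keys:
        stream.write(sectors[key])

def read_toc(stream):
    """Read only the header and table of contents."""
    stream.seek(0)
    (count,) = HEADER.unpack(stream.read(HEADER.size))
    toc = {}
    for _ in range(count):
        sx, sy, sz, offset, length = TOC_ENTRY.unpack(stream.read(TOC_ENTRY.size))
        toc[(sx, sy, sz)] = (offset, length)
    return toc

def read_sector(stream, toc, key):
    """Seek directly to one sector's payload."""
    offset, length = toc[key]
    stream.seek(offset)
    return stream.read(length)

buf = io.BytesIO()
write_world(buf, {(0, 0, 0): b"a" * 768, (1, 0, -1): b"b" * 768})
toc = read_toc(buf)
print(read_sector(buf, toc, (1, 0, -1))[:3])  # b'bbb'
```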

Point 3: We need to store as little information as possible about every block, and have a way to store additional data about specific blocks where needed. Perhaps we could keep the main sector storage to just the 12-bit approach (uint(8) for id, uint(4) for damage/subid), and have an additional storage file for overflow data, organized by position as in Question 1, Option 1.
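The overflow file from Point 3 might be as simple as a list of (position, length, bytes) records, written only for blocks that actually carry extra data such as sign text. This is a sketch under assumed field widths (int(16) positions as in Question 1, Option 1, plus a uint32 length); the names are illustrative.

```python
import struct

# Hypothetical record header: x, y, z as int(16), then a uint32 payload length.
OVERFLOW_HEADER = struct.Struct("<3hI")

def pack_overflow(extra):
    """extra: dict mapping (x, y, z) -> bytes, only for blocks that need it."""
    out = bytearray()
    for (x, y, z), payload in sorted(extra.items()):
        out += OVERFLOW_HEADER.pack(x, y, z, len(payload))
        out += payload
    return bytes(out)

def unpack_overflow(data):
    """Inverse of pack_overflow: rebuild the position -> bytes mapping."""
    extra, pos = {}, 0
    while pos < len(data):
        x, y, z, length = OVERFLOW_HEADER.unpack_from(data, pos)
        pos += OVERFLOW_HEADER.size
        extra[(x, y, z)] = data[pos:pos + length]
        pos += length
    return extra

signs = {(10, 64, -3): "Welcome!".encode("utf-8"), (11, 64, -3): b""}
print(unpack_overflow(pack_overflow(signs)) == signs)  # True
```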

Could discuss this in irc://irc.gamesurge.net/Pycrafter or whatever we're calling ourselves now.

BertrandBordage commented 11 years ago

What a presumptuous issue name ^^

BertrandBordage commented 11 years ago

Has 364d2a7467751b0dfad7d598ad0a9b0bbeb6ce2b solved this issue?

Nebual commented 11 years ago

364d2a7467751b0dfad7d598ad0a9b0bbeb6ce2b still saves a position for every block rather than per sector, in just two files, with no extended memory, though it's closer to ideal than pickling everything.

It wouldn't be particularly difficult now to save to files named after region coordinates instead, and then skip writing individual block coordinates, which would make it easy to read just a few sectors at a time (for an infinite world).
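A rough sketch of that idea: the region, and so the filename, is derived from sector coordinates, and a block's position inside its sector is implied by its index in a fixed 8x8x8 order, so no per-block coordinates need to be written. The region size, the filename pattern and the helper names are assumptions, not what the commit does.

```python
SECTOR_SIZE = 8     # blocks per sector edge, as in the issue
REGION_SECTORS = 4  # hypothetical: sectors per region edge

def sector_of(x, y, z):
    """Sector coordinates of a block (floor division handles negatives)."""
    return (x // SECTOR_SIZE, y // SECTOR_SIZE, z // SECTOR_SIZE)

def region_filename(sx, sy, sz):
    """Hypothetical file naming scheme based on region coordinates."""
    rx, ry, rz = sx // REGION_SECTORS, sy // REGION_SECTORS, sz // REGION_SECTORS
    return f"region_{rx}_{ry}_{rz}.dat"

def index_in_sector(x, y, z):
    """Implicit position: the block's slot in a fixed x/y/z order within its sector."""
    lx, ly, lz = x % SECTOR_SIZE, y % SECTOR_SIZE, z % SECTOR_SIZE
    return (lx * SECTOR_SIZE + ly) * SECTOR_SIZE + lz

def position_from_index(sector, index):
    """Recover world coordinates from a sector and a block's slot index."""
    sx, sy, sz = sector
    lx, rest = divmod(index, SECTOR_SIZE * SECTOR_SIZE)
    ly, lz = divmod(rest, SECTOR_SIZE)
    return (sx * SECTOR_SIZE + lx, sy * SECTOR_SIZE + ly, sz * SECTOR_SIZE + lz)

print(region_filename(*sector_of(-1, 0, 9)))  # region_-1_0_0.dat
print(index_in_sector(-1, 0, 9))              # 449
```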