Amulet world format and cubic chunks

Barteks2x commented 4 years ago

Hello, I'm the developer of a cubic chunks mod.

I've recently been made aware of this tool and I would like to add support for my cubic chunks world format if it's possible, as so far there is no such tool that would support it in any way.

As far as I can tell, the current interface assumes that chunks only have x and z coordinate and are exactly 16x256x16. This makes it impossible to implement support for cubic chunks world.

Would it be possible to add support for cubic chunks worlds in the future? I can implement support for my specific format, and possibly do some other work if needed.

gentlegiantJGC commented 4 years ago

I will write a better response tomorrow and we can have a chat. I have been trying to make the chunk size not hard coded

gentlegiantJGC commented 4 years ago

Like I said yesterday, chunk size to begin with was assumed fixed but I have been trying to change that. I think there are still some things that are required before it is 100% possible. A few questions: What exactly does your mod change? Is the overall chunk format the same as forge and you just add sub-chuks above and below the world? I would like to implement support for forge first and then your mod can build upon that. If you would like to work on that you are welcome to try. (It isn't a major priority for us at the moment but it is something we would like to support) Is the size of the chunk your mod implements static across the world and dimensions? Is that size hard coded or dynamic in the level.dat or something? Is there a way for us to tell that your mod is being used based just on the world? I is the mca file format left unchanged or have you changed that as well? (I assume it is unchanged) We should also discuss how this would work with another mod that changes the chunk format in some way

Barteks2x commented 4 years ago

Note: I realize that this may be beyond the scope of this tool given the information below, and if that's the case - I will accept that. Given the answer above, the initial answer was based on underestimating what would be involved in supporting this.

There is one thing I want to state first - support for Forge worlds is NOT needed for cubic chunks support to be implementable as cubic chunks only really uses forge to facilitate loading itself and a way to load other mods (and some extra nice stuff like config API, but it's not essential to it's functionality). And as I said, I can be the one implementing support for this world format, as long as the amulet world format and API supports the necessary parts (more on that in "What is needed in Amulet to make this possible" section)

I also want to say ahead of time that the world format is a generalization of Anvil, but is NOT in any way compatible with anvil world format. But it should be fairly easy to read from python

So here is a more detailed description of what the mod actually does:

What the mod actually does

The idea of the mod, is to make chunk size be 16x16x16 instead of 16x256x16, and generate them dynamically based on player movement on Y axis. In practice, the way it's implemented, is by "virtualizing" a vanilla Chunk, and making it's 16x16x16 sections load and unload as needed. This also provides an obvious way to store per-column data like heightmap and column biomes.

The total world height limit is currently 2^30 blocks up and down, but it's theoretically changable in world NBT (WorldSavedData).

Also to avoid any further confusion, from now on I'm going to use the following terms:

Chunk - a 16x256x16 section of a standard Minecraft world. A chunk is by definition a column.
Cube - a 16x16x16 section
Column - 16x16 column of the world, potentially containing loadable/unloadable cubes

The world format

What determines whether a world is cubic chunks world

There are 2 places where "is this a cubic chunks world" is decided in NBT: one is "global", stored in level.dat and determines "does this world save need cubic chunks at all" (isCubicWorld in level.dat NBT in Data), and the other is per-dimension stored as WorldSavedData. Current 1.12.2 implementation of Cubic Chunks supports different dimensions with different heights, and some of them being vanilla dimensions, with normal world height and chunk loading (this is configurable in dimension exclusion list). Dimension data about it is stored in data/cubicChunksData.dat and it's json-like serialization (from MinecraftDev intellij plugin) looks like this:

"": {
    data: {
        minHeight: -1073741824
        isCubicChunks: 1B
        maxHeight: 1073741823
        compatibilityGeneratorType: "cubicchunks:default"
    }
}

There are also other heuristics that can be used which will follow from further description of the world format, but their result isn't guaranteed to match what the mod itself will do when loading the world in MC.

The world format

Note: I don't think this description is strictly necessary here as my question is more about amulet format supporting things necessary to make such idea possible. I can implement support for all described below myself once it becomes possible at all.

While normally, chunks are stored in region files inside world/region directory, cubic chunks worlds currently have 2 region directories:

region2d: This directory stores 2d region files (with per column data - biomes and surface tracking data). In general I would recommend to avoid touching surface tracking data from any external tool if at all possible. with the following name pattern: RX.RZ.2dr. The internals of those files are very similar to vanilla regions, with 32x32 columns per region. The only differences are:
- No timestamps in region headers (in retrospect, this was a mistake)
- Sector size is decreased to 512 bytes
- Unfortunately when writing my code it didn't occur to me to check vanilla does there and I'm stuck with it now, but I ended up with the indexing being flipped. Instead of index being calculated as x + z*32 it's calculated as x*32 + z
- "ext regions" - this is a variation on fixing the "overflowing chunk data over max supported threshold" bug, but I did it before forge fixed it in a different way. The idea being, that if a single chunk is too big to store it's size in region header, the chunk is moved to a separate file in RX.RZ.2dr.ext directory, with file name being the header index. In that case the main region file doesn't contain the column at all and instead ext region is used as a fallback.
The column NBT is actually very similar to vanilla chunk NBT, but with unnecessary data removed. This is done that way in order to allow vanilla data fixers to work on columns:
```
DataVersion: dataVersion
Level:
\- x: columnX (int)
\- z: columnZ (int)
\- v: 1 (byte)
\- InhabitedTime: time (long)
\- ForgeCaps: forge capabilities (optional, compound)
\- Biomes: biome data (byte array)
\- OpacityIndex: surface tracking data (byte array)
```

region3d: Contains the actual block data in 16x16x16 chunks. The region data works in similar way to region2d here, with the same sector size, but with the following differences:

a region file contains exactly 16x16x16 cubes
index is calculated as x*256 + y*16 + z
ext regions work in way analogous to that with 2d regions (again, I should have done it in the way forge does it, but I created that long before forge fixed it in it's own way)

Cube NBT data is intentionally kept very similar to vanilla Chunk NBT in order to keep data fixers working, with minor exceptions. The data structure is as follows:

DataVersion: dataVersion
Level:
\- v: 1 (byte)
\- x: cubeX (int)
\- y: cubeY (int)
\- z: cubeZ (int)
\- populated: true/false (boolean/byte)
\- isSurfaceTracked: true/false (boolean/byte, indicates whether this cube has been added to surface tracking data. This is always set to true by the mod and **exists purely for external tools to set it to false in order to make it unnecessary to deal with building valid surface tracking data**)
\- fullyPopulated: true/false (boolean/byte, can be always set to true without major consequences, used to determine if the cube and it's neighbors is far enough into population to be sent to clients)
\- initLightDone: true/false (boolean/byte)
\- ForgeCaps: forge capabilities, (compound, optional)
\- Sections: (array)
  \- 0:
    \- standard vanilla section compound, with NotEnoughIDs mod changes if the mod is installed
\- Entities: (array, standard MC NBT entity list)
\- TileEntities: (array, standard MC NBT tile entity list)
\- TileTicks: (array, standard MC NBT scheduled tick list)
\- LightingInfo: cubic chunks lighting info (compound)
  \- LastHeightMap: height map when the cube was last saved (int array)
  \- EdgeNeedSkyLightUpdate: a byte with flags about cube edges that need light updates (byte)
\- Biomes: optional byte array with cube specific biomes. Currently one biome is 2x16x2, but will be changed to 4x4x4 blocks

What is needed in Amulet to make this possible:

While it would be theoretically possible to convert a selected height range of a cubic chunks world into a tall version of vanilla-like world without cubic chunks, this would irreversibly lose an important piece of information: which exact cubes actually exist. Without this information, it's impossible to convert such world back into a cubic chunks format.

So the main thing that's needed, is being able to specify "does this 16x16x16 section actually exist". This can be accomplished in 2 ways:

making the chunk itself be 16x16x16 blocks in size, which while being an obvious way, also has many disadvantages
turning a chunk into a column in the sense I described at the beginning. This makes what was a chunk merely a container for cubes/sections. Despite being less obvious, it has many advantages: it's trivial to work with non-cubicchunks worlds this way, there is an obvious place to keep per column data, it requires significantly less changes

Since amulet has it's own one universal world format, the second approach seems like the best one to me if this is to be supported at all.

gentlegiantJGC commented 4 years ago

That is a lot of great information. I assume the numerical block IDs are not hard coded but use Forge's block map in the level.dat I would like to implement support for that first which wouldn't take to long and you would have to do anyway. Your logic can then build from that. The only major limitation that jumps out at me is that we are storing blocks as a 16x256x16 array. Extending that array to your max chunk size is just impossible. We either extend it dynamically to fit the existing data in your chunk and have it optionally extendable or change blocks to sparsely store chunk data which would be the most optimal way. We would also have to discuss how blocks in the negative direction would work because -1 is the last element in the array

Barteks2x commented 4 years ago

I assume the numerical block IDs are not hard coded but use Forge's block map in the level.dat

This is up to forge to handle but yes, this is the case simply by the fact that it's using forge. Without other mods installed, however, the IDs perfectly match vanilla.

The only major limitation that jumps out at me is that we are storing blocks as a 16x256x16 array. Extending that array to your max chunk size is just impossible.

I am aware of that, and this is the main part of my question as explained in the last section. The only way that would make any sort of sense, IMO, and that actually works, is to have an int->array map inside columns where each entry would represent a 16x16x16 section. This has several advantages even for vanilla worlds, a major one being lower space usage.

I am not very experienced in python, but under some assumptions it may be possible to keep existing API working while still introducing that change.

As explained before, the main problem with just extending world height in vanilla-like world format, is that information on which 16x16x16 sections actually exist is irreversibly lost and it's required in order to reconstruct a valid world again.

gentlegiantJGC commented 4 years ago

Yeh that is the solution I was thinking of as well. I don't like that we have to store the whole chunk when some of the sub-chunks don't actually exist. The issue is finding a way that is as compatible as possible with what we have

Barteks2x commented 4 years ago

At this stage in development, is keeping API backwards compatibility a priority? This project so far seems to be fairly early in development. Depending on how python works and the API, it may or may not be possible to keep existing API as is (as I said, I have very little experience in python). Is it intended to keep the amulet world format compatible in some way, or are changes to the format allowed? I assume it has to be allowed in some way.

gentlegiantJGC commented 4 years ago

At this point it is basically just internal code using it but I would like it to keep the API as close to a numpy array as possible. The internal representation is only internal so compatibility isn't a major concern. If there were lots of user plugins using it then it might be. That is why I am happy to try and fix it now.

Barteks2x commented 4 years ago

As far as I can tell, the actual block array for a chunk is implemented in https://github.com/Amulet-Team/Amulet-Core/blob/5fadf0da825c0009d9f1925a063fd49856c1ec3d/amulet/api/chunk/chunk_array.py.

Looking at it, I don't see some operations being implementable in a way that actually makes complete sense. For example fill can't possibly fill the entire 16x16x[huge number] volume. A way that makes more sense is that it fills only the 16x16x16 sections that actually currently exist, but that could result in some counterintuitive behavior. resize doesn't seem like something that could be practical to implement with such internals either.

But aside of that, it seems like it should be possible to make it work. Is this something I would be able to help with or is it better for me to wait?

gentlegiantJGC commented 4 years ago

That is where the block array is defined but that is just the generic API of a numpy array. We did that so that the blocks changing can notify the chunk that it has changed. Beyond that it is just a numpy array

gentlegiantJGC commented 4 years ago

My proposed implementation is that blocks be a class containing a dictionary like you said that maps integer subchunk id to the numpy array. We then need to work out how to get and set data to the different arrays. Ben says he will have a little play around

gentlegiantJGC commented 4 years ago

Do we feel that it is a reasonable assumption that sub-chunks will always be 16x16x16? I am no too much into the modding scene and don't know if there are any mods out there that mess with that.

Barteks2x commented 4 years ago

I don't think any mod would possibly even try. I don't think there are even any minecraft-like games that use sizes that can't be split into 16x16x16 sections evenly (terasolgy is a bit weird there, in that it has 32x64x32 chunks, but it can be split into 16x16x16). Powers of 2 are just too nice to not use, and 8x8x8 is too small.

As it is, I don't think there are any other mods that do such a significant change. The only other thing I'm aware of that comes close to changing world format is JEID for 1.12.2, which makes chunk NBT format closer to that of 1.13.x+ (not really compatible with cubic chunks).

gentlegiantJGC commented 4 years ago

The proposed solution is in this pull request. https://github.com/Amulet-Team/Amulet-Core/pull/71

We are storing 16x16x16 numpy arrays in a dictionary under their sub-chunk id.
This allows only the existing sub-chunks to be defined (we currently strip out full air chunks when saving. Is this going to be a problem? The behaviour can be changed in your own chunk interface)
getattr and setattr have been implemented to interact with the data in a similar way to numpy.
sub_chunks can be used to see which sub-chunks are defined
contains has been implemented to quickly check if a sub-chunk is defined
get_sub_chunk gets the sub-chunk or errors if it does not exist
get_create_sub_chunk gets the sub-chunk and creates it as air if it does not exist
There is a speed impact due to this making it quite a bit slower than a single numpy array but it is still fast enough.

In relation to getattr and setattr the normal behaviour in python is that negative indices work from the top end of the list/array but this behaviour has been changed to support negative block locations. blocks[:, :, :] if no start and stop values are defined in the y slice it caps to the size of the normal vanilla chunk so that is equal to blocks[:, 0:256, :] blocks[:, -10:300, :] will return an array containing blocks from negative 10 (below the world) to 300 (above the world). This is returned as a normal numpy array with the whole area defined so it is advised not to access large slices at once due to memory concerns.

The rest of amulet needs to be changed to support this but I think this is a change for the better.

Barteks2x commented 4 years ago

This looks like it would work. Removing armory sub chunks when saving is going to be something that will need to be changed in my format (as otherwise those empty sections would be regenerated, going back to losing information on which sections exist).

One more thing I can't quite figure out from looking at the code (because I don't know python that well yet) is will it throw an error of you try to so something like blocks[:, -10:300, :] for a chunk where only vanilla height range is defined (like get_sub_chunk)? Or does it assume its empty? Or does it create the missing parts?

While defaulting to 16x256x16 makes some sense it does bring up a possibility of issues in code using it that only occur for non standard world heights. Aside of it being an error I don't see a better alternative here so maybe it would just have to stay that way.

And again sorry for my very limited understanding of python code for now, I mostly write Java and python is still fairly new to me.

gentlegiantJGC commented 4 years ago

The behaviour of getattr is that if the sub-chunk doesn't exist it will just assume there is air there. It shouldn't throw and error. Previously it was generating air filled sub-chunks in getattr but that was slowing things down and taking up memory when not required. That behaviour still remains in setattr though. If you call get_sub_chunk and the chunk does not exist it will throw an error but all calls to that check it exists first.

The vanilla behaviour of sub-chunks not existing is that they are left as they are until a block is placed in them. If the whole chunk does not exist then it is all regenerated. This saves storing a sub-chunk filled with just air. If that behaviour needs to change you can just modify it in your save logic.

Barteks2x commented 4 years ago

What I need is some way to distinguish "all air" and "doesn't exist" in memory. In vanilla it's not an important distinction because sections aren't generated individually. The existence of a chunk marks the existence of all sections in it.

Without a dedicated way to distinguish those states, I can only have either memory efficiency (not keeping air in memory) or correctness (distinguishing air vs doesn't exist). While a separate way to distinguish it isn't strictly required for it to work at all. It would mean that optimizations like "removing all air sections in the fly" would become a breaking change for this use case.

gentlegiantJGC commented 4 years ago

For the actual chunk if you do world.get_chunk it will throw ChunkDoesNotExist if the chunk doesn't exist. If the chunk exists but does not have any blocks in it that will return a Chunk class and when you inspect chunk.blocks.sub_chunks it will be empty. That method returns the keys in the dictionary under which we store the actual sub-chunks. (a sub-chunk being a 16x16x16 region). If a sub-chunk does not exist it will not be in that dictionary and thus will not be in chunk.blocks.sub_chunks. If a chunk exists and is filled with air it will be stored in the dictionary. For vanilla we only throw away the air filled sub-chunks when serialising to disk and they will remain in the Chunk class. You can change the serialising functionality however you want.

The existence of a chunk marks the existence of all sections in it. I don't know too much about Java but I know in Bedrock only the filled sub-chunks are saved to disk. I thought this behaviour is the same in Java

Barteks2x commented 4 years ago

Yes this is the same in vanilla. What I had in mind was that "chunk exists" automatically means that all the sections in it ezist, regardless of whether their data exists.

In cubic chunks however, conceptually, a chunk is 16x16x16 and contains only one section. For empty sections, the section data isn't written to NBT but the chunk still has to exist for several reasons (lighting data, entities, mod data, cube capabilities) and current implementation always saves them.

While empty sections being stored as all air in memory makes sense it negates the intended memory savings in the first place. And not being able to distinguish all air from not generated makes this optimization probably impossible (unless the 16x16x16 block array is implemented as dynamically switching between normal and "sparse" array, which could be done later and as far as I can tell it could be done without breaking existing api).

So my question is, if air sections are still in memory, are there any plans for some way to not actually store all of that data in memory?

gentlegiantJGC commented 4 years ago

Oh okay. The way we have implemented it there is only one chunk at each cx, cz combination with multiple sub-chunks for blocks inside. From what you are saying you are storing one chunk per sub-chunk each of which only stores the blocks of that one sub-chunk. Are you storing entities and block entities per 16x16x16 region rather than storing one array that covers the whole vertical chunk.

From what I understand in the de-serialising and serialising code you should be able to switch from your format of multiple "chunks" per chunk to our single chunk format. There will be some overhead there but I think it should be possible. The entities and block entities would need merging into one list and then splitting again.

In your format is it possible for the chunk to exist but the blocks inside to not exist?

Barteks2x commented 4 years ago

From what I understand in the de-serialising and serialising code you should be able to switch from your format of multiple "chunks" per chunk to our single chunk format. There will be some overhead there but I think it should be possible. The entities and block entities would need merging into one list and then splitting again.

Yes that is correct.

In your format is it possible for the chunk to exist but the blocks inside to not exist?

Yes. It's entirely possible for me to expand it into an air section anyway.

My point is that this design doesn't seem to allow the possibility of "don't store the blocks in memory, but keep the information that this section exists". Unless this would be done as another"wrapper" for numpy array similar to previous chunk array

gentlegiantJGC commented 4 years ago

So my question is, if air sections are still in memory, are there any plans for some way to not actually store all of that data in memory? Do you mean RAM or disk when you say memory. I am reading it as you meaning two different things.

In your disk format if a chunk exists but the section data does not you would take the data that is there and put that in a nice location in the Chunk class and leave the key in blocks undefined. If the section data does exist you would just put it in the Blocks class. If you want you could check if it is air and not put it in the Blocks class if that is the functionality you want

When saving back you have full control on where the data goes. We do some translation stuff to turn the blocks from our universal format into the version format and give you a Chunk class with the blocks in the version format. Based on the data in the Chunk class you can save your chunks with section data inside or not upon whatever criteria you like.

What I was saying about dropping air sub-chunks earlier is that last step. In vanilla if a sub-chunk is all air it doesn't need to exist on disk. When saving we check if the sub-chunk is all air and don't save the sub-chunk but still save the chunk itself even if there are no sub-chunks defined in it.

We can do that with vanilla because the sub-chunk won't be recreated if the chunk itself exists. I think you said that if one of your chunks doesn't exist the terrain would be recreated so you would need to save the shell of the chunk with no blocks data so that that doesn't happen.

I would probably suggest storing a set in chunk.misc of your chunks that exist so that if a user deletes block data in Amulet you know that that chunk should be created even if there is no data to put into it.

Barteks2x commented 4 years ago

My on disk format doesn't really matter at this moment in time. When I say "in memory" I do mean ram.

Me storing a separate set somewhere would work only if amulet doesn't in any way allow users to delete chunks or add new sections (not being able to add new sections could be a bit limiting in my case).

If deleting chunks is something that is intended to be allowed, that would be a bit of a challenge to implement in case of cubic chunks.

Also I completely forgot that this is python and not Java and I actually can put extra data in existing objects.

Barteks2x commented 4 years ago

There is another question I wanted to ask: is support for my world format something that I would have to add in a PR or would it be some sort of addon/plugin?

gentlegiantJGC commented 4 years ago

I do mean ram If the blocks are not defined in your chunk you can just leave them undefined in blocks. If a user sets an area as air the sub-chunks will be created and stored. There isn't much we can do there.

My thoughts on the set were that you would populate it when loading the column. That would define which chunks are defined in that column (all chunks in the column would be loaded at once)

When saving you would use that set as a guideline of which chunks NEED saving but there may be more and you will have to look in the Chunk class for data that needs saving and work out the real list of chunks that need creating. That way if data in a chunk gets deleted you know to still create it and if a user places blocks outside chunks already defined you will have to create a shell chunk with the required data.

That set will be public and there isn't much we can do about it. If a user wants to experiment adding values to it or taking values out they can. You should try and account for it when saving.

I think we wouldn't support deleting individual chunks in your format but we could support deleting the whole column if that works with your format.

There is another question I wanted to ask: is support for my world format something that I would have to add in a PR or would it be some sort of addon/plugin?

It would be nice to have official support for it if you are willing to contribute the code.

Barteks2x commented 4 years ago

That looks like it would work.

I can PR it but that's probably going to need more thorough review. Writing initial code for it doesn't strictly need forge support so I can try start with the current PR and submit my changes after its merged. This would also uncover any hidden issues in the PR if there are any.

gentlegiantJGC commented 4 years ago

That would be great. I have mostly gotten it integrated but there are still some places I have missed. I managed to load a world and display it in the renderer

Barteks2x commented 4 years ago

Right I am having some setup issues (probably due to the unusual linux distribution I'm using) so I'm not sure when I will be able to get everything set up.

Update: looks like I have system numpy compiled without lapack, but the venv setup wants to have one with lapack, and it fails because I dont have that system package.

Barteks2x commented 4 years ago

I think I got it mostly set up but I still have some issues and confusion (the biggest one being "I can't figure out hot to run it"). Is there some place where I can ask more specific questions regarding setup instead of spamming here?

gentlegiantJGC commented 4 years ago

We use the MCEdit discord server if you want to use that and discuss there. https://discord.gg/BTm6jnf You can also follow our development discussion there as well.

Amulet-Team / Amulet-Map-Editor