[Suggestion] List of all generated chunks

k-a-mendoza commented 9 years ago

A plugin that I developed for bukkit requires me to create large structures across many chunks. These chunks have to have already been generated (whether they are loaded or not) and devoid of vanilla ores. I've also gotten requests that I make this plugin compatible with already generated worlds. To make doing this easier I request that the following methods be added to the world class:

/*
* Returns whether a particular chunk has already generated structures, 'the layer', and 'decorations' 
* regardless of its load status.
*/
public boolean isGenerated(int x, int z)

/*
* Returns a List of all the chunk coordinates that have already been 'decorated' regardless of load 
* status. The list is a snapshot of the world as it was when the method was called. String can be 
* replaced by other objects more suited to coordinate retrieval, but otherwise only contain the x & z 
* coordinate data of the chunks within the world. 
*/
public List<String> getWorldChunkList()

Edit: clarified what I meant by 'generated' .@gratimax pointed out that during chunk generation structures are generated first, then the 'layer' (terrain landforms), and then 'decorators'.

Lunaphied commented 9 years ago

Reformatted :smiley: This seems like a relatively reasonable feature, however getting a list of all generated chunks might be, expensive.

public List<ChunkPos> getPopulatedChunks(World world);

Might make more sense too.

k-a-mendoza commented 9 years ago

Expensive as in the overhead required to access that info or the amnt it would slow things down by invoking this method? The way I envision getPopulatedChunks()/getWorldChunkList() being called is at most once during the server's lifetime.

ryantheleach commented 9 years ago

Isn't there a difference between generated and populated? (at least there was in bukkit and minecraft)

k-a-mendoza commented 9 years ago

Good point. I think I recall generated meaning that the chunk had its terrain 'carved' out, and populated being that the chunk has been populated already by a set of block populators. In any case I'd like to be able to identify chunks that are done being messed with, and get a list of chunks that part of my plugin has to go through and edit.

ryantheleach commented 9 years ago

Also new in 1.8 is the code that retrogen's the new stone types, as well as the underwater structures.

maxov commented 9 years ago

@ryantheleach Good point. Last time I looked in the Minecraft source, there were two steps to generating terrain:

Generate structures that are populated randomly. (called population)
Generate the actual terrain with some noise function manipulation. (called generation)

Minecraft actually does step number 1 first because big structures like underwater temples, underground mineshafts, and strongholds take up more than one chunk and their size cannot be determined without generating them(they're essentially simple cellular automata). So Minecraft has an upper bound on the size of these structures and when generating a chunk it checks if any big structures are within that radius. Those "nearby" structures are generated first and step 2 comes per-chunk.

As such the final step of generating terrain is generation, not population. Chunks stop being messed with after that.

nightpool commented 9 years ago

@gratimax Most populators absolutely aren't before terrain generation--Populators are mostly things like trees, caves, grass/flowers, ores and etc. I don't know about huge structures but most populators come after terrain generation.

On Mon Dec 01 2014 at 7:49:18 PM gratimax notifications@github.com wrote:

@ryantheleach https://github.com/ryantheleach Good point. Last time I looked in the Minecraft source, there were two steps to generating terrain:

Generate structures that are populated randomly. (called population)

Generate the actual terrain with some noise function manipulation. (called generation)

Minecraft actually does step number 1 first because big structures like underwater temples, underground mineshafts, and strongholds take up more than one chunk and their size cannot be determined without generating them(they're essentially simple cellular automata). So Minecraft has an upper bound on the size of these structures and when generating a chunk it checks if any big structures are within that radius. Those "nearby" structures are generated first and step 2 comes per-chunk.

As such the final step of generating terrain is generation, not population. Chunks stop being messed with after that.

— Reply to this email directly or view it on GitHub https://github.com/SpongePowered/SpongeAPI/issues/256#issuecomment-65166066 .

maxov commented 9 years ago

@nightpool I guess I'm using the wrong(or old) nomenclature. I understand that smaller things, as you mentioned, trees, caves, grass and flowers are populated after the terrain, but I was talking about big structures.

nightpool commented 9 years ago

Its possible I'm off base on naming too, most of my experience with world gen is from the beta era. On Mon, Dec 1, 2014 at 8:00 PM gratimax notifications@github.com wrote:

@nightpool https://github.com/nightpool I guess I'm using the wrong(or old) nomenclature. I understand that smaller things, as you mentioned, trees, caves, grass and flowers are populated after the terrain, but I was talking about big structures.

— Reply to this email directly or view it on GitHub https://github.com/SpongePowered/SpongeAPI/issues/256#issuecomment-65167127 .

maxov commented 9 years ago

@nightpool I took a look through the source to confirm the names. The things you're talking about are called 'Decorators' and 'features', not populators. The things I'm talking about are called 'structures'. The order goes structure -> layer(actual terrain) -> feature as expected.

k-a-mendoza commented 9 years ago

That actually clears up a lot @gratimax thanks. I'll Edit my original suggestion to clarify what I meant.

mikeprimm commented 9 years ago

Speaking from my Bukkit and Dynmap experience, I don't think you want to consider getting a list of ALL chunks, generated or populated, as a single, synchronous world call - on large servers, it'll kill you, as the data is spread across potentially thousands of Anvil (.mca) files (each of which needs to be opened in order to answer the 'is it generated' question, and 'populated' requires reading data stored in the compressed, per chunk, NBT data - which doesn't support being efficiently read for just a single field, so you're gonna wind up uncompressing and unmarshalling the whole chunk data to just get the field - see http://minecraft.gamepedia.com/Chunk_format for details). isGenerated(int, int) can be done reasonably efficiently, if you have it check for the existence of the .mca file and then just load the offset map at the beginning of the file to see if the corresponding offset is nonzero. Even so, doing just that en-mass for a whole world will still be a lag-fest (the main world on the server I admin, WesterosCraft, has 21000 .mca files, each of which would need to be opened and have the first 4k of data read in order to provide the identities of the chunks - of which there are going to be about 10,000,000 returned).

Lunaphied commented 9 years ago

Numbers to back up my prediction. Needless to say I'm in favor of no methods that when called freeze the server. Registering hooks that are called at various times during chunk generation and population seems smarter. Like ChunkGeneratedEvent, ChunkPopulatedEvent and ChunkDecoratedEvent.

k-a-mendoza commented 9 years ago

I see. How bout keeping a separate HashMap of all chunk coordinates that have been Decorated from the time of server creation? I suppose I could just make one in my own plugin that keeps track of all the decorated event calls if they are implemented. Having a API based isGenerated(int, int) function that allows the querying of specific chunks for their existence plus those three events mentioned above should give me the functionality I've been looking for, although it looks like converting entire words will not be an option.

mikeprimm commented 9 years ago

If we DO want to accomplish something like this, though, it'd be possible to do it with some sort of asynchronous iterator type of response: the I/O for the MCA fils doesn't need to be done on the server thread, and the results could be delivered via a multistage callback from 'off-thread' (e.g. the caller provides an object implementing a callback interface, which is then called with partial results repeatedly as the request is processed). It could be done pretty well for the 'isGenerated' result - with results returned once for each MCA file processed (for example). Doing 'isPopulated/Decorated' that way isn't necessarily worth it, as the work is almost all per-chunk, so doing all the chunks in one region file at once isn't necessarily going to help much for efficiency, so just having a 'isPopulated(x,y)' method that could be used once folks already have the bulk isGenerated() results is probably just as good (since, ultimately, reading the Populated NBT field for every chunk is going to result in the WHOLE MCA file being read...).

The upside of the bulk isGenerated thing is that, presently (in Bukkit and elsewhere), there isn't a good way to find out all the chunks that exist in a world - polling all the potential xz combinations isn't realistic, and actually causes empty .mca files to be generated. For Dynmap, I actually check the presence of the .mca files and parse the names to know where existing chunks MIGHT be (so that I avoid creating empty ones by asking).

ryantheleach commented 9 years ago

So basically a chunk iterator, that can accept a filter, that doesn't generate nonexisting chunks?

This would seem to make it clear that getting all the generated chunks can and will iterate over all chunks, provides a means of filtering arbitrary chunks, and lets people using java8 use streaming API, while allowing the implementation make optimizations.

One thing I don't understand with filters, is it possible say to filter all chunks by their x and z co-ordinates, and detect that the chunk doesn't need to be loaded for that particular filter to resolve? Or would information that is available like that be a different object then chunk?

ST-DDT commented 9 years ago

@ryantheleach There are some ways to acomplish this kind of filter. First of all World.getGeneratedChunks(minX, minZ, maxX, maxX) using int values as parameters to allow limiting the chunk before even thinking of them (code wise).

Otherwise you need a mayAccept(x,z) : boolean with the downside, that you have to calculate that possible chunk position somehow, but you may safe IO if you don't want it loaded. After that you can use the actual filter method accept(x,z,Chunk) and check whether you want that chunk in your list.

But if you add such a method, please consider adding either a World.getRegionContainer() : File method or/and a World.getRegions() : List<RegionLoc> method. Because this in combination with World.getGeneratedChunks(RegionLoc) : Iterable<ChunkLoc> will drastically reduce the performance impact, because bad implementations are less likely. In addition to that this may help plugins like dynmap because they won't have to search for all existing region files themselves.

Sumary

World
- getFolder() : File (world root, with regions and players... folders in it)
- getRegionContainer() : File
- getExistingRegions() : List<RegionLoc> (very few IO, only read filenames)
- getRegionFile(RegionLoc) : File (optional, no IO, gets the 0_0.mca so that plugins can easyly delete them, TBD)
- hasChunks(RegionLoc) : boolean (few/some? IO)
- getExistingChunks(RegionLoc) : Iterable<ChunkLoc> (some more IO, open region file to check the existing chunks in there)
- getExistingChunks(RegionLoc, Filter<ChunkLoc>) : Iterable<ChunkLoc>
- getExistingChunks(int, int, int,int) : Iterable<ChunkLoc>
- getExistingChunks(int, int, int,int, Filter<ChunkLoc>) : Iterable<ChunkLoc>
- getChunk(ChunkLoc) : Chunk (Heavy IO, because i guess the entire chunk is loaded)
- getChunkData(ChunkLoc) : ChunkData (Potentially heavy IO, only data loaded)
- isChunk____(ChunkLoc) : boolean (Potentially heavy IO, only data loaded)
Chunk (extends ChunkData?)
- getRegion() : RegionLoc (no IO)
- getChunkData() : ChunkData (no IO, because i guess it is already in the RAM)
ChunkData
- isGenerated() : boolean (no IO)
- isPopulated() : boolean (no IO)
- getChunkState() : ChunkState (either of NOTTHERE, POPULATED, GENERATED, DECORATED, (LOADED?))
- getChunkLoc() : Vector2i (no IO)
- getMinBlockLoc() : Vector3d (no IO)
- getMaxBlockLoc() : Vector3d (no IO)
- getLoc(AxisDirection, AxisDirection) : Vector3d (no IO)
Filter
- accept(D) : boolean (executed before any IO for D is made, if possible)

Executing the filter after the IO has been made has no befenfits.

mikeprimm commented 9 years ago

@ST-DDT - I like the idea of being explicit on the support for the RegionFile : it definitely allows for a more practical take on both enumerating what chunks might exist in a more fine-grained fashion, as well as being more caching-friendly (it wouldn't be unreasonable to keep and internally maintain a cache of the RegionLoc values for a given world - avoiding I/O to find the existing files beyond the first check - since, even for large worlds, these number in the thousands to tens-of-thousands, versus the millions to tens-of-millions for the chunks). My point with the callback style iteration was to exploit the fact that the actual I/O of chunk data is already handled by dedicated threads that are not the caller's thread, so the choice to read data from RegionFiles is going to either involve the caller blocking and waiting for the I/O to be handled by the I/O thread, or we could make it an asynchronous request where the result are delivered either by a callback from the I/O thread or queued back to the server or something else that avoids the badness of having a blocking call on the server thread. Anything involving synchronous disk I/O during calls from the server thread is generally bad for scaling.

Mumfrey commented 9 years ago

@ST-DDT and @mikeprimm I think putting anything in the API which makes assumptions about the underlying format is a really bad idea, especially for other implementors and potential future storage format changes. It should also definitely not be assumed that the underlying storage format is based around files, it should be perfectly feasible to store chunks in an RDBMS for example.

For that reason any proposed getRegionContainer() should return an interface which can represent any kind of storage container for a region (not even necessarily a 1:1 mapping of regions to containers) rather than returning java.io.File

@ST-DDT, your proposed methods are very inconsistent, you use RegionLoc in some places, vectors in others, and integers elsewhere. Why not use the same argument/return types and be consistent, especially with reference to the getExistingChunks overload which takes integers, that would seems completely pointless.

mikeprimm commented 9 years ago

@Mumfrey Chunks, by their nature, are already such an assumption - there is nothing magic about 16 x 16 division of the block domain: it's an artifact of implementation that is at least as likely to change as the division of the world into 32 x 32 chunk regions known as region files (much less the fact that Worlds have always had a folder associated with them in Bukkit and all other modding implementations I've seen). Historically, chunks have been messed with more by mojang than the region files (the world heigh t change and introduction of 16 x 16 x 16 chunk sections being the most notable - only change to region files was renaming mcr to mca to indicate the migration to the newer chunk format). In any case, a change in storage format could emulate the notion of 16 x 16 block chunks and 32 x 32 chunk regions with no significant effort.

Mumfrey commented 9 years ago

I agree that chunks are just as crappy an assumption but we're kind of stuck with it. I was primarily disputing the use of java.io.File when getting chunk container as while chunks (or more properly ChunkColumns are a fact of life for minecraft because of the protocol and lots of other assumptions made throughout the engine), it should be up to the underlying implementation to decide on the actual storage on disk, and if Glowstone decides it wants to pack 4 chunkcolumns to a file or store them in an rdbms then it should be free to do so. Plus it helps in the future given that we know the level format is going to change if we provide extra abstraction now, it doesn't cost us anything but has the potential to be helpful in the future so I don't think it's an unreasonable suggestion.

ryantheleach commented 9 years ago

So what about returning an iterator/calling a callback of blockvolumes representing an "atomic" region of generated land.

This would in minecraft default to being 16x16x256 but would allow for change in the future?

At worst it could end up being block by block (un-recommended) or as big as the entire world.

nightpool commented 9 years ago

I would love to have some sort of off-thread interface to unloaded chunks in Sponge. We probably don't need to expose ACTUAL region files, just a smart World.getUnloadedChunks() : Iterator kinda deal. And then maybe some sort of filtering that would keep us from opening ALL of the files if we only need some of them. On Fri, Dec 5, 2014 at 11:44 AM Ryan Leach notifications@github.com wrote:

So what about returning an iterator of blockvolumes representing an "atomic" region of generated land.

This would in minecraft default to being 16x16x256 but would allow for change in the future?

At worst it could end up being block by block (un-recommended) or as big as the entire world.

— Reply to this email directly or view it on GitHub https://github.com/SpongePowered/SpongeAPI/issues/256#issuecomment-65816870 .

ST-DDT commented 9 years ago

The int vs ChunkLoc stuff was indeed not consistent. I'll fix that.

When thinking of regions i thought of the following. Chunks are already in the API, so they propably stay in there. Regions are chunk containers. They are required for persisting data on the hard drive. So there will always be some kind of region . If you think that region is not the right term i can call it ChunkContainer. If you store your entire world in a database you will have multiple region tables or at least some kind of column for indexes. If the entire database(table) is a region that there is just one region at all. If there is no persistence at all then you will have one region of infinite size, but as you mentioned no File, so i will use StorageContainer as a way to support all those different storage types. I have no intentions in stating anywhere that a chunk or region have a specified size.

1 Region >= 1 Chunk (C ε Ra => B ε Ra for all B ε C)
if chunk is part of Region A then its not part of Region B (C ε Ra and Ra != Rb => C !ε Rb => B !ε Rb for all B ε C)

(B = block, C = chunk, R = region, ε = element)

I hope that this assumptions are acceptable.

Here is my updated list of methods: (I did not add the async to the method names yet to avoid confusion at this stage)

Sumary

World
- getStorage() : StorageContainer (world root, with regions and players... folders/tables in it)
- getRegionsStorage() : StorageContainer
- getRegionStorage(RegionLoc) : StorageContainer (optional, no IO, gets the 0_0.mca so that plugins can easyly delete them or write meta data, TBD)
- hasRegion(RegionLoc) : boolean
- getExistingRegions() : List<RegionLoc> (very few IO, only read filenames, impls may cache this)
- deleteRegion(RegionLoc) (optional, safely unloads all chunks in the given region and then deletes the entire region)
- getExistingChunks(RegionLoc) : Iterable<ChunkLoc> (some more IO, open region file to check the existing chunks in there)
- getExistingChunks(RegionLoc, Filter<ChunkLoc>) : Iterable<ChunkLoc>
- getExistingChunks(ChunkLoc, ChunkLoc) : Iterable<ChunkLoc>
- getExistingChunks(ChunkLoc, ChunkLoc, Filter<ChunkLoc>) : Iterable<ChunkLoc>
- getChunk(ChunkLoc) : Chunk (Heavy IO, because i guess the entire chunk is loaded)
- getChunkData(ChunkLoc) : ChunkData (Potentially heavy IO, only data loaded)
- isChunk____(ChunkLoc) : boolean (Potentially heavy IO, only data loaded)
- deleteChunk(ChunkLoc) (optional, safely unloads the chunk and then deletes the chunk)
RegionLoc
- getRegionX() : long
- getRegionZ() : long
- getMinChunkLoc() : ChunkLoc
- getMaxChunkLoc() : ChunkLoc
- getChunkLoc(AxisDirection, AxisDirection) : ChunkLoc
- getMinBlockLoc() : Vector3d
- getMaxBlockLoc() : Vector3d
- getBlockLoc(AxisDirection, AxisDirection) : Vector3d
Chunk (extends ChunkData?)
- getChunkData() : ChunkData (no IO, because i guess it is already in the RAM)
ChunkData
- getRegionLoc() : RegionLoc
- getChunkLoc() : ChunkLoc
- isGenerated() : boolean (no IO)
- isPopulated() : boolean (no IO)
- getChunkState() : ChunkState (either of NOTTHERE, POPULATED, GENERATED, DECORATED, (LOADED?))
ChunkLoc
- getChunkX() : long
- getChunkZ() : long
- getMinBlockLoc() : Vector3d
- getMaxBlockLoc() : Vector3d
- getBlockLoc(AxisDirection, AxisDirection) : Vector3d
StorageContainer
FileStorageContainer extends StorageContainer
- getFile() : File
Filter<D>
- accept(D) : boolean (executed before any IO for D is made, if possible)

Does this list fits better to your expectations? Or did i miss anything?

nightpool commented 9 years ago

@ST-DDT TBH I don't see why we need a region API. I could just as easily store all my chunks in a relational database and there would be no concept of a region, because there would be no grouping of chunks above just "world". Instead we should just transparently allow for some interface over unloaded chunks (ChunkData, for example)

ST-DDT commented 9 years ago

The region API should prevent that the authors have to find a hacky way to get all region files to make asumptions where chunks may be. Checking and storing newly created/loaded chunks will only work for plugins that are installed before the world is created. Otherwise you will have "black" spots on your chunk list. (For chunks that have not been loaded yet) I have some mca s that have not been changed for months, and i don't want them to not show up in any plugin for any reason.

nightpool commented 9 years ago

@ST-DDT I meant an API that would allow you to access some details of unloaded chunks off-thread (for example from region files, but not necessarily) without pulling the whole chunk in to memory.

On Fri Dec 05 2014 at 6:15:08 PM ST-DDT notifications@github.com wrote:

The region API should prevent that the authors have to find a hacky way to get all region files to make asumptions where chunks may be. Checking and storing newly created/loaded chunks will only work for plugins that are installed before the world is created. Otherwise you will have "black" spots on your chunk list. (For chunks that have not been loaded yet) I have some mca s that have not been changed for months, and i don't want them to not show up in any plugin for any reason.

— Reply to this email directly or view it on GitHub https://github.com/SpongePowered/SpongeAPI/issues/256#issuecomment-65870002 .

ST-DDT commented 9 years ago

@nightpool You are right. It should be (somehow) bound to the internal threads, but if the data are extracted from the chunk/region file they should be accessible outside of the chunk/region thread scope. (ReadOnly) Shall i mark any of those methods for more discussion or add a comment/notice?

k-a-mendoza commented 9 years ago

@ST-DDT I like your updated method list. I think the only thing is missing is some chunkEvents such as chunkGenerateEvent, chunkPopulateEvent, chunkDecorateEvent.

ST-DDT commented 9 years ago

So here is my PR https://github.com/SpongePowered/SpongeAPI/pull/283 which should contain all features mentioned here. If i missed a certain feature please tell me.

gabizou commented 9 years ago

A couple of issues I see with some things:

Async access via a synchronous World shouldn't be allowed, rather, utilizing WorldStorage or ExtentStorage to allow async access to getExistingFoo would make sense.
We shouldn't leak potential implementation interfaces such as FileStorageContainer, much like the Persistence API provides an abstract interface for API. It shouldn't matter to plugins whether the storage is backed by File, RDBS, etc.
We don't need Filter as Predicates already solves that.

ST-DDT commented 9 years ago

The "async" was more like having a thread running independently from the main/world thread, but actually use the main/world thread to load that. The chunk data interface was added for that so you don't have real access to the chunk and can not set some blocks async. But you could get access to the chunk at get a snapshot from it. Currently its just a naming to mark it special. (missing knowledge on my side)
The StorageContainer can be also implemented the Void way. @mikeprimm thinks its a nice to have and may help plugins like dynmap or WorldBorder or minebackup doing their job easier. The PR contains all stuff that was considered usefull at some point of this discusion. It may not be needed any more because i also added other ways to achieve their goal. But i don't want to hide it either, because i'm not the one who created those plugins. So i'll forward the question to them.
The filter will be replaced by Precidates tomorrow/asap.

k-a-mendoza commented 9 years ago

@ST-DDT looks good to me! although I was curious as to what World.getExistingChunks() would return? an iterator? List of chunk points?

ST-DDT commented 9 years ago

@El-minadero There are actually to ways:

Iterable<Chunk> getExistingChunks(Vector2i, Vector2i);
Iterable<ChunkData> getChunkDataAsynchronously(Vector2i, Vector2i, Predicate<Vector2i>);

The first returns actual chunks. The second one gives you information whether there are chunks or not and it also can be used to get the real chunk.

k-a-mendoza commented 9 years ago

@ST-DDT Ok looks great to me! Thanks for all the work! You'll get an honorable mention when I release my plugin :D!!!

ST-DDT commented 9 years ago

You should better mention the team because they do most of the Api stuff. And the pr its not merged yet. Any other features for worlds etc. You would like to see?

Lunaphied commented 9 years ago

@SpongePowered/Developers

ryantheleach commented 9 years ago

I think it would be better to return the chunkdata, there may be reasons for wanting the list of existing chunks, without wanting the chunks themselves.

ST-DDT commented 9 years ago

Thats why there are two methods each. One for the chunks and one only for the data.

gabizou commented 9 years ago

@SpongePowered/developers I believe this issue has been discussed well enough and a proper addition to the API was made with #272, specifically with ChunkIterator.

Correct me if I'm wrong.

Lunaphied commented 9 years ago

This PR has morphed significantly from it's initial conception. Also @gabizou ChunkIterator doesn't include support for the data only asynchronous iterator. Similarly it's not asynchronous and there isn't an asynchronous read-only option.

gabizou commented 9 years ago

Similarly it's not asynchronous and there isn't an asynchronous read-only option.

Why is it not asynchronous? What prevents the data from being read asynchronously? Constructing a DataView of chunk data has no relation to the actual server thread, let alone needing to really utilize the main thread. Granted calling the iterator while on the main thread will hold up the main thread to read the data, it still is perfectly viable to read the data from an asynchronous thread. The DataView is specifically not an instance of the data that is live for this very purpose.

Unless I'm misunderstanding this completely, the ChunkIterator can be used asynchronously for all intents and purposes related to reading data. Writing data is an entirely different story.

Lunaphied commented 9 years ago

However the API doesn't specify an asynchronous relationship, which is different than being possibly asynchronous. The specifications are similar to asynchronous conditions, but not guaranteed asynchronous.

gabizou commented 9 years ago

@modwizcode I've changed up the nature of ChunkIterator in #420, It aims to achieve much of what this issue has requested.

maxov commented 9 years ago

Now that #420 has merged(with added asynchronicity), I'll close this.

SpongePowered / SpongeAPI

[Suggestion] List of all generated chunks #256