Open sphinxc0re opened 9 years ago
This would help a lot on the Raspberry Pi 2. (4 cores)
According to @madmaxoft the generator is already async: https://github.com/cuberite/cuberite/issues/2324#issuecomment-119897046
I can confirm this. The world TPS drops significantly during chunk loading.
Maybe the cause is then something in the tick thread waiting for the generator to complete a chunk.
@NiLSPACE I think @madmaxoft meant that the Chunk generation is async to the world interaction
@LO1ZB is working on a multi-threaded generator right now.
I said that I'm trying to make it multithreaded; I don't have much experience with multithreading (yet).
The only big decision is whether to make one instance support multiple threads, or have a separate instance for each thread; also whether to share the caches or not.
Why would we share the caches? That shouldn't help a lot, since the seeds are most likely different, right?
I have one question: does anyone have any evidence that the world generation computations are the bottleneck? If so, can you post it? Otherwise I think we need to double-check what is causing the problem first, because it could easily be a queue or something else non-intuitive.
I'm profiling and letting MCServer generate 251001 chunks. It will most likely take a while. ;)
Exported profile data (XML and CSV): https://mega.nz/#!khx1TIAa!GuW5d8m5c-IE9V4mWtCl6o1If-6b5lraSxzCDRmv4TY
Firstly there's some low hanging fruit for optimization, Noise code and cChunkGenBiomal::FillColumnPattern for example. Secondly this profile doesn't look like one that would benefit from a multi-threaded chunk generator on anything less than a six core machine, as very little time is spent in the tick code.
Would it make performance worse on machines with fewer than six cores?
Quite possibly, because it would add locking overhead, depending on how you did it.
Is it possible to store some cGenerator instances in a std::vector? If yes, how do I create them?
It would be possible to store multiple in a std::vector, but I do not see the benefit. Here's where they are created: https://github.com/cuberite/cuberite/blob/master/src/Generating/ChunkGenerator.cpp#L72 https://github.com/cuberite/cuberite/blob/master/src/Generating/ChunkGenerator.cpp#L80
If you need a function multithreaded, may I suggest std::async()?
Can I suggest not, as it really does not fit with how a lot of the code base works. If we need to execute the generators on a shared pool, then we will need to create our own thread pool.
Isn't creating and maintaining thread-pool code a bit too drastic (and ambitious)?
Yes. That's why I think multithreaded generators are too drastic. Optimisation is a better strategy.
Not necessarily. The generators are more or less enclosed in a single thread and are thread-safe. It would really only be a matter of changing cChunkGenerator::m_Generator into a vector of generators and managing the request queue / thread pool for them.
The interesting part would be allowing the generators to share the caches. Since the generators have all the same settings, they generate identical data and therefore caches contain the same data; sharing them is most logical. But the caches would need to be made multi-thread-safe and the generators would need some mechanisms to set up the cache sharing when they are instantiated.
That adds complexity. And multithreading is a blunt instrument given we focus on low end machines with four or fewer cores.
Four cores would mean almost four times the generator performance, I believe it's worth it. The complexity isn't that bad.
Four times, if we can pump data through the queues and there is nothing else running on the server. Can the queues handle the contention?
Most private servers are running more than one process they care about. My RPi2 runs Cuberite and a Mumble server. Hope they'll still behave well together when they use more cores, because ... well, I'm perfectly fine with Cuberite using 3 cores and leaving one for Murmur :).
@jammet most OSes are able to force a process to use only specific cores.
I landed here searching if cuberite is taking advantage of multi-core cpus. This issue talks about enabling this in chunk generator and I wonder if there are other parts of the code with such feature.
The chunk generator is on a dedicated thread and is done asynchronously from the tick thread. This is partly about maintaining responsiveness. This thread is mainly about whether adding the ability to use more than one thread to generate for a single world would be useful. The problem is that it is only useful on larger worlds running on high power servers, and could be actively harmful on machines with a low number of cores (<4). Though given changes in the CPU market recently we might want to move towards considering 6 core processors more common.
OK! Thanks for the explanation :+1:
Was this ever implemented? It seems easy enough: since all world generation is (effectively) generating the same exact thing, you could add a setting in settings.ini saying how many threads to use, and then at startup make a thread pool which just waits for work. I don't know enough about Cuberite's internal design to know if this is actually feasible, though. Is it?
It's not that easy, because the generator objects do have some local data that they access and modify while generating a chunk, so either each thread gets a copy of the data, or the threads need to synchronize their access. Not to mention that there are caches for the generated biomes and shape, which make the most sense shared, yet they are NOT threadsafe.
Ah, I see. Still, there are some areas that could be threaded out. I'll see if I can get an implementation working and whether it'd be any faster.
Coming to this a few years later, my belief nowadays is that multithreading the world generator would actually lower the performance in most, if not all, cases. My main argument for this belief is CPU cache: the generators are very cache-hungry, grinding over several megabytes over and over again. If multiple such generators were run in parallel, the CPU cache would get depleted very soon and everything would be very slow. However, there's no real proof one way or the other, so I suggest before going to the lengths of making the generators multithreaded to actually measure. Somehow we need to measure perf when multiple generators are running in parallel.
I wrote a POC plugin a while ago that spread world generation out over multiple worlds. Since every world has its own thread for world generation, it allowed chunks to be generated on multiple threads. It has some inherent flaws, of course:
There is currently also an issue where lighting at chunk borders isn't correct; that could probably be fixed by scheduling a relighting of the chunk. Entities are not transferred either, so you won't see any mobs until they spawn naturally. On low-powered devices like a smartphone, the world did generate more quickly.
I'm not sure if the slight performance increase justifies the increase in complexity though.
Here is Cuberite running without and then with the plugin enabled on my phone:
https://github.com/cuberite/cuberite/assets/1160867/0fd9328c-5bdf-4313-944c-8b862bafca70
When a huge world is explored by players, the server acts slowly.