Open sphinxc0re opened 9 years ago
This would help a lot on the Raspberry Pi 2. (4 cores)
According to @madmaxoft the generator is already async: https://github.com/cuberite/cuberite/issues/2324#issuecomment-119897046
I can confirm this. The world TPS drops significantly during chunk loading.
Maybe the cause is then something in the tick thread waiting for the generator to complete a chunk.
@NiLSPACE I think @madmaxoft meant that the Chunk generation is async to the world interaction
@LO1ZB is working on a multi-threaded generator right now.
I said that I'm trying to make it multithreaded; I don't have much experience with multithreading (yet).
The only big decision is whether to make one instance support multiple threads, or have a separate instance for each thread; also whether to share the caches or not.
Why would we share the caches? That shouldn't help a lot, since the seeds are most likely different, right?
I have one question: does anyone have any evidence that the world generation computations are the bottleneck? If so, can you post it? Otherwise I think we need to double-check what is causing the problem first, because it could easily be a queue or something else non-intuitive.
I'm profiling and letting MCServer generate 251001 chunks. It will most likely take a while. ;)
Exported profile data (XML and CSV): https://mega.nz/#!khx1TIAa!GuW5d8m5c-IE9V4mWtCl6o1If-6b5lraSxzCDRmv4TY
Firstly there's some low hanging fruit for optimization, Noise code and cChunkGenBiomal::FillColumnPattern for example. Secondly this profile doesn't look like one that would benefit from a multi-threaded chunk generator on anything less than a six core machine, as very little time is spent in the tick code.
Would it make performance worse on machines with fewer than six cores?
Quite possibly, because it would add locking overhead, depending on how you did it.
Is it possible to store some cGenerator instances in a std::vector? If yes, how do I create them?
It would be possible to store multiple in a std::vector, but I do not see the benefit. Here's where they are created: https://github.com/cuberite/cuberite/blob/master/src/Generating/ChunkGenerator.cpp#L72 https://github.com/cuberite/cuberite/blob/master/src/Generating/ChunkGenerator.cpp#L80
If you need a function multithreaded, may I suggest std::async()?
Can I suggest not, as it really does not fit with how a lot of the code base works. If we need to execute the generators on a shared pool, then we will need to create our own thread pool.
Isn't creating and maintaining thread-pool code a bit too drastic (and ambitious)?
Yes. That's why I think multithreaded generators are too drastic. Optimisation is a better strategy.
Not necessarily. The generators are more or less enclosed in a single thread and are thread-safe. It would really only be a matter of changing cChunkGenerator::m_Generator into a vector of generators and managing the request queue / thread pool for them.
The interesting part would be allowing the generators to share the caches. Since the generators have all the same settings, they generate identical data and therefore caches contain the same data; sharing them is most logical. But the caches would need to be made multi-thread-safe and the generators would need some mechanisms to set up the cache sharing when they are instantiated.
That adds complexity. And multithreading is a blunt instrument given we focus on low end machines with four or fewer cores.
Four cores would mean almost four times the generator performance, I believe it's worth it. The complexity isn't that bad.
Four times, if we can pump data through the queues and there is nothing else running on the server. Can the queues handle the contention?
Most private servers are running more than one process they care about. My RPi2 runs Cuberite and a Mumble server. Hope they'll still behave well together when they use more cores, because ... well, I'm perfectly fine with Cuberite using 3 cores and leaving one for Murmur :).
@jammet most OSes are able to force a process to use only specific cores.
I landed here searching if cuberite is taking advantage of multi-core cpus. This issue talks about enabling this in chunk generator and I wonder if there are other parts of the code with such feature.
The chunk generator is on a dedicated thread and is done asynchronously from the tick thread. This is partly about maintaining responsiveness. This thread is mainly about whether adding the ability to use more than one thread to generate for a single world would be useful. The problem is that it is only useful on larger worlds running on high power servers, and could be actively harmful on machines with a low number of cores (<4). Though given changes in the CPU market recently we might want to move towards considering 6 core processors more common.
OK! Thanks for the explanation :+1:
Was this ever implemented? It seems easy enough: since all world generation is (effectively) generating the same exact thing, you could add a setting in settings.ini saying how many threads to use, and then at startup make a thread pool which just waits for work. I don't know enough about Cuberite's internal design to know if this is actually feasible, though. Is it?
It's not that easy, because the generator objects do have some local data that they access and modify while generating a chunk, so either each thread gets a copy of the data, or the threads need to synchronize their access. Not to mention that there are caches for the generated biomes and shape, which make the most sense shared, yet they are NOT threadsafe.
Ah, I see. Still, there are some areas that could be threaded out. I'll see if I can get an implementation working and whether it'd be any faster.
Coming to this a few years later, my belief nowadays is that multithreading the world generator would actually lower the performance in most, if not all, cases. My main argument for this belief is CPU cache: the generators are very cache-hungry, grinding over several megabytes over and over again. If multiple such generators were run in parallel, the CPU cache would get depleted very soon and everything would be very slow. However, there's no real proof one way or the other, so I suggest before going to the lengths of making the generators multithreaded to actually measure. Somehow we need to measure perf when multiple generators are running in parallel.
I wrote a POC plugin a while ago that spread world generation out over multiple worlds. Since every world has its own thread for world generation, it allowed chunks to be generated on multiple threads. It has some inherent flaws, of course:
There is currently also an issue where lighting at chunk borders isn't correct; that could probably be fixed by scheduling a relighting of the chunk. Entities are not transferred either, so you won't see any mobs until they spawn naturally. On low-powered devices like a smartphone, the world did generate more quickly.
I'm not sure if the slight performance increase justifies the increase in complexity though.
Here is Cuberite running without and then with the plugin enabled on my phone:
https://github.com/cuberite/cuberite/assets/1160867/0fd9328c-5bdf-4313-944c-8b862bafca70
When a huge world is explored by players, the server acts slowly.