Reduce lock contention and the number of dispatch threads.

foxostro commented 12 years ago

At the moment, there is a large amount of lock contention in the terrain code (GSChunkStore, GSChunk*Data, etc.), which results in GCD creating a very large number of threads. If this contention can be reduced, the number of threads used by the game should drop from the current 60+ on Lion (yuck). It might even be possible to remove the hack of using the background-priority dispatch queue for decidedly non-background work. (That queue is scheduled less favorably and so won't interfere with the main thread.)

My idea is to prefer, as much as possible, grabbing locks with tryLockForReading instead of lockForReading. If a task cannot be completed immediately, a flag can be set on the resource and we can try again later. For example, when a chunk needs its geometry updated, mark the chunk as dirty and try to do the update immediately. If that attempt fails, try again when the chunk store performs its update tick.

File I/O, such as loading chunks from disk, should go through the GCD APIs for file I/O so that it doesn't cause more threads to be spun up.

foxostro commented 12 years ago

I've made some changes to the project in the branch "less_lock_contention" to begin to address the problem of blocking the main thread to grab locks, as well as the problem of lock contention on background threads.

From Instruments time-profiler traces, it looks like lock contention is a serious problem for frame rate, and one that I haven't done much to address until now. The main thread and the CVDisplayLink thread can be blocked for a significant portion of their running time waiting for locks. The lock in activeRegion is one example.

ad07221fcbd7da185d7495bfa19be2e0eaf7d5db: -enumerateActiveChunkWithBlock: no longer holds the lock for the entire enumeration. It doesn't have to.
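The commit message doesn't say how the lock hold time was shortened. One common way, sketched here in C under that assumption, is to copy the list under the lock and then visit the copy with the lock released. `active_region_t`, the integer ids standing in for chunk objects, and the function-pointer visitor standing in for the Objective-C block are all hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical active region: ids stand in for chunk objects. */
typedef struct {
    pthread_mutex_t lock;
    int *chunk_ids;
    size_t count;
} active_region_t;

/* Stand-in for the block passed to -enumerateActiveChunkWithBlock:. */
typedef void (*chunk_visitor)(int chunk_id, void *ctx);

void active_region_init(active_region_t *r, int *ids, size_t count) {
    pthread_mutex_init(&r->lock, NULL);
    r->chunk_ids = ids;
    r->count = count;
}

/* Hold the lock only long enough to snapshot the list, then call the
 * visitor on the snapshot without holding the lock. */
void active_region_enumerate(active_region_t *r, chunk_visitor visit, void *ctx) {
    pthread_mutex_lock(&r->lock);
    size_t n = r->count;
    int *snapshot = NULL;
    if (n > 0) {
        snapshot = malloc(n * sizeof *snapshot);
        if (snapshot != NULL) {
            memcpy(snapshot, r->chunk_ids, n * sizeof *snapshot);
        }
    }
    pthread_mutex_unlock(&r->lock);

    if (snapshot == NULL) {
        return;                 /* empty region or allocation failure */
    }
    for (size_t i = 0; i < n; i++) {
        visit(snapshot[i], ctx);
    }
    free(snapshot);
}

/* Example visitor: adds each id into the int that ctx points to. */
void sum_ids(int chunk_id, void *ctx) {
    *(int *)ctx += chunk_id;
}
```

The trade-off is that the visitor sees the list as it was at snapshot time, so it may visit a chunk that has since left the active region.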

ff3fb1dd9cd313c995eb73b8909726b1685777cc: Chunk visibility calculations moved to the CVDisplayLink thread. The main thread uses a flag to notify that thread when visibility is out of date.
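A rough sketch of that kind of cross-thread flag, using C11 atomics. The struct and function names are hypothetical, and the actual commit may use a different mechanism. The point is that the main thread only ever performs one atomic store and never waits on the display-link thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical visibility state shared between the two threads. */
typedef struct {
    atomic_bool stale;      /* set by the main thread, consumed by the render thread */
    int recompute_count;    /* touched only by the render thread */
} visibility_state_t;

void visibility_init(visibility_state_t *v) {
    atomic_init(&v->stale, false);
    v->recompute_count = 0;
}

/* Main thread: e.g. the camera moved. Never blocks. */
void visibility_invalidate(visibility_state_t *v) {
    atomic_store(&v->stale, true);
}

/* Display-link thread, once per frame: recompute only if flagged.
 * atomic_exchange reads and clears the flag in one step, so an
 * invalidation that arrives during the recompute is not lost. */
bool visibility_update_if_needed(visibility_state_t *v) {
    if (!atomic_exchange(&v->stale, false)) {
        return false;
    }
    v->recompute_count++;   /* placeholder for the real visibility pass */
    return true;
}
```

Several invalidations between two frames collapse into a single recompute.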

Many changes, most recently 648b74f99307f67df95742244cd8560cb0ccfa43: Chunk geometry never blocks to wait for locks. Instead, it tries to take the locks and bails immediately if it can't. During GSChunkStore's update tick, we make another attempt to update geometry in need of an update.

Next step: Currently, 77% of time on background threads is spent waiting to grab locks on voxel data for sunlight generation. Sunlight generation should be changed so that it never blocks to wait for locks. Give it the same treatment as chunk geometry generation, basically.

foxostro commented 12 years ago

NSCache/libcache can only have one reader/writer at a time (I'm pretty sure). Replace the use of NSCache in GSChunkStore with a custom data structure that implements a hash table with lock striping across the table. This way, accesses that land on different stripes can proceed concurrently.
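To illustrate what lock striping means here, a small C sketch of a chained hash table where each mutex guards a fixed subset of the buckets. The names, the table sizes, and the 64-bit integer keys are assumptions for illustration. This is not the structure that ended up in GSChunkStore, and it has no eviction or memory-pressure handling.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_BUCKETS 64
#define NUM_STRIPES 8    /* each lock guards NUM_BUCKETS / NUM_STRIPES buckets */

typedef struct entry {
    uint64_t key;
    void *value;
    struct entry *next;
} entry_t;

typedef struct {
    pthread_mutex_t stripes[NUM_STRIPES];
    entry_t *buckets[NUM_BUCKETS];
} striped_table_t;

/* Mix the key bits so that nearby keys spread across buckets. */
static size_t bucket_for(uint64_t key) {
    key ^= key >> 33;
    key *= 0xff51afd7ed558ccdULL;
    key ^= key >> 33;
    return (size_t)(key % NUM_BUCKETS);
}

void table_init(striped_table_t *t) {
    for (size_t i = 0; i < NUM_STRIPES; i++) {
        pthread_mutex_init(&t->stripes[i], NULL);
    }
    for (size_t i = 0; i < NUM_BUCKETS; i++) {
        t->buckets[i] = NULL;
    }
}

/* Insert or overwrite. Only the stripe owning the bucket is locked,
 * so operations on other stripes are not blocked. */
bool table_put(striped_table_t *t, uint64_t key, void *value) {
    size_t b = bucket_for(key);
    pthread_mutex_t *lock = &t->stripes[b % NUM_STRIPES];
    bool ok = true;
    pthread_mutex_lock(lock);
    entry_t *e = t->buckets[b];
    while (e != NULL && e->key != key) {
        e = e->next;
    }
    if (e != NULL) {
        e->value = value;               /* existing key: overwrite */
    } else if ((e = malloc(sizeof *e)) != NULL) {
        e->key = key;
        e->value = value;
        e->next = t->buckets[b];        /* push onto the bucket's chain */
        t->buckets[b] = e;
    } else {
        ok = false;                     /* allocation failed */
    }
    pthread_mutex_unlock(lock);
    return ok;
}

/* Returns the stored value, or NULL if the key is absent. */
void *table_get(striped_table_t *t, uint64_t key) {
    size_t b = bucket_for(key);
    pthread_mutex_t *lock = &t->stripes[b % NUM_STRIPES];
    void *value = NULL;
    pthread_mutex_lock(lock);
    for (entry_t *e = t->buckets[b]; e != NULL; e = e->next) {
        if (e->key == key) {
            value = e->value;
            break;
        }
    }
    pthread_mutex_unlock(lock);
    return value;
}
```

Two threads contend only when their keys map to the same stripe, so more stripes means less contention at the cost of more mutexes. The lifetime of the returned value after the stripe lock is released is left to the caller.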

libcache dumps its contents on memory pressure, and there is no publicly accessible API for implementing that behavior in a custom data structure. But losing it might be an acceptable trade-off for reduced lock contention.

foxostro commented 12 years ago

This is basically complete. The branch has been merged into master as of 3820c558c7f89d7bb4586a77ba21d8902e3a336b.

foxostro / GutsyStorm

Reduce lock contention and the number of dispatch threads. #57