buildaworldnet / IrrlichtBAW

Build A World fork of Irrlicht
http://www.buildaworld.net
Apache License 2.0

Make IrrBAW "Fibreable" a-la Naughty Dog #214

Open devshgraphicsprogramming opened 5 years ago

devshgraphicsprogramming commented 5 years ago

We want to achieve this so that we can control thread pre-emption and affinity: http://twvideo01.ubm-us.net/o1/vault/gdc2015/presentations/Gyrling_Christian_Parallelizing_The_Naughty.pdf

NOTE: Engine should still work and be compatible with non-fibered/jobbed execution.

However, we don't want to take responsibility for the job scheduling. We can build a default scheduler for the user to build on, but should make it easily replaceable (like IAssetLoaderOverride).
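A minimal sketch of what a replaceable scheduler could look like, assuming an override-style interface in the spirit of IAssetLoaderOverride. Every name here (`IJobScheduler`, `InlineScheduler`, `DeferredScheduler`, `Engine`) is hypothetical and not part of IrrlichtBAW; the default simply runs jobs inline so the non-jobbed path keeps working.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical interface: the engine only ever talks to this.
class IJobScheduler
{
public:
    virtual ~IJobScheduler() = default;
    // Hand a unit of work to whatever scheduling policy is installed.
    virtual void submit(std::function<void()> job) = 0;
    // Return once every job submitted so far has finished.
    virtual void waitIdle() = 0;
};

// Hypothetical default: runs each job immediately on the calling thread,
// so the engine still works without fibers or a job system.
class InlineScheduler final : public IJobScheduler
{
public:
    void submit(std::function<void()> job) override { job(); }
    void waitIdle() override {}
};

// Hypothetical user replacement: defers jobs until waitIdle() is called.
class DeferredScheduler final : public IJobScheduler
{
public:
    void submit(std::function<void()> job) override { pending.push_back(std::move(job)); }
    void waitIdle() override
    {
        // Jobs may submit further jobs, so index instead of iterating.
        for (std::size_t i = 0; i < pending.size(); ++i)
        {
            auto job = std::move(pending[i]); // move out: the vector may reallocate
            job();
        }
        pending.clear();
    }
private:
    std::vector<std::function<void()>> pending;
};

// Hypothetical engine-side holder: falls back to the default scheduler.
class Engine
{
public:
    explicit Engine(std::unique_ptr<IJobScheduler> custom = nullptr)
    {
        if (custom)
            scheduler = std::move(custom);
        else
            scheduler = std::make_unique<InlineScheduler>();
    }
    IJobScheduler& getScheduler() { return *scheduler; }
private:
    std::unique_ptr<IJobScheduler> scheduler;
};
```

With this shape, `Engine e;` gets the inline default, while `Engine e(std::make_unique<DeferredScheduler>());` swaps the policy without the engine code changing.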

We shall provide fiber-safe replacements for std::mutex and std::condition_variable. The library should be able to switch between normal C++11 threading and fiber-threading via a compile flag (swapping out std::mutex and std::condition_variable for the alternates).
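One way the compile-flag switch could be sketched is with type aliases. The macro `SKETCH_USE_FIBERS`, the namespace `sketch`, and the `fiber_mutex` class are all assumptions made for illustration; `fiber_mutex` is only a spin-and-yield placeholder that meets the Lockable requirements, not a real fiber-aware lock. A condition variable replacement would follow the same aliasing pattern.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

namespace sketch
{
// Hypothetical placeholder: satisfies Lockable, so std::lock_guard and
// std::unique_lock work with it unchanged.
class fiber_mutex
{
public:
    void lock()
    {
        while (flag.test_and_set(std::memory_order_acquire))
            yield_to_scheduler();
    }
    bool try_lock() { return !flag.test_and_set(std::memory_order_acquire); }
    void unlock() { flag.clear(std::memory_order_release); }
private:
    // A real implementation would switch to another fiber here
    // instead of spinning on the OS thread.
    static void yield_to_scheduler() {}
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
};

// The compile flag picks which primitive the rest of the library sees.
#ifdef SKETCH_USE_FIBERS
using mutex = fiber_mutex;
#else
using mutex = std::mutex;
#endif
} // namespace sketch
```

Library code would then write `std::lock_guard<sketch::mutex> lock(m);` everywhere and never name `std::mutex` directly.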

We should look into Boost.Context and how it achieves the stack allocation, register saving, etc. https://www.boost.org/doc/libs/1_69_0/libs/context/doc/html/index.html

For some ideas for synch primitives: https://www.boost.org/doc/libs/1_69_0/libs/fiber/doc/html/index.html https://www.boost.org/doc/libs/1_69_0/libs/coroutine2/doc/html/index.html https://www.boost.org/doc/libs/1_69_0/doc/html/lockfree.html

manhnt9 commented 5 years ago

ASIO has this capability; I used it to create a thread pool for job scheduling. This is just to show you how it works in case it gives you more design ideas. I don't think you're interested in using asio in IrrlichtBAW though :smiley:

from anywhere in the code
  submit lambdas (or any function objects) - call these jobs

spawn multiple threads
  call asio's run function
  jobs are automatically executed on these threads
  jobs which are wrapped in something called strand won't be called in parallel
  jobs can have order dependency too

So you could consider having similar I/O-service-style objects for scheduling jobs, maybe.
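The submit-and-run pattern described above can be sketched with the standard library alone. This is not asio's API: `JobPool` and `sumWithPool` are hypothetical names, and strands and ordering dependencies are left out.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stdlib-only job pool: jobs are submitted from anywhere
// and executed on whichever worker thread picks them up.
class JobPool
{
public:
    explicit JobPool(unsigned threadCount)
    {
        for (unsigned i = 0; i < threadCount; ++i)
            workers.emplace_back([this] { run(); });
    }
    ~JobPool()
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            stopping = true;
        }
        cv.notify_all();
        for (auto& w : workers)
            w.join();
    }
    void submit(std::function<void()> job)
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            jobs.push(std::move(job));
        }
        cv.notify_one();
    }
private:
    void run()
    {
        for (;;)
        {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mtx);
                cv.wait(lock, [this] { return stopping || !jobs.empty(); });
                if (jobs.empty())
                    return; // stopping and the queue is drained
                job = std::move(jobs.front());
                jobs.pop();
            }
            job(); // run outside the lock so other workers can dequeue
        }
    }
    std::mutex mtx;
    std::condition_variable cv;
    std::queue<std::function<void()>> jobs;
    bool stopping = false;
    std::vector<std::thread> workers;
};

// Usage: sum 1..n with one job per term.
int sumWithPool(int n, unsigned threads)
{
    std::atomic<int> sum{0};
    {
        JobPool pool(threads);
        for (int i = 1; i <= n; ++i)
            pool.submit([&sum, i] { sum += i; });
    } // the destructor drains the queue and joins the workers
    return sum.load();
}
```

Note that a job here always runs to completion on one thread; it cannot be suspended halfway and resumed later.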

I've learnt a bit about fibers but haven't coded with them yet. Will do soon since I'm also interested in them. Probably gonna use Boost.Fiber in my engine.

devshgraphicsprogramming commented 5 years ago

Jobs are not powerful enough. I need a std::mutex and std::condition_variable replacement that can "pause" a job (save its whole stack) and "resume" it at a later time.

I considered Boost.Fiber but I don't like the fact it schedules your fibers for you.

I want fiber scheduling to live outside of IrrBAW and Boost, hence the choice of Boost.Context.
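To illustrate what "pausing" a job with its whole stack means, here is a minimal sketch that swaps in POSIX `ucontext` (`getcontext`, `makecontext`, `swapcontext`) as a stand-in for Boost.Context, purely to stay dependency-free. It assumes a POSIX system such as Linux with glibc; the namespace and function names are hypothetical. The caller decides when the job resumes, so the scheduling stays outside the context-switching code.

```cpp
#include <ucontext.h>
#include <cassert>
#include <vector>

namespace sketch
{
static ucontext_t schedulerCtx; // where control goes when the job pauses or ends
static ucontext_t jobCtx;       // saved registers and stack pointer of the job
static std::vector<int> trace;  // records the order in which code ran

// Hypothetical wait point: save the job's state and return to the caller.
static void pauseJob() { swapcontext(&jobCtx, &schedulerCtx); }

static void jobBody()
{
    int local = 41;             // lives on the job's private stack
    trace.push_back(1);
    pauseJob();                 // suspended mid-function; `local` survives
    trace.push_back(local + 1);
}

std::vector<int> runDemo()
{
    static char stack[64 * 1024]; // the job's own stack
    trace.clear();

    getcontext(&jobCtx);
    jobCtx.uc_stack.ss_sp = stack;
    jobCtx.uc_stack.ss_size = sizeof(stack);
    jobCtx.uc_link = &schedulerCtx;      // continue here when jobBody returns
    makecontext(&jobCtx, jobBody, 0);

    swapcontext(&schedulerCtx, &jobCtx); // run the job until its first pause
    trace.push_back(2);                  // the "scheduler" does other work
    swapcontext(&schedulerCtx, &jobCtx); // resume the job; it runs to the end
    return trace;
}
} // namespace sketch
```

A fiber-safe mutex would call something like `pauseJob()` when the lock is contended, and the external scheduler would resume the job once the lock is released.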

manhnt9 commented 5 years ago

What do you think about coroutines?

Well I think I forgot, I probably want Intel TBB more than Boost.Fiber for my game engine.

devshgraphicsprogramming commented 5 years ago

Coroutines are still stackless, i.e. a coroutine is a generalized routine (routine = function call).

manhnt9 commented 4 years ago

I'm also considering this: https://github.com/dougbinks/enkiTS

devshgraphicsprogramming commented 4 years ago

Mutexes, Barriers, wait for all, async file I/O and custom schedulers https://github.com/lewissbaker/cppcoro

devshgraphicsprogramming commented 4 years ago

http://www.1024cores.net/home/lock-free-algorithms/tricks/fibers

We should probably benchmark using the case of filtering a 4096^2 or 8192^2 image's mip-maps on a CPU with the following techniques:

We should gather data about the performance characteristics of the following hardware:

We should gather a performance chart of time-to-finish vs. task size for every "concurrency method"

devshgraphicsprogramming commented 4 years ago

Unlike your traditional coroutine vs. fiber benchmark, we'll be focusing on tasks with a duration in microseconds, not nanoseconds
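A rough sketch of a harness for measuring time-to-finish against task size, assuming busy-wait tasks as stand-ins for real work such as mip-map filtering. The two "concurrency methods" shown (`runSerial`, `runAsync`) and all other names are illustrative placeholders, not the techniques the issue intends to compare.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

using Clock = std::chrono::steady_clock;
using TaskList = std::vector<std::function<void()>>;

// A "concurrency method" runs every task in the list to completion.
using Method = std::function<void(const TaskList&)>;

// Busy-wait for at least `us` microseconds, standing in for a real task.
inline void spinFor(std::chrono::microseconds us)
{
    const auto end = Clock::now() + us;
    while (Clock::now() < end) {}
}

inline void runSerial(const TaskList& tasks)
{
    for (const auto& t : tasks)
        t();
}

inline void runAsync(const TaskList& tasks)
{
    std::vector<std::future<void>> futures;
    futures.reserve(tasks.size());
    for (const auto& t : tasks)
        futures.push_back(std::async(std::launch::async, t));
    for (auto& f : futures)
        f.get();
}

// Time-to-finish in microseconds for `taskCount` tasks of `taskSize` each.
inline double timeToFinishUs(const Method& method,
                             std::chrono::microseconds taskSize,
                             int taskCount)
{
    const std::function<void()> task = [taskSize] { spinFor(taskSize); };
    const TaskList tasks(static_cast<std::size_t>(taskCount), task);
    const auto start = Clock::now();
    method(tasks);
    return std::chrono::duration<double, std::micro>(Clock::now() - start).count();
}
```

Sweeping `taskSize` over a range of microsecond values and calling `timeToFinishUs` once per method would give the raw data for the chart.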

devshgraphicsprogramming commented 4 years ago

Really want/need this in core C++23 www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0876r9.pdf

devshgraphicsprogramming commented 4 years ago

Asynchronous I/O requires an IFile implementation that either caches the file contents in virtual memory (with pages aligned to and sized in 4096-byte increments) or reads the whole file in at once (or at least caches it in contiguous memory).
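A minimal sketch of the page-caching idea, assuming a `std::istream` as the backing store. `PagedFileCache` is a hypothetical name and not the IFile interface; the reads are synchronous, and "aligned" here means aligned in file offset, not in memory address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical cache: file contents are fetched on demand in 4096-byte
// pages whose start offsets are multiples of 4096.
class PagedFileCache
{
public:
    static constexpr std::size_t PageSize = 4096;

    explicit PagedFileCache(std::istream& backing) : source(backing) {}

    // Copy up to `size` bytes starting at `offset`; returns bytes copied.
    std::size_t read(std::uint64_t offset, void* dst, std::size_t size)
    {
        char* out = static_cast<char*>(dst);
        std::size_t copied = 0;
        while (copied < size)
        {
            const std::uint64_t pos = offset + copied;
            const std::vector<char>& page = fetchPage(pos / PageSize);
            const std::size_t inPage = static_cast<std::size_t>(pos % PageSize);
            if (inPage >= page.size())
                break; // past the end of the file
            const std::size_t n = std::min(size - copied, page.size() - inPage);
            std::copy_n(page.data() + inPage, n, out + copied);
            copied += n;
        }
        return copied;
    }

    std::size_t cachedPageCount() const { return pages.size(); }

private:
    const std::vector<char>& fetchPage(std::uint64_t index)
    {
        auto it = pages.find(index);
        if (it != pages.end())
            return it->second; // already cached
        std::vector<char> page(PageSize);
        source.clear(); // reset eof/fail left by an earlier short read
        source.seekg(static_cast<std::streamoff>(index * PageSize));
        source.read(page.data(), static_cast<std::streamsize>(PageSize));
        page.resize(static_cast<std::size_t>(source.gcount())); // last page may be short
        return pages.emplace(index, std::move(page)).first->second;
    }

    std::istream& source;
    std::unordered_map<std::uint64_t, std::vector<char>> pages;
};
```

An asynchronous version would issue the page fetch in the background and let the requesting job pause until the page arrives, instead of blocking in `fetchPage`.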

Probably an IFilePool would be useful to amortize cache costs.

devshgraphicsprogramming commented 4 years ago

https://blog.libtorrent.org/2012/10/asynchronous-disk-io/