FLAMEGPU / FLAMEGPU2

FLAME GPU 2 is a GPU-accelerated agent-based modelling framework for CUDA C++ and Python
https://flamegpu.com
MIT License

Migrate to Jitify2 #1150

Open · Robadob opened 7 months ago

Robadob commented 7 months ago

Update regarding header pre-loading with Jitify2/CUDA 12.3

Windows/CUDA 12.0

No preload
Millis: 6822.000000
Millis: 6853.000000

Preloading FLAMEGPU headers
Millis: 4045.000000
Millis: 4277.000000

Preload FLAMEGPU + CUDA headers
Millis: 1296.000000
Millis: 1667.000000

Linux/CUDA 12.3

Jitify 2 from scratch (Waimu)
Millis: 25318.000000
Millis: 24143.000000

Preload FLAMEGPU + CUDA headers
Millis: 1376.000000
Millis: 2218.000000

CUDA 12.0 has ~30 CUDA headers to preload; CUDA 12.3 has ~257 (the list contains some duplicates).

It's not clear whether we would want to generalise this code to better handle different CUDA versions, because we could potentially need to update it with each CUDA release.
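For illustration, a minimal sketch of what a version-aware header preload could look like, assuming headers are read from disk into an in-memory include-name to source map that is handed to the RTC preprocessor up front (the directory layout and the `preloadHeaders()` helper are hypothetical, not the existing FLAMEGPU code):

```cpp
#include <cuda_runtime.h>  // for CUDART_VERSION

#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: read every header beneath an include root into a map of
// include-name -> source, so the RTC preprocessor can resolve #includes from
// memory instead of hitting its (slow) file-lookup path on every compilation.
std::unordered_map<std::string, std::string> preloadHeaders(
        const std::filesystem::path &include_root) {
    std::unordered_map<std::string, std::string> headers;
    for (const auto &entry : std::filesystem::recursive_directory_iterator(include_root)) {
        if (!entry.is_regular_file())
            continue;
        const auto ext = entry.path().extension();
        if (ext != ".h" && ext != ".hpp" && ext != ".cuh")
            continue;
        std::ifstream in(entry.path());
        std::stringstream source;
        source << in.rdbuf();
        // Key by the path relative to the include root, i.e. as it would appear in an #include.
        headers[std::filesystem::relative(entry.path(), include_root).generic_string()] = source.str();
    }
    return headers;
}

// Per the concern above, the set of CUDA headers worth preloading differs between
// CUDA versions (~30 for 12.0 vs ~257 for 12.3), so a hard-coded list would need
// revisiting with each CUDA release; CUDART_VERSION could gate version-specific lists.
```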

Edit: Removed from-cache times; the latest commit has these matching Jitify1.

Robadob commented 7 months ago

The current issue holding back the Jitify2 preprocessor branch is that it expects our FLAMEGPU headers to be included as system headers (<>) rather than with quotes (""). Waiting to hear back from the dev (Ben) before I try to correct that on our side.
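For reference, the distinction is just the include form used in the RTC sources; the header name below is only illustrative:

```cpp
// How the FLAMEGPU headers are currently included (quoted form):
#include "flamegpu/flamegpu.h"

// What the Jitify2 preprocessor branch expects (system/angle-bracket form):
#include <flamegpu/flamegpu.h>
```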

Robadob commented 7 months ago

Did three full test runs last night; all passed. However, in those cases the CMake jitify dependency was pointing at the preprocess branch. Not currently using that here, as it causes all Windows CI to fail with WError.

Linux/CUDA12.3/Seatbelts ON/GLM ON/Release
Linux/CUDA12.3/Seatbelts OFF/GLM ON/Release
Windows/CUDA12.0/Seatbelts ON/GLM OFF/Debug

In Release builds, each kernel is taking ~1 second to compile. As Jitify is now doing the pre-processing, this is closer to 2.5 seconds under Debug builds.
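For completeness, per-kernel timings like those above could be gathered with something as simple as the following; `compileAgentFunction()` is a stand-in for whichever Jitify2 preprocess/compile step is being measured, not a real FLAMEGPU API:

```cpp
#include <chrono>
#include <cstdio>
#include <thread>

// Stand-in for the RTC compile of a single agent function; sleeps so the
// example runs standalone.
void compileAgentFunction() {
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}

int main() {
    const auto start = std::chrono::steady_clock::now();
    compileAgentFunction();
    const auto end = std::chrono::steady_clock::now();
    const double millis = std::chrono::duration<double, std::milli>(end - start).count();
    // Same "Millis: ..." style as the timings reported above.
    std::printf("Millis: %f\n", millis);
    return 0;
}
```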