CEMeNT-PSAAP / MCDC

MC/DC: Monte Carlo Dynamic Code
https://mcdc.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

GPU Interop #195

Closed: braxtoncuneo closed this 4 months ago

braxtoncuneo commented 4 months ago

Incorporates GPU interop via Harmonize, achieved through the following changes:

braxtoncuneo commented 4 months ago

There are a couple of issues that I would like to bring up before this is merged:

braxtoncuneo commented 4 months ago

Out of curiosity, regarding data alignment in type_.py, what are the types that particularly need the alignment?

All types need to be aligned, but whether anything needs to be done to align them is context-dependent. For the sake of alignment, structs are laid out assuming that the base address is divisible by the largest alignment size we care about. From there, fields are laid out in sequence, in the order they appear in the list, with sub-structs laid out recursively. By default, Numba packs all fields next to each other, with no additional alignment considerations.

An example of a case where padding is needed is an 8-byte field (A), followed by a 1-byte field (B), followed by an 8-byte field (C).

This is how Numba would lay it out in memory (each letter representing a byte): AAAAAAAABCCCCCCCC

This seems sensible, but then you notice that A and C cannot both be aligned to a base address divisible by 8. To ensure both are aligned, some padding must be provided: AAAAAAAAB.......CCCCCCCC

Padding like this (though of differing amounts) would be necessary for any combination of A and C with sizes greater than 1 byte.

Technically speaking, 1-byte types could be considered types that "don't care about alignment", but it would be more accurate to say that it is impossible to make them unaligned, since every address is divisible by 1.
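
As a minimal illustration (not MC/DC code), NumPy structured dtypes show the same packed-versus-padded layouts for the A/B/C example above; Numba record types can be built from such dtypes via numba.from_dtype:

```python
import numpy as np

# Packed layout: fields placed back to back, as Numba does by default.
packed = np.dtype([("A", np.float64), ("B", np.uint8), ("C", np.float64)])
# Aligned layout: NumPy inserts padding so each field sits at a suitable offset.
aligned = np.dtype([("A", np.float64), ("B", np.uint8), ("C", np.float64)], align=True)

print(packed.itemsize, [packed.fields[f][1] for f in packed.names])
# 17 [0, 8, 9]   -> C starts at offset 9, which is not divisible by 8
print(aligned.itemsize, [aligned.fields[f][1] for f in aligned.names])
# 24 [0, 8, 16]  -> 7 bytes of padding after B keep C 8-byte aligned
```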

braxtoncuneo commented 4 months ago

No rush. Just wanted to unblock it from my end, since Kayla gave the go-ahead and nobody in the Slack seemed opposed.

ilhamv commented 4 months ago

It looks like some of the components of the global state struct have dimensions that don't match the input deck's data. This is not a GPU issue, but something pre-existing. Still, I wanted to bring it up, since I've added code that checks for and reports some of these mismatches.

That is because some information, including how it is presented/structured, is relevant only to the input interface, while other information is relevant only to the simulation's global state, and vice versa. The reconciliation happens primarily in prepare() in main.py.
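
As a rough sketch of the kind of mismatch check being described (the field names and deck layout here are purely illustrative, not MC/DC's actual structures):

```python
import numpy as np

def report_shape_mismatches(deck, state):
    """Compare array-valued entries of an input-deck dict against same-named
    fields of a global-state structured array and report dimension mismatches."""
    for name in state.dtype.names:
        if name not in deck:
            continue  # field exists only in the global state
        deck_shape = np.shape(deck[name])
        field_shape = state.dtype[name].shape  # sub-array shape, () if scalar
        if deck_shape != field_shape:
            print(f"{name}: deck shape {deck_shape} vs state shape {field_shape}")
```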

ilhamv commented 4 months ago

Do we set up a GitHub workflow to run the GPU regression test in this PR? If not, or if that's not possible, what is the plan? @braxtoncuneo @jpmorgan98

jpmorgan98 commented 4 months ago

I am setting up a GitHub self-hosted runner on the CEMeNT dev machine we have at OSU. I might need admin privileges to get the host installed, which will slow me down a bit, but I don't think OSU COE IT should have too much of a problem helping me out. From there, I think we can run whatever we want (CPU and NVIDIA GPU runs) directly from the GitHub page.

I was thinking we could do some light performance testing per PR to make sure that a given PR won't slow down the code for GPUs or CPUs too much.
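
A lightweight per-PR timing check could look something like the sketch below; the input script path, baseline file, and 10% tolerance are illustrative assumptions, not an agreed-upon setup:

```python
import json
import subprocess
import sys
import time

BASELINE_FILE = "timing_baseline.json"  # hypothetical recorded baseline timing
INPUT_SCRIPT = "examples/fixed_source/slab_absorbium/input.py"  # placeholder problem
TOLERANCE = 1.10  # fail if more than 10% slower than the baseline

# Time one representative problem end to end.
start = time.perf_counter()
subprocess.run([sys.executable, INPUT_SCRIPT], check=True)
elapsed = time.perf_counter() - start

with open(BASELINE_FILE) as f:
    baseline = json.load(f)["elapsed"]

if elapsed > TOLERANCE * baseline:
    sys.exit(f"Performance regression: {elapsed:.1f}s vs baseline {baseline:.1f}s")
print(f"OK: {elapsed:.1f}s (baseline {baseline:.1f}s)")
```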

jpmorgan98 commented 4 months ago

OK, I got the runner up and going. I am going to try to get Harmonize to auto-configure with MC/DC via the install script, add the proper runner, then add a commit to this PR.

braxtoncuneo commented 4 months ago

Strangely, Ilham's latest commit is failing in the CEMeNT repo but passing in the fork. I'm going to run the regression tests locally to try to figure out the cause.

jpmorgan98 commented 4 months ago

GPU regression testing is waiting on #196 to be resolved on the OSU CI machine. We should be able to run regression tests manually and locally for upcoming PRs, @ilhamv.