jtramm closed 2 years ago
Performance delta on the A100 for this PR is as follows:
Problem | Old | This PR |
---|---|---|
HM Large Inactive | 356 | 361 |
SMR Active | 161 | 160 |
However, it is notable that the differences can be much more dramatic on other GPU architectures (and perhaps even between compilers for A100 down the line) due to how various OpenMP runtimes handle the mapping of millions of pointers.
This PR serializes materials into global arrays. The global arrays are implemented in the form of a new class, `vector2d<T>`. See the writeup of PR #8 for more info.
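To illustrate the idea of serializing nested per-material data into one global array, here is a minimal sketch of what a `vector2d<T>`-style container could look like. It is not the class from this PR: the member names, the row-major layout, the padding of short rows, and the `flatten` helper are all assumptions made for illustration. The point it shows is that a single contiguous buffer exposes one base pointer for the runtime to map, rather than one pointer per inner array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a flat 2D container; the real vector2d<T> in the
// PR may differ in layout, interface, and memory management.
template <typename T>
class vector2d {
public:
  vector2d(std::size_t rows, std::size_t cols)
    : rows_(rows), cols_(cols), data_(rows * cols) {}

  // Row-major indexing into the single contiguous buffer.
  T& operator()(std::size_t row, std::size_t col) {
    return data_[row * cols_ + col];
  }
  const T& operator()(std::size_t row, std::size_t col) const {
    return data_[row * cols_ + col];
  }

  // One base pointer and one length describe the whole structure, so a
  // device mapping needs a single transfer instead of one per row.
  T* data() { return data_.data(); }
  const T* data() const { return data_.data(); }
  std::size_t size() const { return data_.size(); }

  std::size_t rows() const { return rows_; }
  std::size_t cols() const { return cols_; }

private:
  std::size_t rows_;
  std::size_t cols_;
  std::vector<T> data_;
};

// Hypothetical helper: copy a jagged vector-of-vectors (for example, one
// inner vector per material) into a rectangular vector2d. Rows shorter than
// the longest one are padded with value-initialized elements.
template <typename T>
vector2d<T> flatten(const std::vector<std::vector<T>>& nested) {
  std::size_t cols = 0;
  for (const auto& row : nested) {
    cols = std::max(cols, row.size());
  }
  vector2d<T> flat(nested.size(), cols);
  for (std::size_t i = 0; i < nested.size(); ++i) {
    for (std::size_t j = 0; j < nested[i].size(); ++j) {
      flat(i, j) = nested[i][j];
    }
  }
  return flat;
}
```

Padding wastes some memory when row lengths vary widely; an offsets-based layout would avoid that at the cost of an extra indirection. Which trade-off the PR makes is not stated here.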