Closed yut23 closed 2 months ago
Performance comparisons, using test_screening_templated with aprox21, best of 10 runs:
CPU, with derivatives (n_cell=128, loops=1):
- screen5: 3.43s -> 3.37s (-0.05s, -2%)
- chugunov2007: 9.47s -> 9.34s (-0.13s, -1%)
- chugunov2009: 18.53s -> 16.58s (-1.95s, -11%)
- chabrier1998: 5.70s -> 5.15s (-0.55s, -10%)
  - with q.c.: 6.49s -> 5.82s (-0.67s, -10%)
CPU, without derivatives (n_cell=128, loops=1):
- screen5: 2.71s -> 2.83s (+0.12s, +4%)
- chugunov2007: 8.47s -> 8.50s (+0.03s, +0%)
- chugunov2009: 16.41s -> 15.02s (-1.39s, -8%)
- chabrier1998: 4.43s -> 4.44s (+0.02s, +0%)
  - with q.c.: 4.63s -> 4.59s (-0.04s, -1%)
CUDA, with derivatives (CUDA_LTO=TRUE, n_cell=64, loops=100):
- screen5: 1.70s -> 1.77s (+0.07s, +4%)
- chugunov2007: 3.33s -> 2.62s (-0.71s, -21%)
- chugunov2009: 6.98s -> 7.84s (+0.86s, +12%)
- chabrier1998: 3.51s -> 3.00s (-0.51s, -15%)
  - with q.c.: 4.11s -> 3.42s (-0.69s, -17%)
CUDA, without derivatives (CUDA_LTO=TRUE, n_cell=64, loops=100):
- screen5: 1.36s -> 1.38s (+0.02s, +1%)
- chugunov2007: 2.17s -> 2.15s (-0.02s, -1%)
- chugunov2009: 5.21s -> 5.21s (+0.00s, +0%)
- chabrier1998: 2.12s -> 2.12s (-0.00s, -0%)
  - with q.c.: 2.34s -> 2.34s (+0.00s, +0%)
Needs #1593
All the Jacobian diffs in the test suite are at roundoff level, so this seems to be working well.
This PR gives the same results for test_screening to within roundoff, and the performance is about the same or slightly better. It also adds a page to the docs about how to use the autodiff library.

One notable change is that the templated networks don't calculate the derivative terms when screening is called from RHS::rhs() (this is also how the pynucastro networks behave). Previously, they would be calculated unnecessarily if integrator.jacobian was set to 1.
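To illustrate why the RHS path can skip the derivative terms, here is a minimal sketch of the general idea: one templated routine is instantiated with a plain double when only the value is needed, and with a dual number when the Jacobian needs the derivative. Everything in it is an assumption made for illustration. The `Dual` struct is a hand-rolled stand-in for an autodiff library type, and `toy_screen_factor`, `rhs_path`, and `jac_path` are hypothetical names, not functions from this PR.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal forward-mode dual number (value + one derivative).
// It stands in for an autodiff library type; it is NOT the library's API.
struct Dual {
    double val;  // function value
    double der;  // derivative with respect to the seeded input
};

// Product rule for dual * dual and scalar * dual.
inline Dual operator*(Dual a, Dual b) {
    return {a.val * b.val, a.der * b.val + a.val * b.der};
}
inline Dual operator*(double a, Dual b) {
    return {a * b.val, a * b.der};
}

// Chain rule for exp.
inline Dual exp(Dual a) {
    const double e = std::exp(a.val);
    return {e, e * a.der};
}

// Hypothetical screening-like factor h(T) = exp(c * T * T), written once
// and instantiated for either plain doubles or dual numbers.
template <typename number_t>
number_t toy_screen_factor(number_t T) {
    using std::exp;  // doubles use std::exp; Dual's exp is found via ADL
    const double c = 0.5;
    return exp(c * T * T);
}

// RHS-style call: only the value is needed, so no derivative work is done.
inline double rhs_path(double T) {
    return toy_screen_factor<double>(T);
}

// Jacobian-style call: seed dT/dT = 1 so the derivative is carried along.
inline Dual jac_path(double T) {
    return toy_screen_factor<Dual>(Dual{T, 1.0});
}

// Small self-check: both paths agree on the value, and the dual path
// reproduces the analytic derivative dh/dT = 2 * c * T * h(T).
inline void example_usage() {
    const double T = 2.0;
    const Dual d = jac_path(T);
    assert(std::abs(d.val - rhs_path(T)) < 1e-12);
    assert(std::abs(d.der - 2.0 * 0.5 * T * std::exp(0.5 * T * T)) < 1e-12);
}
```

With this layout the choice of whether to pay for derivatives is made by the caller through the template argument, so the double instantiation does no derivative arithmetic at all.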