astro-group-bristol / Gradus.jl

Extensible spacetime agnostic general relativistic ray-tracing (GRRT).
https://astro-group-bristol.github.io/Gradus.jl/dev/
GNU General Public License v3.0

Excessive allocation and garbage collection for high-resolution transfer functions #82

Closed fjebaker closed 1 year ago

fjebaker commented 1 year ago

With regard to GC, I noticed that lagtransfer can spend a lot of time on garbage collection: 3533.139993 seconds (292.44 M allocations: 87.741 GiB, 83.50% gc time, 0.02% compilation time).

-- @phajy in #30

With 83.50% of the runtime lost to GC, this needs to be addressed urgently.
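
To track the GC share programmatically instead of reading it off the `@time` output, something like the sketch below should work. It uses `@timed` from Base. The `churn` function is a hypothetical stand-in workload that allocates a small vector per iteration. It is not Gradus code and only serves to produce some GC pressure.

```julia
# Hypothetical stand-in workload: allocates a fresh small vector per iteration.
function churn(n)
    total = 0.0
    for i in 1:n
        v = fill(Float64(i), 16)   # new array on every iteration
        total += sum(v)
    end
    return total
end

churn(10)  # warm up so compilation is not part of the measurement

# @timed returns a named tuple with the fields value, time, bytes and gctime
stats = @timed churn(1_000_000)
gc_fraction = stats.gctime / stats.time

println("elapsed:   ", round(stats.time, digits = 3), " s")
println("allocated: ", round(stats.bytes / 2^20, digits = 1), " MiB")
println("gc share:  ", round(100 * gc_fraction, digits = 2), " %")
```

Swapping `churn(1_000_000)` for the actual `lagtransfer` call would give the same breakdown as `@time`, but as numbers that can be logged or compared between runs.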

fjebaker commented 1 year ago

Code to reproduce:

using Gradus
using StaticArrays

m = KerrMetric(M = 1.0, a = 1.0)
u = @SVector [0.0, 1e6, deg2rad(60), 0.0]
d = GeometricThinDisc(Gradus.isco(m), 500.0, deg2rad(90.0))

tf = @time lagtransfer(m, u, d, callback = domain_upper_hemisphere())

Output:

 54.142732 seconds (17.62 M allocations: 6.310 GiB, 13.49% gc time)
LagTransferFunction for KerrMetric{Float64} 
  . observer position      
      [0.0, 1.0e6, 1.0471975511965976, 0.0]
  . observer to disc photon count : 567423
  . source to disc photon count   : 5849
  Total memory: 87.430 MiB

fjebaker commented 1 year ago

Single geodesic:

  28.435 μs (22 allocations: 7.05 KiB)

100,000 with multi-threading:

  71.955 ms (2200394 allocations: 707.29 MiB)
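
As a quick sanity check on these two measurements, the batch figures can be compared against the single-geodesic cost multiplied by the number of geodesics. The sketch below only uses the numbers quoted above. The variable names are made up for illustration.

```julia
# Figures copied from the two benchmarks above
allocs_single = 22          # allocations per geodesic
kib_single    = 7.05        # KiB allocated per geodesic
n             = 100_000     # geodesics in the batch

allocs_batch = 2_200_394    # allocations reported for the batch
mib_batch    = 707.29       # MiB reported for the batch

# Expected totals if the cost scales linearly with the number of geodesics
expected_allocs = allocs_single * n
expected_mib    = kib_single * n / 1024

println("allocations: expected ", expected_allocs, ", reported ", allocs_batch,
        " (", allocs_batch - expected_allocs, " extra)")
println("memory: expected ", round(expected_mib, digits = 2),
        " MiB, reported ", mib_batch, " MiB")
```

The reported batch totals sit within a few percent of the linear estimate, which suggests the allocations come almost entirely from the per-geodesic work rather than from the threading itself.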

fjebaker commented 1 year ago

Using lagtransfer with Nr = 5000, Nθ = 800.

M1 (8 threads):

348.768363 seconds (105.84 M allocations: 31.451 GiB, 49.01% gc time, 0.18% compilation time: 10% of which was recompilation)

Group Server (64 threads):

128.999013 seconds (104.98 M allocations: 31.470 GiB, 71.88% gc time)

For Nr = 800, Nθ = 800:

Group Server (64 threads):

47.846382 seconds (17.62 M allocations: 5.138 GiB, 86.53% gc time)
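
To make the percentages easier to compare, the sketch below converts each run's GC share into seconds. The timings are copied from the three runs above. The labels and variable names are only for illustration.

```julia
# (label, wall time in seconds, GC share in percent), copied from the runs above
runs = [
    ("M1, 8 threads, Nr = 5000",      348.768363, 49.01),
    ("Server, 64 threads, Nr = 5000", 128.999013, 71.88),
    ("Server, 64 threads, Nr = 800",   47.846382, 86.53),
]

# Absolute time spent in garbage collection for each run
gc_seconds = [seconds * percent / 100 for (_, seconds, percent) in runs]

for ((label, seconds, _), gc) in zip(runs, gc_seconds)
    println(rpad(label, 32), round(gc, digits = 1), " s in GC, ",
            round(seconds - gc, digits = 1), " s elsewhere")
end
```

On the 64-thread server most of the wall time goes to GC rather than tracing, so adding threads shrinks the tracing time far more than it shrinks the collection time.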

fjebaker commented 1 year ago

Tracing with or without the disc:

@btime tracegeodesics($m, $u1, $v1, $t_span; save_on = false)
#   30.042 μs (37 allocations: 8.84 KiB)
@btime tracegeodesics($m, $u1, $v1, $d, $t_span; save_on = false)
#   76.083 μs (27 allocations: 8.28 KiB)

The slowdown when tracing with the disc (roughly 2.5×) is due to the ContinuousCallback that's being used.

fjebaker commented 1 year ago

After #85. M1 (8 threads):

213.144266 seconds (44.83 M allocations: 10.079 GiB, 0.65% gc time)