chrism0dwk / covid19uk

MIT License

Rewrite Multinomial as hand rolled iterated binomial. #2

Closed. csuter closed this 4 years ago

csuter commented 4 years ago

This brings the runtime for 195 days down to about 60 seconds. It's still really slow in XLA mode; not sure why. We can follow up with some performance analysis on our side.

I am reasonably confident that what I wrote is correct, but we should verify.
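
For reference, the general idea named in the title (sampling a multinomial as a chain of conditional binomial draws) can be sketched as below. This is a plain-Python illustration only, not the code in this PR: the function names `binomial_draw` and `multinomial_via_binomials` are hypothetical, the binomial helper is a deliberately simple sum of Bernoulli trials, and `probs` is assumed to be non-empty and to sum to 1.

```python
import random


def binomial_draw(n, p, rng):
    """Binomial(n, p) draw as a sum of n Bernoulli(p) trials (simple, O(n))."""
    return sum(1 for _ in range(n) if rng.random() < p)


def multinomial_via_binomials(n, probs, rng):
    """Draw one Multinomial(n, probs) sample using iterated binomials.

    Category k is drawn from the trials not yet assigned, with its
    probability renormalised by the probability mass still remaining.
    """
    counts = [0] * len(probs)
    remaining_n = n      # trials not yet assigned to a category
    remaining_p = 1.0    # probability mass of categories not yet visited
    for k in range(len(probs) - 1):
        if remaining_n == 0 or remaining_p <= 0.0:
            break
        # Conditional probability of category k given "not an earlier one";
        # clipped to [0, 1] to guard against floating-point drift.
        cond_p = min(max(probs[k] / remaining_p, 0.0), 1.0)
        counts[k] = binomial_draw(remaining_n, cond_p, rng)
        remaining_n -= counts[k]
        remaining_p -= probs[k]
    # Whatever is left over belongs to the final category.
    counts[-1] = remaining_n
    return counts


rng = random.Random(0)
sample = multinomial_via_binomials(100, [0.2, 0.3, 0.5], rng)
print(sample, sum(sample))  # the counts always sum to n
```

One cheap way to verify a construction like this is to check that every draw sums to `n` and that the empirical mean of many draws is close to `n * probs`.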

csuter commented 4 years ago

Your take was also very reasonable! We could benchmark them against each other, but I'd be surprised to see a big perf diff. I'm still worried about the impact of XLA compilation on runtime. We'll keep pursuing this on our end.

chrism0dwk commented 4 years ago

For some weird reason the GPU on my laptop still isn't playing ball, but FWIW on the CPU I get a 74s runtime. However, the CPU usage is about 103% -- I would have expected a bit more, particularly in terms of parallelisation of the Kroneckers in the rate calculation and batched expm and sampling functions.

csuter commented 4 years ago

I was able to run on GPU (with XLA turned off) and got around the same runtime as on CPU. I didn't peek at utilization though.