constinit closed this 2 years ago.
Just as an FYI: this is running at 2.8 µs per iteration on my machine (10-core M1). Before you started working on this, it was running at around 7–8 µs per iteration, so that's almost 3x faster, which is pretty sweet for code that was already quite fast. NICE 🎉
We replace the current `slogdet` call (which requires pivoting) with a GPU-friendly, numerically stable, and faster implementation based on the matrix determinant lemma: https://en.wikipedia.org/wiki/Matrix_determinant_lemma
This further reduces runtime from ~10 µs to ~7 µs.
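For readers unfamiliar with the lemma: for an invertible matrix A and vectors u, v, det(A + u vᵀ) = (1 + vᵀ A⁻¹ u) · det(A). Below is a minimal sketch of how that gives a log-determinant without a pivoted factorization. It assumes the base matrix is diagonal and nonsingular (A = diag(d)), which is an illustrative assumption and not necessarily the structure in this PR. The helper name `slogdet_diag_plus_rank1` is hypothetical, and it mirrors the `(sign, logabsdet)` return convention of `slogdet`.

```python
import math


def slogdet_diag_plus_rank1(d, u, v):
    """Return (sign, log|det|) of diag(d) + u v^T via the matrix determinant lemma.

    Hypothetical helper for illustration. Assumes every d[i] is nonzero, so
    A^{-1} u is just an elementwise division and no pivoting is needed.
    """
    # det(A) for diagonal A is the product of the diagonal entries;
    # accumulate its sign and log-magnitude separately for stability.
    sign = 1.0
    logabs = 0.0
    for di in d:
        sign *= math.copysign(1.0, di)
        logabs += math.log(abs(di))

    # Lemma correction factor: 1 + v^T A^{-1} u.
    factor = 1.0 + sum(vi * ui / di for vi, ui, di in zip(v, u, d))
    if factor == 0.0:
        # The rank-1 update makes the matrix singular.
        return 0.0, -math.inf

    sign *= math.copysign(1.0, factor)
    logabs += math.log(abs(factor))
    return sign, logabs


# diag([2, 3]) + [1, 1]^T [1, 2] = [[3, 2], [1, 5]], whose determinant is 13.
sign, logabs = slogdet_diag_plus_rank1([2.0, 3.0], [1.0, 1.0], [1.0, 2.0])
print(sign, math.exp(logabs))
```

Every step here is an elementwise operation followed by a reduction, which is the kind of work that maps well onto a GPU. In an array library, the loops would become vectorized ops rather than a pivoted LU decomposition.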