Reduce precision: Although our coordinates are double precision, it may be worth storing the BVH in single-precision floats for a smaller memory footprint and better performance. This is safe as long as the boxes stay over-approximating, i.e. we round min down and max up (and maybe pad by an epsilon), and it should be faster in the general case. For the rare coordinates that exceed the single-precision floating-point range, we can just always treat the box as colliding; I don't think users rely on such values anyway.
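A rough sketch of what the conservative rounding could look like, just to make the idea concrete. The names `FloatLower`, `FloatUpper`, `BoxF`, `ToFloatBox` and `Overlaps` are made up for illustration and are not existing code; it assumes the default round-to-nearest mode and does not handle NaN. Out-of-range values are clamped to infinity on the outward side, so such boxes end up overlapping anything in that direction.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: largest float that is <= x (rounds toward -infinity).
inline float FloatLower(double x) {
  constexpr float kMax = std::numeric_limits<float>::max();
  constexpr float kInf = std::numeric_limits<float>::infinity();
  if (x > kMax) return kMax;    // still a valid lower bound
  if (x < -kMax) return -kInf;  // out of range: extend to -infinity
  float f = static_cast<float>(x);  // nearest float, may land above x
  if (static_cast<double>(f) > x) f = std::nextafter(f, -kInf);
  return f;
}

// Hypothetical helper: smallest float that is >= x (rounds toward +infinity).
inline float FloatUpper(double x) {
  constexpr float kMax = std::numeric_limits<float>::max();
  constexpr float kInf = std::numeric_limits<float>::infinity();
  if (x > kMax) return kInf;    // out of range: extend to +infinity
  if (x < -kMax) return -kMax;  // still a valid upper bound
  float f = static_cast<float>(x);  // nearest float, may land below x
  if (static_cast<double>(f) < x) f = std::nextafter(f, kInf);
  return f;
}

// Hypothetical single-precision box used only inside the BVH.
struct BoxF {
  float min[3];
  float max[3];
};

// Builds a float box that always contains the original double box.
inline BoxF ToFloatBox(const double min[3], const double max[3]) {
  BoxF b;
  for (int i = 0; i < 3; ++i) {
    b.min[i] = FloatLower(min[i]);
    b.max[i] = FloatUpper(max[i]);
  }
  return b;
}

// Inclusive overlap test on the float boxes.
inline bool Overlaps(const BoxF& a, const BoxF& b) {
  for (int i = 0; i < 3; ++i) {
    if (a.min[i] > b.max[i] || b.min[i] > a.max[i]) return false;
  }
  return true;
}
```

Since every float box contains its double box, a miss in float is also a miss in double, so we would only get extra false positives, which the exact check afterwards can reject.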
Maybe try AVX for the bounding box overlap check; this may improve performance.
Just some ideas for optimization: