Keep things as is and apply a manual shift to the result
I tested the existing implementation with random data, and the "winner" survived a shift of up to 72 seconds. However, this doesn't seem very robust and will definitely fail in some edge cases.
Check distances across a sliding window and across offsets. The offsets can come from the master sample.
As stated, this is a lot of computation. Hopefully there's some way to optimize it...
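To make the cost concrete, here is a rough sketch of the brute-force version of the offset search. It is not the existing implementation. It assumes both samples are plain numeric sequences at the same sample rate, that shifts are counted in samples, and that mean absolute difference is an acceptable distance. The names `best_offset`, `window_distance`, `max_shift` and `min_overlap` are made up for illustration.

```python
from typing import Sequence, Tuple


def window_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length windows (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def best_offset(
    master: Sequence[float],
    candidate: Sequence[float],
    max_shift: int,
    min_overlap: int = 1,
) -> Tuple[int, float]:
    """Try every shift in [-max_shift, max_shift] and return (shift, distance).

    A shift of s means candidate[i] is compared against master[i + s].
    Only the overlapping region is compared; shifts whose overlap is
    shorter than min_overlap are skipped so tiny overlaps cannot win.
    """
    best_shift, best_dist = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        start_m = max(0, shift)    # first master index in the overlap
        start_c = max(0, -shift)   # first candidate index in the overlap
        length = min(len(master) - start_m, len(candidate) - start_c)
        if length < min_overlap:
            continue
        dist = window_distance(
            master[start_m:start_m + length],
            candidate[start_c:start_c + length],
        )
        if dist < best_dist:
            best_shift, best_dist = shift, dist
    return best_shift, best_dist


master = [0, 0, 1, 5, 2, 0, 0, 3, 7, 1, 0, 0]
candidate = master[3:9]  # the same data, starting 3 samples late
print(best_offset(master, candidate, max_shift=5, min_overlap=4))
```

Each call does roughly `(2 * max_shift + 1) * overlap` comparisons, and that is per candidate, which is where the computation blows up. Restricting the shifts to offsets taken from the master sample, or stopping early once the running distance exceeds the best so far, might be ways to cut it down.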
I can see two approaches for dealing with this