Note on min_distance thresholds - Each document is assigned to the closest candidate cluster. When we set min_distance we will filter out clusters that dont meet that requirement without going into the individual clusters looking for matches. This means that we are likely to miss some good matches along the way since we wont investigate clusters that just miss the cutoff. A (planned) patch for this behavior would be to also search clusters that 'just' miss this cutoff.
From README: