`get_blt_slices` is slow

While profiling the LST-binner over 40 days of H6C data, I found that of the ~20k seconds of total runtime, almost 5k (about 1.3 hours!) were spent in the `get_blt_slices` function. This seems largely unnecessary. For reference, here is the line-profiler output for the function:

```
Total time: 4757.35 s
File: /lustre/aoc/projects/hera/heramgr/anaconda3/envs/h6c/lib/python3.10/site-packages/hera_cal/io.py
Function: get_blt_slices at line 422

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   422                                           def get_blt_slices(uvo, tried_to_reorder=False):
   423                                               '''For a pyuvdata-style UV object, get the mapping from antenna pair to blt slice.
   424                                               If the UV object does not have regular spacing of baselines in its baseline-times,
   425                                               this function will try to reorder it using UVData.reorder_blts() to see if that helps.
   426                                           
   427                                               Arguments:
   428                                                   uvo: a "UV-Object" like UVData or baseline-type UVFlag. Blts may get re-ordered internally.
   429                                                   tried_to_reorder: used internally to prevent infinite recursion
   430                                           
   431                                               Returns:
   432                                                   blt_slices: dictionary mapping anntenna pair tuples to baseline-time slice objects
   433                                               '''
   434      4799      10367.0      2.2      0.0      blt_slices = {}
   435  42100895  230656109.0      5.5      4.8      for ant1, ant2 in uvo.get_antpairs():
   436  42096096 3186135020.0     75.7     67.0          indices = uvo.antpair2ind(ant1, ant2)
   437  42096096   77943582.0      1.9      1.6          if len(indices) == 1:  # only one blt matches
   438    617976    3585403.0      5.8      0.1              blt_slices[(ant1, ant2)] = slice(indices[0], indices[0] + 1, uvo.Nblts)
   439  41478120  986563753.0     23.8     20.7          elif not (len(set(np.ediff1d(indices))) == 1):  # checks if the consecutive differences are all the same
   440                                                       if not tried_to_reorder:
   441                                                           uvo.reorder_blts(order='time')
   442                                                           return get_blt_slices(uvo, tried_to_reorder=True)
   443                                                       else:
   444                                                           raise NotImplementedError('UVData objects with non-regular spacing of '
   445                                                                                     'baselines in its baseline-times are not supported.')
   446                                                   else:
   447  41478120  271685967.0      6.6      5.7              blt_slices[(ant1, ant2)] = slice(indices[0], indices[-1] + 1, indices[1] - indices[0])
   448      4799     768416.0    160.1      0.0      return blt_slices
```
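
Line 436 dominates: roughly 42 million `antpair2ind` calls at ~76 µs each. If each call scans the whole blt axis (an assumption about pyuvdata's internals), the total cost grows like Nbls × Nblts per invocation. A minimal sketch of an alternative, assuming non-negative antenna numbers: group every blt by antenna pair with one stable sort over `uvo.ant_1_array` / `uvo.ant_2_array`, then check the spacing within each group. The function name `blt_slices_vectorized` is hypothetical and not part of hera_cal, and the reorder-and-retry step is left to the caller.

```python
import numpy as np


def blt_slices_vectorized(ant_1_array, ant_2_array):
    """Hypothetical alternative to get_blt_slices (not hera_cal code).

    Groups all baseline-times by antenna pair with a single stable sort
    instead of one index search per antpair. Raises NotImplementedError if
    any antpair's blt indices are not evenly spaced; reordering and retrying
    is left to the caller.
    """
    ant1 = np.asarray(ant_1_array, dtype=np.int64)
    ant2 = np.asarray(ant_2_array, dtype=np.int64)
    nblts = ant1.size

    # Encode each (ant1, ant2) pair as one integer (antenna numbers assumed >= 0).
    key = ant1 * (ant2.max() + 1) + ant2

    # Stable sort: blt indices stay ascending within each antpair's group.
    order = np.argsort(key, kind="stable")
    sorted_key = key[order]

    # Group boundaries are wherever the sorted key changes value.
    bounds = np.flatnonzero(np.diff(sorted_key)) + 1
    starts = np.concatenate(([0], bounds))
    ends = np.concatenate((bounds, [nblts]))

    blt_slices = {}
    for start, end in zip(starts, ends):
        indices = order[start:end]
        first = int(indices[0])
        pair = (int(ant1[first]), int(ant2[first]))
        if indices.size == 1:
            # Same convention as the original: step of Nblts for a single blt.
            blt_slices[pair] = slice(first, first + 1, nblts)
            continue
        steps = np.diff(indices)
        if np.any(steps != steps[0]):
            raise NotImplementedError(f"blts for antpair {pair} are not regularly spaced")
        blt_slices[pair] = slice(first, int(indices[-1]) + 1, int(steps[0]))
    return blt_slices
```

The Python loop still runs once per antpair, but each iteration only touches that pair's own indices; the full blt axis is sorted once per call rather than searched once per antpair.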

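Most of the time (about 67%) goes into finding the indices for each antpair via `antpair2ind`. I get that this is sometimes necessary, because in general a UVData object can have its blts in any order. For HERA data, however, it is unnecessary: the blts are always ordered time-first, antenna-pair-second (all baselines for the first time, then all baselines for the second time, and so on), so each antpair's slice follows directly from its position within the first time block. If we could quickly detect this ordering (or let the caller assert it), we could skip the per-antpair index search, which would be a significant speedup.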
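
A minimal sketch of such a check, assuming every time contains the same `nbls` antpairs in the same order (a rectangular, time-major layout) and non-negative antenna numbers. The function `blt_slices_if_time_major` is hypothetical; it takes plain arrays such as `uvo.ant_1_array`, `uvo.ant_2_array` and `uvo.Nbls`, and returns `None` when the layout does not match so the caller can fall back to the existing search.

```python
import numpy as np


def blt_slices_if_time_major(ant_1_array, ant_2_array, nbls):
    """Hypothetical fast path (not part of hera_cal).

    If the blts form Ntimes consecutive blocks, each listing the same nbls
    distinct antpairs in the same order, return the antpair -> slice mapping
    directly. Otherwise return None so the caller can use the general search.
    """
    ant1 = np.asarray(ant_1_array, dtype=np.int64)
    ant2 = np.asarray(ant_2_array, dtype=np.int64)
    nblts = ant1.size
    if nbls <= 0 or nblts % nbls != 0:
        return None
    ntimes = nblts // nbls

    # View the blt axis as (Ntimes, Nbls): each row is one time's block.
    a1 = ant1.reshape(ntimes, nbls)
    a2 = ant2.reshape(ntimes, nbls)

    # Every block must repeat the first block's antpairs in the same order.
    if not (np.all(a1 == a1[0]) and np.all(a2 == a2[0])):
        return None

    # Antpairs within a block must be distinct, or slices would collide.
    first_keys = a1[0] * (a2[0].max() + 1) + a2[0]
    if np.unique(first_keys).size != nbls:
        return None

    # Antpair i sits at blts i, i + nbls, i + 2 * nbls, ...
    stop_offset = (ntimes - 1) * nbls + 1
    return {
        (int(a1[0, i]), int(a2[0, i])): slice(i, i + stop_offset, nbls)
        for i in range(nbls)
    }
```

Inside `get_blt_slices`, this could be tried first and returned early when it is not `None`. The check is a handful of vectorized comparisons over the blt axis, so it should cost far less than one `antpair2ind` call per antpair. When `ntimes == 1` the step equals Nblts, matching the original single-blt convention.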

HERA-Team / hera_cal, issue #858