Iteration could compute masks more efficiently

In the initial `SWIFTGalaxies` iterator class, the masks are calculated for each galaxy here:

https://github.com/SWIFTSIM/swiftgalaxy/blob/2e9e4779221840e2c2f87dcdcfbcba4dee1876a7/swiftgalaxy/iterator.py#L310-L312

within the loop over galaxies. This means that for each galaxy we evaluate:

https://github.com/SWIFTSIM/swiftgalaxy/blob/2e9e4779221840e2c2f87dcdcfbcba4dee1876a7/swiftgalaxy/halo_catalogues.py#L274-L289

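The `==` operation is fairly expensive: each call compares every loaded particle against a single galaxy ID, so the total cost grows with (number of target galaxies) × (number of particles). Perhaps the masks could instead be pre-computed for all target galaxies in a region just after the data preloading loop: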
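For reference, here is a minimal sketch of the per-galaxy pattern that would be replaced. It assumes `group_nr_bound` is a plain 1D integer NumPy array (the real swiftgalaxy data may carry units) and that `-1` marks particles not bound to any halo. `masks_by_loop` and the dummy data are hypothetical. Every target triggers one full pass over the particle array.

```python
import numpy as np


def masks_by_loop(group_nr_bound, target_ids):
    """Baseline: one full == comparison over all particles per target galaxy."""
    return {int(t): group_nr_bound == t for t in target_ids}


rng = np.random.default_rng(42)
# Dummy membership array; -1 is assumed to mean "not bound to any halo"
group_nr_bound = rng.integers(-1, 50, size=100_000)
target_ids = np.array([3, 7, 12])

masks = masks_by_loop(group_nr_bound, target_ids)
print({t: int(m.sum()) for t, m in masks.items()})
```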

https://github.com/SWIFTSIM/swiftgalaxy/blob/2e9e4779221840e2c2f87dcdcfbcba4dee1876a7/swiftgalaxy/iterator.py#L304

Here, rather than looping over the galaxies and using `==` to find the matches in `group_nr_bound`, we need a more efficient approach. The inputs are:

- the target galaxies in the region, contained in `solution["region_target_indices"]`; these need to be converted to halo catalogue indices by looking up the corresponding rows in `self.halo_catalogue.input_halos.halo_catalogue_index`;
- the particle group membership information, accessible as `self._server.gas._particle_dataset.group_nr_bound` (and similarly for the other particle types).

The desired output is, for each particle type, a list of boolean masks (`[True, False, False, ...]`) that pick out the particles bound to each galaxy in this region's list of targets. This must be cheaper to compute than a loop over `==` (or a similar per-galaxy operation) for the improvement to be worthwhile. A clever use of `numpy.unique(..., return_inverse=True)` is probably the way to do it.
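One possible shape for this, as a sketch only: group the particles by ID once with `np.unique(..., return_inverse=True, return_counts=True)` and a stable argsort of the inverse, then fill each target's mask from its group of indices. `masks_by_unique`, the per-type dictionary, and the small arrays standing in for `halo_catalogue_index` and `region_target_indices` are all hypothetical; the real swiftgalaxy attributes may differ in type (e.g. carry units).

```python
import numpy as np


def masks_by_unique(group_nr_bound, target_ids):
    """Group particles by ID once, then build one boolean mask per target.

    Targets with no bound particles get an all-False mask.
    """
    group_nr_bound = np.asarray(group_nr_bound)
    # unique_ids is sorted; inverse maps each particle to its entry in unique_ids
    unique_ids, inverse, counts = np.unique(
        group_nr_bound, return_inverse=True, return_counts=True
    )
    # A stable argsort of inverse lists particle indices grouped by ID, in ID order
    order = np.argsort(inverse.ravel(), kind="stable")
    # Split the grouped indices into one chunk per unique ID
    groups = dict(zip(unique_ids.tolist(), np.split(order, np.cumsum(counts)[:-1])))
    empty = np.empty(0, dtype=np.intp)
    masks = {}
    for t in np.asarray(target_ids).tolist():
        mask = np.zeros(group_nr_bound.size, dtype=bool)
        mask[groups.get(t, empty)] = True
        masks[t] = mask
    return masks


# Hypothetical stand-ins for self.halo_catalogue.input_halos.halo_catalogue_index
# and solution["region_target_indices"]
halo_catalogue_index = np.array([3, 7, 12, 25, 40])
region_target_indices = np.array([0, 2, 4])
target_ids = halo_catalogue_index[region_target_indices]  # [3, 12, 40]

# Hypothetical per-type membership arrays; -1 assumed to mean "unbound"
rng = np.random.default_rng(0)
group_nr_bound_by_type = {
    "gas": rng.integers(-1, 50, size=10_000),
    "dark_matter": rng.integers(-1, 50, size=20_000),
}
masks_by_type = {
    ptype: masks_by_unique(ids, target_ids)
    for ptype, ids in group_nr_bound_by_type.items()
}
```

The grouping costs roughly one sort per particle type instead of one full pass per galaxy. Allocating each output mask is still proportional to the number of particles, so handing the index arrays in `groups` to downstream code directly, if it can accept them, may be cheaper still. An equivalent grouping can also be obtained with `np.argsort` on the IDs followed by `np.searchsorted` for each target.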

A good starting point would be to make some dummy data: an array of ~10 target IDs, and a large array of integers that contains each of those IDs many times (plus other integers that are not being searched for). Then try to extract the corresponding masks as efficiently as possible, for example by checking whether `numpy.unique` outperforms a loop over `==`.
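A possible version of that experiment, as a sketch. It uses 10 target IDs, each repeated 5,000 times, mixed with 400,000 background IDs that never match, and times a loop over `==` against grouping via `np.unique(..., return_inverse=True)`. The sizes, seed, and function names are arbitrary choices. Results will depend on the machine and the number of targets. Because `np.unique` sorts the whole array, the plain loop may well be competitive with only ~10 targets. The grouping approach should pay off as the number of targets per region grows.

```python
import timeit

import numpy as np

rng = np.random.default_rng(1)
# 10 distinct target IDs drawn from 0..999
target_ids = rng.choice(1000, size=10, replace=False)
# Each target repeated many times, plus background IDs (1000..4999) that never match
background = rng.integers(1000, 5000, size=400_000)
group_nr_bound = np.concatenate([np.repeat(target_ids, 5_000), background])
rng.shuffle(group_nr_bound)


def loop_eq():
    """One full == pass over the array per target."""
    return [group_nr_bound == t for t in target_ids]


def unique_inverse():
    """One np.unique plus a stable argsort, then masks filled from grouped indices."""
    uniq, inv, counts = np.unique(group_nr_bound, return_inverse=True, return_counts=True)
    order = np.argsort(inv.ravel(), kind="stable")
    groups = dict(zip(uniq.tolist(), np.split(order, np.cumsum(counts)[:-1])))
    masks = []
    for t in target_ids.tolist():
        mask = np.zeros(group_nr_bound.size, dtype=bool)
        mask[groups[t]] = True
        masks.append(mask)
    return masks


# Sanity check: both approaches must give identical masks before timing them
assert all(np.array_equal(a, b) for a, b in zip(loop_eq(), unique_inverse()))

for fn in (loop_eq, unique_inverse):
    best = min(timeit.repeat(fn, number=3, repeat=3))
    print(f"{fn.__name__}: {best / 3 * 1e3:.1f} ms per call")
```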

This optimization only makes sense for the `bound_only` mask option, so we will need to consider whether and how to support the other modes, and the pre-computation should definitely be applied only in `bound_only` mode.

SWIFTSIM / swiftgalaxy, issue #15