SWIFTSIM / swiftgalaxy

Load in particles of a simulated galaxy, rotate coordinates, easy spherical/cylindrical coordinates, access integrated properties, and more.
GNU General Public License v3.0

Iteration could compute masks more efficiently #15

Open kyleaoman opened 17 hours ago

kyleaoman commented 17 hours ago

In the initial SWIFTGalaxies iterator class, masks are calculated for each galaxy here:

https://github.com/SWIFTSIM/swiftgalaxy/blob/2e9e4779221840e2c2f87dcdcfbcba4dee1876a7/swiftgalaxy/iterator.py#L310-L312

within the loop over galaxies. This means that for each galaxy we evaluate:

https://github.com/SWIFTSIM/swiftgalaxy/blob/2e9e4779221840e2c2f87dcdcfbcba4dee1876a7/swiftgalaxy/halo_catalogues.py#L274-L289

The == operation is fairly expensive: each comparison scans the whole array, and it is repeated for every galaxy. Perhaps the masks can be pre-computed for all target galaxies in a region just after the data preloading loop:

https://github.com/SWIFTSIM/swiftgalaxy/blob/2e9e4779221840e2c2f87dcdcfbcba4dee1876a7/swiftgalaxy/iterator.py#L304

Here, instead of looping over the galaxies with == and finding the matches in group_nr_bound, a more efficient solution needs to be found. The inputs are:

A good starting point would be to make some dummy data: an array of target IDs (say ~10 integers) and a big array of integers that contains each of those 10 integers many times, plus some other integers that are not being searched for. Then try to get out the corresponding masks as efficiently as possible (see if numpy.unique outperforms a loop over ==, for example).
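A minimal sketch of that kind of experiment is below. All names (`target_ids`, `group_ids`, `masks_loop`, `indices_sorted`) and the array sizes are made up for illustration and are not part of swiftgalaxy; `group_ids` merely stands in for something like group_nr_bound. The alternative shown is a single stable argsort followed by numpy.searchsorted rather than numpy.unique, and it returns integer index arrays instead of boolean masks, so it is one candidate to compare against, not necessarily the best one.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dummy data: 10 distinct target IDs and a large array in which
# each ID in [0, 1000) appears many times (stand-in for group_nr_bound).
target_ids = rng.choice(1000, size=10, replace=False)
group_ids = rng.integers(0, 1000, size=200_000)


def masks_loop(group_ids, target_ids):
    """Baseline: one full-array == comparison per target ID."""
    return {int(t): group_ids == t for t in target_ids}


def indices_sorted(group_ids, target_ids):
    """Sort once, then locate each target's block with two binary searches.

    Returns integer index arrays (not boolean masks). Because the sort is
    stable, the indices for each target come out in ascending order.
    """
    order = np.argsort(group_ids, kind="stable")
    sorted_ids = group_ids[order]
    left = np.searchsorted(sorted_ids, target_ids, side="left")
    right = np.searchsorted(sorted_ids, target_ids, side="right")
    return {
        int(t): order[lo:hi] for t, lo, hi in zip(target_ids, left, right)
    }


# Check that both approaches select the same particles for every target.
loop_result = masks_loop(group_ids, target_ids)
sorted_result = indices_sorted(group_ids, target_ids)
for t in target_ids:
    assert np.array_equal(np.flatnonzero(loop_result[int(t)]), sorted_result[int(t)])

# Rough timing comparison; the outcome depends on array size and target count.
t_loop = timeit.timeit(lambda: masks_loop(group_ids, target_ids), number=3)
t_sort = timeit.timeit(lambda: indices_sorted(group_ids, target_ids), number=3)
print(f"loop over ==: {t_loop:.4f} s, argsort + searchsorted: {t_sort:.4f} s")
```

Which approach wins will likely depend on the number of target galaxies relative to the array length: the loop costs one full pass per target, while the sort is paid once regardless of how many targets there are. A target ID that does not occur in the array simply yields an empty index array.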

All of this optimization only makes sense for the bound_only mask option. We will need to consider if/how to support other modes, and in any case only apply this in the bound_only mode.