SWIFTSIM / velociraptor-python

Python tools for using velociraptor catalogues, with SWIFT integration
GNU Lesser General Public License v3.0
0 stars 8 forks source link

extract_halo requires an integer id #70

Closed kyleaoman closed 1 month ago

kyleaoman commented 3 years ago

https://github.com/SWIFTSIM/velociraptor-python/blob/b168a6b497758ab6ede47be8aafd192e8b95ef68/velociraptor/particles/particles.py#L100

The requirement to provide an int for the halo_id here seems too strict. For example, it barfs on a np.int64.

JBorrow commented 3 years ago

In what way does it barf? I think that int in this context means just an integer as it's a type hint right? I suppose we could use typing.Integer

kyleaoman commented 3 years ago

Ah, now not sure where I got the np.int64 from, but the following fails because it doesn't like a unyt_quantity:

import numpy as np
from os import path
from velociraptor import load as load_catalogue
from velociraptor.particles import load_groups
import unyt as u

snapnum = 23
base_dir = '/cosma/home/durham/dc-oman1/ColibreTestData/'\
    '106e3_104b2_norm_0p3_new_cooling_L006N188/'
velociraptor_filebase = path.join(base_dir, 'halo_{:04d}'.format(snapnum))
snapshot_filename = path.join(base_dir, 'colibre_{:04d}.hdf5'.format(snapnum))

catalogue = load_catalogue(
    path.join(base_dir, f'{velociraptor_filebase}.properties')
)
target_mask = np.logical_and.reduce((
    catalogue.ids.hosthaloid == -1,
    catalogue.velocities.vmax > 50 * u.km / u.s,
    catalogue.velocities.vmax < 100 * u.km / u.s
))
target_halo_ids = catalogue.ids.id[target_mask]
target_halo_id = target_halo_ids[0]
groups = load_groups(
    path.join(base_dir, f'{velociraptor_filebase}.catalog_groups'),
    catalogue=catalogue
)
print(type(target_halo_id))
particles, unbound_particles = groups.extract_halo(halo_id=target_halo_id)

with:

<class 'unyt.array.unyt_quantity'>
Traceback (most recent call last):
  File "swiftgalaxy.py", line 251, in <module>
    particles, unbound_particles = groups.extract_halo(halo_id=target_halo_id)
  File "/cosma7/ICC-data/dc-oman1/code/velociraptor-python/velociraptor/particles/particles.py", line 135, in extract_halo
    number_of_particles = self.offset[halo_id + 1] - self.offset[halo_id]
IndexError: arrays used as indices must be of integer (or boolean) type

Getting the id programmatically from the group catalogue seems like a reasonable use-case ;)

JBorrow commented 3 years ago

Ah - sorry - this is very misleading because the halo_id here is actually just the position of the halo in the catalogue file not its unique ID.

kyleaoman commented 3 years ago

It probably should be the unique id! In the examples I've seen the two are in principle interchangeable, but I guess there's no guarantee that this will hold, right?

JBorrow commented 3 years ago

Indeed, if you use a VR catalogue that was generated with the MPI version of the code this may not hold at all.

It probably should be the unique ID but that then requires storing a hashtable somewhere or doing an nlogn search every time you want to load a halo. So if you want to do something like loop through all haloes in the file then this may be wasted effort...

kyleaoman commented 3 years ago

Should it maybe accept either an index or an id then? Alternately, I think as a minimal change the argument could just be renamed for clarity.

JBorrow commented 3 years ago

Happy for it to be renamed. Might also be nice to have a function or method somewhere that get_halo_index_from_id or something (it could even cache the required hashtable).

kyleaoman commented 3 years ago

Changed the name of the argument, think I covered all the locations where it's relevant, and updated the docs:

71

Would still be nice to have a get_halo_index_from_id and/or accept a halo_id as an alternative argument, but haven't implemented that yet.