SWIFTSIM / swiftsimio

Python library for reading SWIFT data. Uses unyt and h5py.
GNU Lesser General Public License v3.0
16 stars 13 forks source link

Opening one of the sub-file of a distributed snaphot leads to unexpected error when using the cell masking #169

Closed MatthieuSchaller closed 10 months ago

MatthieuSchaller commented 1 year ago

Bug reported on Slack by @Joeybraspenning.

Code snippet running on a flamingo box and reading the .0 file rather than the virtual file:

    load_region = [[xc - 2*r500c, xc+2*r500c], [yc-2*r500c, yc+2*r500c], [zc-2*r500c, zc+2*r500c]]
    print(f'{load_region=}')
    # Constrain the mask
    mask.constrain_spatial(load_region)
    # Now load the snapshot with this mask
    data = sw.load(filename, mask=mask)
    print(f'{np.min(data.gas.coordinates, axis = 0)=}')
    print(f'{np.max(data.gas.coordinates, axis = 0)=}')

leads to

load_region=[[unyt_quantity(102.72425057, 'Mpc'), unyt_quantity(104.15979745, 'Mpc')], [unyt_quantity(89.56892057, 'Mpc'), unyt_quantity(91.00446745, 'Mpc')], [unyt_quantity(3.98413057, 'Mpc'), unyt_quantity(5.41967745, 'Mpc')]]
np.min(data.gas.coordinates, axis = 0)=cosmo_array([125.00008293, 687.50001293, 343.75001293], 'Mpc')
np.max(data.gas.coordinates, axis = 0)=cosmo_array([156.24988293, 718.74998293, 406.24991293], 'Mpc')

The code should either use the meta-data to open the correct sub-file or throw an error about trying to access a region not present in the file.

Joeybraspenning commented 1 year ago

This might be the better code snippet to refer to as it includes the read of the file

import swiftsimio as sw
import numpy as np
from unyt import Mpc

filename = "/cosma8/data/dp004/flamingo/Runs/L1000N1800/HYDRO_FIDUCIAL/snapshots/flamingo_0077/flamingo_0077.0.hdf5"
mask = sw.mask(filename)

load_region = [[10*Mpc, 15*Mpc], [100*Mpc, 110*Mpc], [200*Mpc, 210*Mpc]]
print(load_region)
# Constrain the mask
mask.constrain_spatial(load_region)
# Now load the snapshot with this mask
data = sw.load(filename, mask=mask)

print(np.min(data.gas.coordinates, axis = 0))
print(np.max(data.gas.coordinates, axis = 0))
JBorrow commented 12 months ago

Seems like there is no information in the snapshot that it is partial. Could we add this to SWIFT for future runs? I guess we can also check if NumPart_ThisFile != NumPart_Total, and also say if Virtual then we're good? Tricky.

Update: There is NumFilesPerSnapshot and ThisFile saying which index it is. The virtual files seem to have NumFilesPerSnapshot = 1 and ThisFile = 1 as well as Virtual = 1.

MatthieuSchaller commented 12 months ago

/Header/NumFilesPerSnapshot tells you whether you need to read more than one thing.

The virtual file says num_file=1 since you only need to read that one. But it sets /Header/Virtual to 1.

JBorrow commented 12 months ago

So I can check if NumFilesPerSnapshot is > 1 and say "no partial loading for you"?

MatthieuSchaller commented 12 months ago

Yes.

MatthieuSchaller commented 10 months ago

Fixed by #170.