Simulation output changes significantly after reading and writing gleam catalog as skyh5 without deliberate changes

burdorfmitchell commented 3 months ago

I attempted to read the gleam.vot file stored in the reference simulation google drive using pyradiosky, and then immediately write it out as a skyh5 file. After doing so, I used both files as input to run the reference simulation 2.2_uvbeam, and noticed that the output differed significantly. I have created a heavily modified simulation which still replicates the difference at this github repo which can be run on a laptop.

The code I ran to convert the gleam catalog to skyh5 format:

from pyradiosky import SkyModel
sm = SkyModel()
sm.read_gleam_catalog("gleam.vot")
sm.write_skyh5("output.skyh5)

If I then load the output.skyh5 file in as another SkyModel and perform an equality check it passes. I similarly didn't see issues creating SkyModelData objects out of the two SkyModel objects and comparing them.

The file structure I used to produce the output (all files available via the linked github or the google drive):

📂 issue_rep
├── 📂 catalog_files
│   └── 🏗 gleam.vot
├── 📂 scripts
│   └── 📄 run_param_pyuvsim.py
└── 📂 simulations
    ├── 📄 obsparam_ref_2.2_uvbeam_gleam.yaml
    ├── 📄 obsparam_ref_2.2_uvbeam_skyh5.yaml
    └── 📂 telescope_config
        ├── 📄 5km_triangle_layout.csv
        ├── 🏗 HERA_NicCST_fullfreq.uvbeam
        └── 📄 HERA_uvbeam.yaml

I provide the lines (partially as scripts) and files I used to replicate the issue in a hopefully more accessible format on the linked github, as well as the environment.yml file I used. The reference files are edited to be significantly less compute intensive (Nfreqs: 128 --> Nfreqs: 3), and swapping the layout to use the 5km_triangle_layout.csv.

Issue documented with:

astropy                   6.1.2           py312h4fc7734_0    conda-forge
astropy-healpix           1.0.3           py312h085067d_1    conda-forge
astropy-iers-data         0.2024.8.19.0.32.16    pyhd8ed1ab_0    conda-forge
h5py                      3.11.0          nompi_py312hb7ab980_102    conda-forge
lunarsky                  0.2.5                    pypi_0    pypi
numpy                     2.1.0           py312h1103770_0    conda-forge
pyradiosky                1.0.1                    pypi_0    pypi
python                    3.12.5          h2ad013b_0_cpython    conda-forge
python-casacore           3.5.2           py312hab56ce1_6    conda-forge
pyuvdata                  3.0.0           py312h1df14c2_0    conda-forge
pyuvsim                   1.3.2.dev12+g685ee72.d20240823          pypi_0    pypi
pyyaml                    6.0.2           py312h41a817b_0    conda-forge
scipy                     1.14.1          py312h7d485d2_0    conda-forge

bhazelton commented 2 months ago

Thanks for the report, this is super weird. I just made a branch called issue_488 with a test that replicates this behavior, haven't yet figured out how to fix it but at least there's now a test.

mkolopanis commented 2 months ago

When reading the gleam VOTable with current defaults we get this warning right now:

tests/test_run.py::test_gleam_vot_skyh5
  No spectral_type specified for GLEAM, using 'flat'. In version 1.4 this default will change to 'subband' to match pyradiosky's default.

which you can track down to here.

Now the really curious thing is that if you print out the spectral type of the skymodels being compared in the test they both report as being subband type:

-------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------
Outfile path:  /tmp/pytest-of-matthew/pytest-3/test_gleam_vot_skyh50/_results.uvh5
gleam_vot.spectral_type=subband
gleam_sky_in.spectral_type=subband
Outfile path:  /tmp/pytest-of-matthew/pytest-3/test_gleam_vot_skyh50/_results.uvh5

but why is that? because in simsetup.py we're adding the spectral_type="flat" keyword to the Skymodel.from_file call, but in the test (and also in the convert.py from mitch's issue repo) when converting to a skyh5 in both from_file and read_gleam_catalog the default spectral_type is set to subband.

I actually think this difference is expected, we're performing simulations with two different skymodels and then tricking ourselves into thinking they are the same. Furthermore, if you edit line 1014 in simsetup.py to set the spectral_type to subband, as will be the default in version 1.4, both the skymodels and the output simulations match according to numpy.testing.allclose.

This would probably be more obvious, if the history string that pyuvsim writes to gives more information about the skymodel than just the file it was read from.

bhazelton commented 2 months ago

Ah ha! Thank you!

RadioAstronomySoftwareGroup / pyuvsim

Simulation output changes significantly after reading and writing gleam catalog as skyh5 without deliberate changes #488