bccp / nbodykit

Analysis kit for large-scale structure datasets, the massively parallel way
http://nbodykit.rtfd.io
GNU General Public License v3.0
111 stars 60 forks source link

populating an HOD with mdef='200m' fails #634

Open jmsull opened 4 years ago

jmsull commented 4 years ago

When I try to populate an HODCatalog with a Zheng07 model using a HaloCatalog with mdef='200m' I get a trace that ends in the following error: "halotools.custom_exceptions.HalotoolsError: Your model requires that the halo_rvir key appear in the halo catalog, but this column is not available in the catalog you attempted to populate."

which comes from here in halotools.

Given the code here it seems like if I pass mdef=200m (or 200c etc.), HaloCatalog should use this when constructing the halotools mock.

This should work for populating halo catalogs with "any integer XXXm" definition according to the documentation of HaloCatalog.

Attached is a screenshot of a minimal reproducing example based on the HOD example notebook (which I modified since it does not work at the moment, since HODCatalog does not exist https://github.com/bccp/nbodykit/issues/625).

I have no problems with this when mdef='vir'.

Here is the trace/output:


srun -n 1 python -u nbk_example.py [ 000009.66 ] 0: 08-14 19:38 LogNormalCatalog INFO Growth Rate is 0.770123 [ 000009.66 ] 0: 08-14 19:38 LogNormalCatalog INFO Generating whitenoise [ 000022.93 ] 0: 08-14 19:39 LogNormalCatalog INFO Write noise generated [ 000044.13 ] 0: 08-14 19:39 LogNormalCatalog INFO Displacement computed in fourier space [ 000046.21 ] 0: 08-14 19:39 LogNormalCatalog INFO Overdensity computed in configuration space: std = 1.1929580338252221 [ 000052.44 ] 0: 08-14 19:39 LogNormalCatalog INFO Displacement computed in configuration space: std = [4.238011674174512, 4.235169508603116, 4.313347332404491] [ 000052.49 ] 0: 08-14 19:39 LogNormalCatalog INFO gaussian field is generated [ 000053.05 ] 0: 08-14 19:39 LogNormalCatalog INFO Lognormal transformation done [ 000060.73 ] 0: 08-14 19:39 LogNormalCatalog INFO Poisson sampling done, total number of objects is 7886734 [ 000083.09 ] 0: 08-14 19:40 LogNormalCatalog INFO catalog produced. Assigning in cell shift. [ 000122.66 ] 0: 08-14 19:40 LogNormalCatalog INFO sorting done [ 000125.97 ] 0: 08-14 19:40 LogNormalCatalog INFO catalog shifted. [ 000126.10 ] 0: 08-14 19:40 LogNormalCatalog INFO poisson sampling is generated [ 000131.11 ] 0: 08-14 19:40 FOF INFO Number of particles max/min = 7886734 / 7886734 before spatial decomposition [ 000147.16 ] 0: 08-14 19:41 FOF INFO Number of particles max/min = 7886734 / 7886734 after spatial decomposition Traceback (most recent call last): File "/dev/shm/local/lib/python3.7/site-packages/halotools/empirical_models/factories/hod_mock_factory.py", line 172, in preprocess_halo_catalog self._orig_halo_table[key] = halo_table[key][:] File "/dev/shm/local/lib/python3.7/site-packages/astropy/table/table.py", line 1602, in getitem return self.columns[item] File "/dev/shm/local/lib/python3.7/site-packages/astropy/table/table.py", line 239, in getitem return OrderedDict.getitem(self, item) KeyError: 'halo_rvir'

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "nbk_example.py", line 20, in galaxies = hod.populate(Zheng07Model,paramDict, seed=1) File "/dev/shm/local/lib/python3.7/site-packages/nbodykit/source/catalog/halos.py", line 269, in populate return _populate_mock(self, model, seed=seed, halocat=halocat, params) File "/dev/shm/local/lib/python3.7/site-packages/nbodykit/source/catalog/halos.py", line 356, in _populate_mock model.populate_mock(halocat=halocat, kws) File "/dev/shm/local/lib/python3.7/site-packages/halotools/empirical_models/factories/hod_model_factory.py", line 1209, in populate_mock ModelFactory.populate_mock(self, halocat, kwargs) File "/dev/shm/local/lib/python3.7/site-packages/halotools/empirical_models/factories/model_factory_template.py", line 231, in populate_mock self.mock = self.mock_factory(**mock_factory_init_args) File "/dev/shm/local/lib/python3.7/site-packages/halotools/empirical_models/factories/hod_mock_factory.py", line 93, in init self.preprocess_halo_catalog(halocat) File "/dev/shm/local/lib/python3.7/site-packages/halotools/empirical_models/factories/hod_mock_factory.py", line 174, in preprocess_halo_catalog raise HalotoolsError(unavailable_haloprop_msg % key) halotools.custom_exceptions.HalotoolsError: Your model requires that the halo_rvir key appear in the halo catalog, but this column is not available in the catalog you attempted to populate.


Version nbodykit version 0.3.14 halotools version 0.7

m200_vir_example
rainwoodman commented 4 years ago

Sorry for getting back to this so late.

I mostly only used the vir myself, so I am not super familar with the 200 series of mass definitions. But I did tried this with a test case and was able to reproduce the failure ( #637).

The requirement appears to be hardcoded as the default in

https://github.com/astropy/halotools/blob/master/halotools/empirical_models/model_defaults.py#L31

And then unconditionally added as a requirement to all mock model instances here:

https://github.com/astropy/halotools/blob/master/halotools/empirical_models/factories/mock_factory_template.py#L82

These logics seem to have been around for a very long time (5 years since last touched).

A solution on the nbodykit side would be passing along mvir and rvir during the conversion. But I am concerned that if we are to pass apparently unused data. Any comments from @aphearin?

aphearin commented 4 years ago

Sorry this has been causing a headache. Originally I thought it was a good idea to enforce self-consistency between the halo mass definition and halo boundary, but in hindsight it's obvious this is something that should have been left up to the user. Actually I think this is true for many of the requirements enforced about halotools halo catalogs, so I've been putting off an overhaul to the sim_manager sub-package for a future release that strips away a lot of the unnecessary structure. But I think this particular fix shouldn't be too bad, so thanks for pinging me about this - https://github.com/astropy/halotools/issues/993.