astropy / halotools

Python package for studying large scale structure, cosmology, and galaxy evolution using N-body simulations and halo models
http://halotools.rtfd.org
BSD 3-Clause "New" or "Revised" License
104 stars 65 forks source link

Leauthaud11 HOD #1061

Closed zaidikumail closed 1 year ago

zaidikumail commented 1 year ago

Within the Leauthaud11 HODs internal calculations, can I make halotools use h=1 units all the way through? For instance, the Leauthaud11 parameters like smhm_m0_0, smhm_m1_0, and bsat are set to default values from the literature in the units of h=0.7 or 0.72. So, while the user inputs like the stellar mass threshold to initialize the Leautahaud11 HOD model are given in the h=1 units, halotools internally converts the stellar mass threshold to the h=0.7 or 0.72 units, matching the literature values of the other Leauthaud11 HOD parameters. Then it does the calculations and returns the output/s in h=1 units e.g within the mean_log_halo_mass() and _update_satellite_params() functions within ``halotools.empirical_models.occupation_models.leauthaud11_components```.

Moreover, within the aforementioned _update_satellite_params() function, in the following line, knee_threshold = (10.0**log_halo_mass_threshold) * self.littleh, should the self.littleh be divided and not multiplied to bring the knee_threshold to the same footing (h=0.72) as kness_mass coming from the literature?

aphearin commented 1 year ago

Hmm, thanks for pointing this out @zaidikumail. It's definitely possible that the parameters of the halotools implementation of the Leauthaud11 model have a different meaning from the published values in the Leauthaud+11 paper due to a little h. It's helpful that you've pinpointed a couple of places where this might be rectified - do let me know in case you see other spots that would simplify a direct comparison to the values in the Leauthaud+11 publication.

zaidikumail commented 1 year ago

Thanks for your reply! I can definitely do that. One more thing I realized which you might be interested in is that the mean satellite occupation <Nsat> is not equal to '0' when the mean central occupation is equal to '0'. This should not be the case if the following line in the mean_occupation() function within the Leauthaud11Sats() class is correctly implemented: if self.modulate_with_cenocc is True: (tab)mean_nsat *= self.central_occupation_model.mean_occupation(**kwargs)

Regardless, I am concerned about the Leauthaud11 parameter units within the halotools because I am trying to fit my clustering data using the Leauthaud11 model in an MCMC setting and would like to update the Leauthaud11 parameters at each step. So, I want to make sure that these parameters are exactly in the units I think they are in. In fact, I am adapting the Leauthaud11 model from halotools to be used in another code - halomod to do the MCMC fitting (I am happy to share that code with you). I am doing this because updating HOD parameters at each step, then generating the mock catalog and then calculating the angular correlation function within halotools seems very time-consuming. It doesn't work well for something like 40 walkers and 6000 iterations. Please let me know if you have any tricks up your sleeve on how to speed that up as well in a cluster environment beyond using multiple cores and MPIPool() etc.

aphearin commented 1 year ago

generating the mock catalog and then calculating the angular correlation function within halotools

Yes, this would be crazy slow with Halotools. I have been debating simply deleting the angular correlation functionality within halotools, because the code was really only written/optimized for xyz periodic boundary conditions. For angular correlation functions, you should either use Corrfunc or TreeCorr.

zaidikumail commented 1 year ago

Yes, I used Corrfunc (pycorr) for measuring the correlation functions in my data. Now, I am trying to fit that data with an HOD model using halomod (by adapting the Leauthaud11 code from halotools) which is fast enough.

aphearin commented 1 year ago

Right but what I am saying is that Corrfunc also will compute the angular correlation function in your mock data that you use to make predictions. And it will be much faster than using halotools (or any other code that I am aware of).

zaidikumail commented 1 year ago

Oh, I see. You are saying that I use Corrfunc to measure the angular correlation function in my mock data which in turn is produced by employing a particular combination of Leauthaud11 HOD parameters within halotools.

aphearin commented 1 year ago

Yes, exactly. I think you will find that to be a dramatic speedup in your likelihood analysis.

zaidikumail commented 1 year ago

Great, thank you! I will try that.

By any chance, can you give an update on exactly what h units are being assumed by Leauthaud11 HOD parameters within halotools anytime soon? This is so that I can do my MCMC analyses with clarity. You are also most welcome to join in our study, time-permitting. I can send a you a private email detailing our project if you’d be interested.

aphearin commented 1 year ago

Hi @zaidikumail thanks for your generous offer, but I'm happy to help with this fix regardless and so there's no need to include me on your project - this is something I'd like to fix. He's one way that I think should be reasonably simple to implement: begin by eliminating all references to littleh within Leauthaud11, with appearances of littleh in the current implementation simply removed; then on the very first line of any Leauthaud11 function that accepts halo mass, convert halo mass from h=1 to h=0.72. That seems like the most straightforward way to quickly fix this so that the meaning of the numerical values of the halotools model parameters are the same as the corresponding ones in the publication - what do you think?

I'll do my best to get this done next week since I'll be away for most of June, but I can't say for sure. If you need this on a shorter timescale, you could try implementing the above solution or some alternative, and submit a PR.

zaidikumail commented 1 year ago

Hi @aphearin,

Thank you, then! I think your approach sounds good. Making the meaning of all the Leauthaud11 parameters the same as in the publication would definitely help with the comparison. However, I was thinking that it might be better to keep things general all the way through. i.e. keep everything in units of h=1, even converting the default parameter values from the literature into h=1 units before setting them in halotools.

I think this would be better because in halotools, the default parameter values that have to do with the central occupation come from Behroozi+ 2010 with h=0.7, however, the satellite occupation default parameter values come from Leauthaud+ 2012 with h=0.72. It will be simpler to keep everything in terms of h=1. This will help users to easily vary the Leauthaud11 HOD parameters within MCMC or whatever without worrying about keeping the central occupation parameters in terms of h=0.7 and the satellite occupation parameters in terms of h=0.72. What do you think?

p.s. I think if you're able to manage it by the first week of June, that'd be great. Otherwise, no worries, I will figure something out for now.

aphearin commented 1 year ago

I think this would be better because in halotools, the default parameter values that have to do with the central occupation come from Behroozi+ 2010 with h=0.7, however, the satellite occupation default parameter values come from Leauthaud+ 2012 with h=0.72.

Right, yes, @zaidikumail I noticed that too when I looked recently. I think I will just entirely decouple Behroozi10 class from the Leauthaud11 class. That seems like a simpler implementation to maintain, since then there is no impact at all on the current implementation of Behroozi10.

This will help users to easily vary the Leauthaud11 HOD parameters within MCMC or whatever without worrying about keeping the central occupation parameters in terms of h=0.7 and the satellite occupation parameters in terms of h=0.72. What do you think?

Is this something that users would need to worry about in practice? The user can always run their MCMC with parameters in whatever units, and then convert at the end. I think the main problem with the current implementation is actually an internal inconsistency that prohibits direct comparison to Leauthaud+11. But once the code is internally consistent, I think switching back and forth is something the user can always do themselves, no? The implementation I suggested seemed simple and quick to do, and I thought you were making a request for a short turnaround time on the fix.

zaidikumail commented 1 year ago

Yes, I agree. As long as the halotools Leauthaud11 parameters are self-consistent, then converting them to any other units should not be a problem.

zaidikumail commented 1 year ago

Thank you so much @aphearin !!