Do we really need `Profile2pt`?

nikfilippas commented 1 year ago

Profile2pt calculates the profile covariance and implements the 2-point correlator of two halo profiles. In a way, it is akin to a class implementing a binary operation between two profiles, just like __add__, __mul__, or even numpy.correlate.

The base class implements a one-parameter __init__ accepting r_corr, which controls the correlation strength of the two profiles. However, this parameter is not used by any of the subclasses.

class Profile2pt:
    def __init__(self, r_corr=0):
        self.r_corr = r_corr

    def fourier_2pt(self, cosmo, k, M, a, prof, *, prof2=None):
        # Default to the auto-correlation when no second profile is given.
        if prof2 is None:
            prof2 = prof
        return prof.fourier(cosmo, k, M, a) * prof2.fourier(cosmo, k, M, a) * (1 + self.r_corr)
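Written out, the default correlator above returns, for two profiles with Fourier transforms $u_1$ and $u_2$, the plain product rescaled by the correlation parameter:

$$
\langle u_1 u_2 \rangle(k \mid M, a) = (1 + r_{\rm corr})\, u_1(k \mid M, a)\, u_2(k \mid M, a)
$$

With the default `r_corr = 0` this reduces to $u_1 u_2$, i.e. no extra correlation between the two profiles.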

The subclasses implement special cases (for HOD and CIB profiles) and override fourier_2pt, which internally calls the profile's _fourier_variance method. They do not actually compute anything themselves and only act as an interface between the Profile2pt superclass and HaloProfile.

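class Profile2ptCIB(Profile2pt):
    def fourier_2pt(self, cosmo, k, M, a, prof, *, prof2=None):
        # prof2 is not used: delegate entirely to the CIB profile's own variance.
        return prof._fourier_variance(cosmo, k, M, a)

class Profile2ptHOD(Profile2pt):
    def fourier_2pt(self, cosmo, k, M, a, prof, *, prof2=None):
        # prof2 is not used: delegate entirely to the HOD profile's own variance.
        return prof._fourier_variance(cosmo, k, M, a)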

I think this usage is slightly awkward. It complicates the relations between halo profiles and means that the halo model functions need a 2-point correlator as input. It also needlessly creates an extra object, since a Profile2pt only represents the relation between one halo profile and another. As with normprof, which is now implemented in the HaloProfile class, my opinion is that we should drop Profile2pt in favour of _fourier_variance (which would become public and get a better name).

Moreover, the subclasses are particular implementations of 2-point correlations for different profile types, and each one only works with a certain profile type (i.e. they are not generic).

The proposed code would look like this for the default implementation:

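class HaloProfile:

    def fourier(self, cosmo, k, M, a):
        ...

    def fourier_2pt(self, other, cosmo, k, M, a, *, r_corr=0):
        # Default: product of the two Fourier-space profiles, scaled by (1 + r_corr).
        return self.fourier(cosmo, k, M, a) * other.fourier(cosmo, k, M, a) * (1 + r_corr)

And we could explicitly change what the calculation does, depending on the input profile types (self and other),

class HaloProfileHOD(HaloProfile):

    def fourier_2pt(self, other, cosmo, k, M, a, *, r_corr=0):
        if other is not self:
            # Cross-correlation with a different profile: fall back to the default product.
            return super().fourier_2pt(other, cosmo, k, M, a, r_corr=r_corr)
        # HOD auto-correlation (currently computed by _fourier_variance).
        Nc, Ns = self._Nc(M, a), self._Ns(M, a)
        fc = self._fc(a)
        prof = self._usat_fourier(cosmo, k, M, a) * Ns
        return Nc * (2 * fc * prof + prof**2)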

just like we do with most other base class/subclass relations throughout CCL.
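Read literally, the HOD code above evaluates the following, where $N_c$ and $N_s$ are the central and satellite occupations returned by `_Nc` and `_Ns`, $f_c$ is the value returned by `_fc`, and $u_s$ is the satellite profile from `_usat_fourier` (arguments $k$, $M$, $a$ dropped for brevity):

$$
\langle u^2 \rangle_{\rm HOD} = N_c \left( 2 f_c N_s u_s + N_s^2 u_s^2 \right)
$$

This is not, in general, the default product $(1 + r_{\rm corr})\, u_1 u_2$, which is why the HOD auto-correlation needs its own code path in either design.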

In essence, instead of creating a new subclass for every special type of correlation, we would simply add an if-clause to the fourier_2pt method of the relevant HaloProfile subclass. This could easily be extended to fourier_3pt, fourier_4pt, etc. in the future, without the need to create new classes for every possible combination of halo profiles.
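A minimal, runnable sketch of the proposed design, assuming toy stand-ins rather than the real pyccl classes: `ToyNFW`, `ToyHOD`, the constant occupations `Nc`, `Ns`, `fc`, the made-up satellite shape $1/(1 + (kM)^2)$, and the mean `fourier` of `ToyHOD` are all hypothetical. The point is the dispatch: the auto-correlation branch reuses the HOD formula quoted in the thread, while any other partner falls back to the default product.

```python
import numpy as np


class HaloProfile:
    """Toy stand-in for the proposed base class (not the real pyccl class)."""

    def fourier(self, cosmo, k, M, a):
        raise NotImplementedError

    def fourier_2pt(self, other, cosmo, k, M, a, *, r_corr=0):
        # Default: plain product of the two profiles, scaled by (1 + r_corr).
        other = self if other is None else other
        return self.fourier(cosmo, k, M, a) * other.fourier(cosmo, k, M, a) * (1 + r_corr)


class ToyNFW(HaloProfile):
    """Hypothetical profile with a made-up Fourier shape 1 / (1 + (k M)^2)."""

    def fourier(self, cosmo, k, M, a):
        # Shape (n_M, n_k).
        return 1.0 / (1.0 + np.outer(M, k) ** 2)


class ToyHOD(HaloProfile):
    """Hypothetical HOD with constant occupations (an assumption for brevity)."""

    def __init__(self, Nc=1.0, Ns=3.0, fc=1.0):
        self.Nc, self.Ns, self.fc = Nc, Ns, fc
        self._usat = ToyNFW()

    def fourier(self, cosmo, k, M, a):
        # Assumed mean profile: centrals plus satellites.
        us = self._usat.fourier(cosmo, k, M, a)
        return self.Nc * (self.fc + self.Ns * us)

    def fourier_2pt(self, other, cosmo, k, M, a, *, r_corr=0):
        # The "if-clause": any partner other than itself uses the default product.
        if other is not None and other is not self:
            return super().fourier_2pt(other, cosmo, k, M, a, r_corr=r_corr)
        # Auto-correlation: the HOD formula from the thread (r_corr is ignored here).
        prof = self._usat.fourier(cosmo, k, M, a) * self.Ns
        return self.Nc * (2 * self.fc * prof + prof**2)


k, M = np.array([0.0]), np.array([1.0])
hod = ToyHOD()
print(hod.fourier_2pt(None, None, k, M, 1.0))      # [[15.]]
print(hod.fourier_2pt(ToyNFW(), None, k, M, 1.0))  # [[4.]]
```

With this layout a halo-model calculator would only need to call `prof.fourier_2pt(prof2, ...)`; the trade-off is that every pair-specific rule has to live inside one of the two profile classes.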

damonge commented 1 year ago

I haven't seen others chime in. I understand the logic of this, but I worry about deprecating Profile2pt completely at this point. There are edge cases we haven't really come across yet, in which there are non-trivial covariances between profiles of the same type or even of different types. The number of such cases can grow as N^2 (for N profile types), and I'd worry that it'll be clunky for users to have to create new bespoke profiles overloading the fourier_2pt method for each of them (rather than simply coding up a new Profile2pt).

It may be that we come up with a better way of doing this through fourier_2pt and can deprecate Profile2pt in the future, but I'd rather not do this blindly (i.e. without first having confronted a real-life scenario).

I'll leave this open so others can chime in, but since this can happen in v3 without delaying it, I will release v2.8 without deprecating Profile2pt.

nikfilippas commented 1 year ago

You've mentioned this before, but I am not convinced this is actually the case, particularly regarding the N^2 argument: as I mentioned, we are going to have either N^2 objects or N^2 if-clauses, so it makes no difference to the amount of code. But maybe I am misunderstanding something, so could you give an example that shows the need for Profile2pt for these "non-trivial covariances"?
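To make the counting concrete: with N profile types there are N(N+1)/2 unordered pairs, auto-correlations included, so pair-specific correlations need on the order of N^2 code paths whether each one is a Profile2pt subclass or an if-clause in fourier_2pt. The type names below are only examples.

```python
from itertools import combinations_with_replacement


def profile_pairs(profile_types):
    """All unordered pairs of profile types, auto-correlations included."""
    return list(combinations_with_replacement(profile_types, 2))


pairs = profile_pairs(["HOD", "CIB", "NFW"])
print(len(pairs))  # 6, i.e. 3 * (3 + 1) // 2
```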

damonge commented 1 year ago

Nothing will demonstrate the need for Profile2pt over having lots of if statements. The point is what will be clunkier for users, and we haven't had enough experience with that to make this call.

nikfilippas commented 1 year ago

So, what I don't understand is: 1. how it will be clunkier for users, since the change would simplify the API; and 2. the N^2 issue you mentioned (could you elaborate?), as I think that is not actually the case.

damonge commented 1 year ago

OK, since you insist. Here's an example (only one of many different situations we may come across, which we cannot predict right now from CCL): someone has two samples of galaxies, each represented by an HOD, and some (but not all) of the galaxies in one sample are also in the second sample. In that case, the HOD covariance is non-trivial. It's an edge case that we shouldn't have to support in CCL, at least for now.

Currently, users just have to write their own Profile2pt subclass for this particular correlation and use it to calculate that particular power spectrum.
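A rough sketch of that workflow, assuming toy stand-ins instead of pyccl: `ToySample`, `Profile2ptOverlap`, the parameter `f_shared`, and `toy_one_halo` are all hypothetical, and the extra overlap term is a placeholder rather than a derived HOD covariance. `Profile2pt` here just mirrors the base class quoted at the top of the thread.

```python
import numpy as np


class Profile2pt:
    """Stand-in mirroring the base class quoted at the top of this thread."""

    def __init__(self, r_corr=0):
        self.r_corr = r_corr

    def fourier_2pt(self, cosmo, k, M, a, prof, *, prof2=None):
        if prof2 is None:
            prof2 = prof
        return prof.fourier(cosmo, k, M, a) * prof2.fourier(cosmo, k, M, a) * (1 + self.r_corr)


class ToySample:
    """Hypothetical galaxy sample with a made-up profile amp * exp(-k * M)."""

    def __init__(self, amp):
        self.amp = amp

    def fourier(self, cosmo, k, M, a):
        # Shape (n_M, n_k).
        return self.amp * np.exp(-np.outer(M, k))


class Profile2ptOverlap(Profile2pt):
    """Hypothetical user-defined correlator for two partially overlapping samples.

    The extra term (f_shared times the first profile) is a placeholder for
    whatever the real shared-galaxy covariance would be; it is not derived.
    """

    def __init__(self, f_shared, r_corr=0):
        super().__init__(r_corr=r_corr)
        self.f_shared = f_shared

    def fourier_2pt(self, cosmo, k, M, a, prof, *, prof2=None):
        base = super().fourier_2pt(cosmo, k, M, a, prof, prof2=prof2)
        return base + self.f_shared * prof.fourier(cosmo, k, M, a)


def toy_one_halo(k, M, weights, prof, prof2, prof_2pt):
    """Hypothetical 1-halo sum over masses; the calculator only sees prof_2pt."""
    u2 = prof_2pt.fourier_2pt(None, k, M, 1.0, prof, prof2=prof2)
    return weights @ u2


sample_1, sample_2 = ToySample(2.0), ToySample(3.0)
k, M, weights = np.array([0.0]), np.array([1.0, 2.0]), np.array([1.0, 1.0])
print(toy_one_halo(k, M, weights, sample_1, sample_2, Profile2pt()))            # [12.]
print(toy_one_halo(k, M, weights, sample_1, sample_2, Profile2ptOverlap(0.5)))  # [14.]
```

Neither sample class is touched here: the same two profiles are combined with a different correlation simply by passing a different object to the calculator.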

With the change you propose, users would have to subclass HaloProfileHOD and overload the fourier_2pt method. Then, it's not even clear to me a priori how they would tell the current halo model calculator functions which version (i.e. which if-branch) of the fourier_2pt method to use for each of the different possible correlations.
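The same toy case under the proposed design might look like the sketch below, again assuming hypothetical stand-ins (`HaloProfile` with a `fourier_2pt` method, `ToySample`, `ToySampleOverlap`, `partner`, `f_shared`, `toy_one_halo`) and a placeholder overlap term. Here the pair-specific behaviour has to be attached to one of the profile objects.

```python
import numpy as np


class HaloProfile:
    """Stand-in for the proposed base class carrying its own fourier_2pt."""

    def fourier(self, cosmo, k, M, a):
        raise NotImplementedError

    def fourier_2pt(self, other, cosmo, k, M, a, *, r_corr=0):
        other = self if other is None else other
        return self.fourier(cosmo, k, M, a) * other.fourier(cosmo, k, M, a) * (1 + r_corr)


class ToySample(HaloProfile):
    """Hypothetical galaxy sample with a made-up profile amp * exp(-k * M)."""

    def __init__(self, amp):
        self.amp = amp

    def fourier(self, cosmo, k, M, a):
        return self.amp * np.exp(-np.outer(M, k))


class ToySampleOverlap(ToySample):
    """Hypothetical user subclass: the overlap term is tied to one partner object."""

    def __init__(self, amp, partner, f_shared):
        super().__init__(amp)
        self.partner = partner
        self.f_shared = f_shared

    def fourier_2pt(self, other, cosmo, k, M, a, *, r_corr=0):
        base = super().fourier_2pt(other, cosmo, k, M, a, r_corr=r_corr)
        if other is self.partner:
            # Placeholder overlap term (toy, not a derived covariance).
            return base + self.f_shared * self.fourier(cosmo, k, M, a)
        return base


def toy_one_halo(k, M, weights, prof, prof2):
    """Hypothetical calculator: the branch is chosen inside prof.fourier_2pt."""
    return weights @ prof.fourier_2pt(prof2, None, k, M, 1.0)


sample_2 = ToySample(3.0)
sample_1 = ToySampleOverlap(2.0, partner=sample_2, f_shared=0.5)
k, M, weights = np.array([0.0]), np.array([1.0, 2.0]), np.array([1.0, 1.0])
print(toy_one_halo(k, M, weights, sample_1, sample_2))  # [14.]
print(toy_one_halo(k, M, weights, sample_2, sample_1))  # [12.]: overlap term lost
```

In this sketch, swapping the argument order silently drops the overlap term unless `ToySample` is modified as well, which is one concrete form of the question of how the calculator would know which branch to use.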

I'm sure there may be a workaround, but I see no major issue with keeping the current structure that would justify delaying the move to v3 (which is long, long overdue and is delaying the implementation of important new science), since such a workaround could be implemented within v3.

nikfilippas commented 1 year ago

Your example does indeed demonstrate the need for Profile2pt as a class handling the binary operation. However, I'd point out that this is a very niche case, and sometimes having the most general framework for a particular implementation isn't helpful at all (neither for devs nor for users). If it were up to me, I'd get rid of Profile2pt (the changes are really minor) until the use cases make it evident that such an implementation is needed (and I doubt there will be a need for that soon).

PS: There is no major issue with the current implementation; it's just slightly awkward, in the same way that passing normprof into the functions was peculiar. We have a chance to fix that now that we're breaking the API, but it's not the end of the world if we don't.

damonge commented 1 year ago

OK, I'll close this for now, but please feel free to chime in or reopen.

LSSTDESC / CCL

Do we really need `Profile2pt`? #1085