Improve CF Estimators vs Pair Counters

bccp / nbodykit

Analysis kit for large-scale structure datasets, the massively parallel way

http://nbodykit.rtfd.io

GNU General Public License v3.0

Improve CF Estimators vs Pair Counters #405

Closed nickhand closed 6 years ago

rainwoodman commented 7 years ago

At a minimum, the documentation should explain how to construct a Landy-Szalay estimator. What about the other forms of estimators?

nickhand commented 7 years ago

I am thinking of a simple wrapper algorithm that takes an FKPCatalog and computes all of the correlations needed for the user-specified estimator.

rainwoodman commented 7 years ago

What about correlation in a PeriodicBox?

nickhand commented 7 years ago

We already use the analytic randoms calculation for the periodic box to compute the CF that is returned in the result.
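A minimal sketch of what "analytic randoms" means for a periodic cube, assuming unweighted points, ordered pair counts (each pair counted twice), and a maximum separation below half the box size. The function names `analytic_rr` and `xi_periodic` are hypothetical and are not nbodykit API.

```python
import numpy as np

def analytic_rr(edges, n_points, box_size):
    """Expected ordered random-pair counts per separation bin in a periodic cube.

    Assumes a uniform distribution and r_max < box_size / 2, so every
    spherical shell fits inside the box once periodic wrapping is applied.
    """
    edges = np.asarray(edges, dtype=float)
    shell_volume = 4.0 / 3.0 * np.pi * np.diff(edges ** 3)
    return n_points * (n_points - 1) * shell_volume / box_size ** 3

def xi_periodic(dd, edges, n_points, box_size):
    """Natural estimator DD/RR - 1, with RR computed analytically.

    dd must use the same convention as analytic_rr (ordered pairs).
    """
    rr = analytic_rr(edges, n_points, box_size)
    return np.asarray(dd, dtype=float) / rr - 1.0
```

Because RR is known exactly here, no random catalog is needed and the estimate carries no shot noise from the randoms.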

nickhand commented 7 years ago

We need to be clearer about pair counting vs correlation function estimators, which estimators are available, and when we can use analytic vs non-analytic randoms.

I'd like to hide as much of the distinction between survey data and simulation data as possible, but I need to think about how best to do that. It would be nice to provide equal treatment of angular correlation functions, projected correlation functions, and 3D correlation functions for both simulations and data.

This will likely overhaul some of the existing stuff we have, so better to do it sooner rather than later. Any thoughts, @rainwoodman?

rainwoodman commented 7 years ago

The only input I can think of quickly is that there are two types of scatter point data: samples of a field (value and weight) and realizations of a density (weight). They can be cross-correlated as well. The optimal estimator differs depending on which type is being correlated with which. That is why in kdcount I opted to return only sum1 and sum2 and to leave the estimators out.

Maybe a starting point is to mock up the function signatures of a few 'algorithms' that directly compute the optimal estimators, then see whether there is any common ground between them?

nickhand commented 7 years ago

I was thinking of only supporting the latter to start. Is there a common use case in the literature for samples of a field? What are the estimator differences?

Even within just pair counters, we should add better support for the correlation function as a function of r and of (r, mu), projected correlation functions, and angular correlation functions. For simulation cubes we can use analytic randoms; for survey data we can use one of the estimators, starting with Landy-Szalay but also the less common ones, e.g. DD/RR - 1.
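For reference, the two estimators named above can be sketched directly from raw pair counts. This assumes unweighted catalogs and ordered auto-pair counts (each pair counted twice, self-pairs excluded). The helper names are hypothetical, not existing nbodykit functions.

```python
import numpy as np

def normalize(counts, n1, n2=None):
    """Divide raw pair counts by the total number of pairs.

    Auto-counts (n2 is None) are assumed to be ordered pairs without
    self-pairs, so the total is n1 * (n1 - 1); cross-counts use n1 * n2.
    """
    total = n1 * (n1 - 1) if n2 is None else n1 * n2
    return np.asarray(counts, dtype=float) / total

def natural(dd, rr, n_data, n_rand):
    """Natural estimator: DD/RR - 1."""
    return normalize(dd, n_data) / normalize(rr, n_rand) - 1.0

def landy_szalay(dd, dr, rr, n_data, n_rand):
    """Landy-Szalay estimator: (DD - 2 DR + RR) / RR."""
    dd_n = normalize(dd, n_data)
    dr_n = normalize(dr, n_data, n_rand)
    rr_n = normalize(rr, n_rand)
    return (dd_n - 2.0 * dr_n + rr_n) / rr_n
```

Both functions take the same pair counts as input, which is the argument for keeping pair counting and estimator construction as separate layers.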

rainwoodman commented 7 years ago

Lyman-alpha x quasar was a common use case, but maybe nowadays people do it with a faster algorithm (e.g. by line-of-sight decomposition and looking at a hybrid of the power spectrum and the correlation function?).

The marked correlation function is similar, with weights and values. People working on angular correlations do it a lot.

There are two ways of constructing this:

1. A "correlation function" that takes the type of estimator, the data, and the randoms.
2. A correlation.LandySzalayEstimator that takes the data and the randoms, and calls the internal pair-counting routines.
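A rough mock-up of the two interfaces, to make the comparison concrete. Everything here is hypothetical: `paircount` is a brute-force stand-in for a real pair counter, and `NaturalEstimator`, `ESTIMATORS`, and `correlation_function` are illustrative names, not nbodykit API. The class is spelled `LandySzalayEstimator` in this sketch. Inputs are assumed to be unweighted (N, 3) position arrays with enough randoms that no RR bin is empty.

```python
import numpy as np

def paircount(pos1, pos2, edges, auto=False):
    """Brute-force ordered pair counts per separation bin (O(N^2) memory)."""
    d = np.linalg.norm(pos1[:, None, :] - pos2[None, :, :], axis=-1)
    if auto:
        d = d[~np.eye(len(pos1), dtype=bool)]  # drop self-pairs
    counts, _ = np.histogram(d, bins=edges)
    return counts.astype(float)

class _Estimator:
    """Computes the normalized DD, DR, and RR counts shared by all estimators."""
    def __init__(self, data, randoms, edges):
        nd, nr = len(data), len(randoms)
        self.edges = np.asarray(edges, dtype=float)
        self.dd = paircount(data, data, edges, auto=True) / (nd * (nd - 1))
        self.dr = paircount(data, randoms, edges) / (nd * nr)
        self.rr = paircount(randoms, randoms, edges, auto=True) / (nr * (nr - 1))
        self.corr = self.estimate()

class NaturalEstimator(_Estimator):
    def estimate(self):
        return self.dd / self.rr - 1.0

class LandySzalayEstimator(_Estimator):
    def estimate(self):
        return (self.dd - 2.0 * self.dr + self.rr) / self.rr

# Option 1: a single function that dispatches on the estimator name.
ESTIMATORS = {"natural": NaturalEstimator, "landy-szalay": LandySzalayEstimator}

def correlation_function(estimator, data, randoms, edges):
    return ESTIMATORS[estimator](data, randoms, edges)

# Option 2: instantiate the estimator class directly, e.g.
# result = LandySzalayEstimator(data, randoms, edges)
```

In this sketch the two options share one implementation, so the choice is mostly about which entry point to expose to users.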

nickhand commented 6 years ago

CF estimator algorithms and more pair-counting functionality have been added in #439.