bccp / nbodykit

Analysis kit for large-scale structure datasets, the massively parallel way
http://nbodykit.rtfd.io
GNU General Public License v3.0
111 stars 60 forks source link

Correlation function, issues and suggestions #475

Closed adematti closed 6 years ago

adematti commented 6 years ago

Hi,

Your nbodykit package is nicely done, easy to use, and your power spectrum estimator is very fast. I noticed a few bugs and have a few suggestions, as described below. I managed to correct the bugs and implement all the options below. If you are interested I can share it.

1) Concerning your implementation of the Landy Szalay estimator (algorithms/paircount_tpcf/estimators.py) (and NaturalEstimator):

I am not sure everything is clear, and I hope I am not mistaken; I would be pleased to provide more details, or share the implementation of the previous options.

Thank you for providing this very useful package, I hope it will be used in future surveys!

Cheers,

rainwoodman commented 6 years ago

Thanks for the suggestions!

The point I can immediately understand is 3. I know versions of numpy fluctuated in deciding casting range to int or float as they oscillate between different casting rules. So we shall have a work around to this like you proposed. We shall also add a check that no empty array is passed to corrfunc if it fails. If you already have the changes done, then a Pull Request would be very helpful.

Here is a tutorial on Pull requests:

https://zonca.github.io/2017/06/quick-github-pull-requests.html

Your point 4 -- I thought the only feasible way of doing angular correlation is to project to a unit sphere for a 3d embedding and corr func is doing this already. I must have missed something here.

For point 1 and 2, perhaps @nickhand can weight in a bit more. I may need to see your weighting changes in point 1 in the context of code to better understand it. Would you mind filing your change set as another Pull Request?

For point 2, I in principle do agree these can be useful.

adematti commented 6 years ago

Thanks for your answer! I should be able to create pull requests within the next few days. Concerning the point 4: you are right, at some point we need to project on the unit sphere. The point is that angular correlation calculation requires computing the time-consuming theta = arccos (dot product of lines of sight) which is then binned (at least it was how David Alonso proceeded in CUTE). Another possibility is to bin r = euclidean distance between two objects (located on the unit circle), with r_edges = 2.sin(theta_edges/2.), which is twice faster. The drawback, though, is that a priori thetaavg cannot be inferred from ravg, though in practice theta = 2.arcsin(ravg/2.) is very close to thetaavg. Writing these lines I figure out that there are compile-time options in corrfunc to enable/disable calculation of thetaavg and a fast ersatz for arccos (which is of course better than what I suggest). The calculation of thetaavg is disabled by default, though we get a thetaavg. I am missing something here.

Cheers,

adematti commented 6 years ago

To come back on point 3, when performing more extensive tests, I figured out that Corrfunc DDsmu_mocks crashes when provided arrays are of length 1. I would say that the issue of corrfunc crashing with length 0 or 1 arrays should be properly dealt within corrfunc itself. Thus, I will not provide any workaround for this (that would only work with length 0 arrays); except the small change range -> numpy.arange in algorithms/pair_counters/corrfunc/base.py, as discussed before. I am opening a pull request on the Landy Szalay estimator, though.

Cheers

nickhand commented 6 years ago

Thanks for the detailed suggestions, @adematti! The CF code is not as well-used or well-tested as the power spectrum functionality, so we appreciate the feedback!

I'm reviewing your PR for point 1 right now and will add some comments there. I agree with your point 2 that those will be very nice additions to the package. For 3, I was aware of that Corrfunc had some issues in this area and had tried to work around it in most cases, but thanks for reporting this additional issue.

And for 4, you should be able to pass any valid Corrfunc keywords to the init function of the specific algorithm you are using. By default, we turn on the theta avg calculation, but you should be able to turn that off by passing output_thetaavg=False when initializing the algorithm. It occurs to me now that this has not been well tested or advertised in the documentation, so we should fix that

adematti commented 6 years ago

Sorry I could not update other things this week. Thanks for your support and for your high-quality package, I will advertise it to the eBOSS members I work with. By the way, is nbodykit deemed to be a reference code for DESI? I guess we can close this issue now.

Cheers

rainwoodman commented 6 years ago

Thanks! I'll close this issue then. I am not in a position to answer whether nbodykit is deemed to be a reference code for DESI. It appears to have been useful there so far.