bccp / nbodykit

Analysis kit for large-scale structure datasets, the massively parallel way
http://nbodykit.rtfd.io
GNU General Public License v3.0

Issues in Base3PCF #515

Closed · adematti closed this issue 6 years ago

adematti commented 6 years ago

Hi,

While implementing the anisotropic 3-point correlation function of Slepian et al. 2017 (arXiv: 1709.10150v1), I started from your code. I noticed two things:

1) At line 99 of threeptcf.py, l does not look like the correct index when the products of the alms are taken.
2) The bunch size is small; am I right that the process function is then called multiple times for each primary?

Thank you for your code,

Cheers

rainwoodman commented 6 years ago

I can answer the second question now: yes, you are right. The bunch size is small, but the process function will be called multiple times -- once we have seen enough pairs to fill the buffer for the bunch, the process function is called once. This reduces the Python function-call overhead and lets us write vectorized numpy operations over many pairs at a time.
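
To make this concrete, here is a minimal sketch of the bunching idea; the driver `enum_pairs` and its signature are hypothetical, not nbodykit's actual enum API:

```python
import numpy as np

# Hypothetical sketch of a bunched pair enumerator (not nbodykit's real
# API): pairs are buffered, and `process` runs once per filled bunch, so
# a single Python call handles many pairs with vectorized numpy code.
def enum_pairs(pairs, process, bunchsize=100000):
    buf = []
    for pair in pairs:
        buf.append(pair)
        if len(buf) == bunchsize:
            process(np.array(buf))  # one call per bunch, not per pair
            buf = []
    if buf:                         # flush the final, partially filled bunch
        process(np.array(buf))
```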

The first question I will have to look into.

rainwoodman commented 6 years ago

I agree with you that the code looks buggy: l is not the correct index. Also, the ordering of the return value of __call__ of YlmCache is undefined, so it may return results in inconsistent sequences on different ranks and cause the result to be wrong when run in parallel. It may also cause problems when decoding the results to l values.
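
For illustration, here is a toy of the ordering hazard; the dict `ylm_results` is hypothetical, standing in for whatever the cache returns:

```python
import numpy

# Toy illustration of the hazard (made-up data): if each rank packs
# per-(l, m) results in whatever order a cache yields them, an
# elementwise reduction across ranks mixes different (l, m) slots.
ylm_results = {(1, -1): 2.0, (0, 0): 1.0, (1, 1): 4.0, (1, 0): 3.0}

# Unsafe: iteration order is not guaranteed to agree across ranks, so
# slot k may hold a different (l, m) on each rank.
packed_unsafe = numpy.array(list(ylm_results.values()))

# Safe: fix a canonical ordering first, so all ranks agree which slot
# holds which (l, m), and decoding back to l is unambiguous.
keys = sorted(ylm_results)
packed_safe = numpy.array([ylm_results[k] for k in keys])
```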

rainwoodman commented 6 years ago

Do you want to file a PR to fix these? Adding @nickhand to this too.

adematti commented 6 years ago

Hi,

Sorry I was away last week (and will not be very available this week either).

2) I get that the process function can be called several times for each primary (until all the secondaries have been treated); that is a clever way to proceed. However, I argue that the process function cannot be vectorized along the secondaries here, because you are taking the product of the alms at line 99 of threeptcf.py (just as (x1+x2+...)*(y1+y2+...) != x1*y1 + x2*y2 + ... in general). If I set e.g. tree_sec.enum(tree_prim, rmax, process=callback, iprim=iprim, bunch=100) (instead of bunch=100000) at line 111, then the test test_threeptcf.py::test_sim_threeptcf fails. To deal with this, one should probably sum all the alms obtained for the different chunks of secondaries and take the product of the alms only at the very end.
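
A toy demonstration of the identity failure (the arrays are made up; they merely stand in for per-bunch alms, not the actual threeptcf.py variables):

```python
import numpy as np

# Made-up per-bunch alm pieces, standing in for what process() sees on
# successive bunches of secondaries.
rng = np.random.default_rng(0)
chunks_a = [rng.standard_normal(4) for _ in range(3)]
chunks_b = [rng.standard_normal(4) for _ in range(3)]

# Buggy pattern: take the product inside each bunch, then accumulate --
# this computes sum_k (A_k * B_k).
per_bunch_product = sum(a * b for a, b in zip(chunks_a, chunks_b))

# Correct pattern: accumulate the alms over all bunches first, and take
# the product only at the very end -- (sum_k A_k) * (sum_k B_k).
product_of_sums = sum(chunks_a) * sum(chunks_b)

assert not np.allclose(per_bunch_product, product_of_sums)
```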

Concerning the PR, thanks for offering! I believe it would be faster, though, for you to change the code yourself.

Cheers

rainwoodman commented 6 years ago

I see what you mean: line 99 indeed doesn't look right. I will file a PR with these fixes.

rainwoodman commented 6 years ago

@nickhand, could you comment on how the current threeptcf_sim_result file was obtained? I only see that you updated it in d5dcbcccb955. After implementing adematti's suggested fix, I am getting a very different result.

nickhand commented 6 years ago

I agree that both of these issues are bugs, but shouldn't the result of the test case be unaffected? I would have thought that the fix with bunchsize=100 should produce the same result as the current code produces with bunchsize=100000.
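
As a self-contained sketch of the invariance I have in mind (toy numbers, not the actual test or its data): once the alms are accumulated before the product is taken, the result should not depend on the bunch size.

```python
import numpy as np

# Toy sketch of bunchsize invariance (not test_threeptcf.py): with the
# "accumulate first, multiply last" scheme, splitting the secondaries
# into bunches of 100 or 100000 gives the same accumulated alm, hence
# the same final product.
def accumulate(values, bunchsize):
    total = 0.0
    for start in range(0, len(values), bunchsize):
        total += values[start:start + bunchsize].sum()
    return total

rng = np.random.default_rng(1)
secondaries = rng.standard_normal(1000)
assert np.isclose(accumulate(secondaries, 100) ** 2,
                  accumulate(secondaries, 100000) ** 2)
```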

The test result was produced by Daniel Eisenstein's C++ version of the algorithm, which should be correct. I am sending that code to @rainwoodman right now.

rainwoodman commented 6 years ago

Thanks! I added a few notes about the source of the data files in the test case. It turns out the disagreement was because I hadn't coded it up correctly.

rainwoodman commented 6 years ago

Fixed in #516.