damonge closed this 5 years ago
I had a look at your cleaned-up cosmos_weight.py file; everything looks good, I can't see any problems. Also, I saw you used a leafsize of 2*n_neighbours when finding the neighbouring galaxies (see below). Is this some sort of standard practice, or did you just choose 2? (I'm just curious.)
tree_NN_lookup = spatial.cKDTree(photoz_sample, leafsize=2*self.config['n_neighbors'])
Thanks a lot @zahragomes ! Regarding the leafsize: good question. I just saw you were using 20 neighbors and a leafsize of 40, so I assumed that was a rule of thumb. Should I just leave it hardcoded to 40?
I don't think it's going to matter here. The leafsize just gives the number of samples below which the algorithm switches to the brute-force method of finding neighbours (computing the distance between all pairs) instead of the much more efficient KD-tree method, which reduces the number of distances that need to be calculated. At small numbers of samples (galaxies), brute force can be more efficient than a KD-tree, so it makes sense to switch, but in our case photoz_sample will always be much too big for this to matter. So as long as we don't start using very large n_neighbours, 2*n_neighbours should be fine.
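For reference, a minimal runnable sketch of the lookup discussed above. The array shape and variable values here are toy stand-ins, not the pipeline's actual data:

```python
import numpy as np
from scipy import spatial

# Toy stand-in for photoz_sample: 1000 galaxies with 4 colour features.
rng = np.random.default_rng(0)
photoz_sample = rng.normal(size=(1000, 4))

n_neighbors = 20
# leafsize sets the node size below which cKDTree falls back to brute force;
# 2 * n_neighbors is the rule of thumb discussed above.
tree = spatial.cKDTree(photoz_sample, leafsize=2 * n_neighbors)

# Query the nearest neighbours of the first galaxy. Since the query point
# is itself in the tree, the closest match is the point at distance zero.
dists, idx = tree.query(photoz_sample[0], k=n_neighbors)
```

As noted above, leafsize only trades tree-traversal overhead against brute-force scans inside leaf nodes; the returned neighbours are identical for any valid value.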
OK, great. I'll revert it to 40 for safety (just in case at some point we do explore large n_neighbours, although I doubt it).
@humnaawan , @adam-broussard : let me know your thoughts when you've had a chance to look at the code!
@damonge I think your code looks good with one potential stipulation. I used np.trapz() when I was summing up the PDF, and I noticed that you are just using sum(). Does that not create an issue where the answer could be dependent on bin size?
@adam-broussard thanks a lot!
I think that might be the case depending on the type of interpolation one uses. However, what I did is plot my N(z)s vs. yours and saw that they were essentially equivalent (e.g. O(1E-3) differences). This may be marginally less accurate but it's a bit faster. Let me know if this concerns you.
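To illustrate why the two normalizations agree so closely: on a uniform z grid, sum() times the bin width is a Riemann sum, which differs from the trapezoidal rule only through the endpoint terms. A toy comparison (the Gaussian and grid are illustrative, not the pipeline's actual p(z)):

```python
import numpy as np

# Illustrative p(z) on a uniform grid: a Gaussian stand-in for a photo-z PDF.
z = np.linspace(0.0, 3.0, 301)
dz = z[1] - z[0]
pdf = np.exp(-0.5 * ((z - 1.0) / 0.2) ** 2)

# Riemann sum: what sum() amounts to, up to the constant factor dz.
riemann = np.sum(pdf) * dz

# Trapezoidal rule on the same uniform grid (equivalent to np.trapz(pdf, z)).
trap = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * dz)

# For a smooth, well-sampled PDF that vanishes at the grid edges, the two
# differ only by the half-weight endpoint terms, so they agree very closely.
rel_diff = abs(riemann - trap) / trap
```

The differences would only become bin-size dependent for coarsely sampled or edge-truncated PDFs, which is consistent with the O(1E-3) level of agreement mentioned above.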
@damonge Okay, I remember mine took a decent chunk of time to run, so yours is probably the better method. Looks good!
@damonge I think the new code looks good. My apologies for the delay in getting back.
Awesome. Thanks a lot @humnaawan and @adam-broussard !
Rewrote our initial pipeline as a ceci pipeline, and cleaned it up significantly. Also made it possible to download all the necessary data automatically. I have checked that, running the new pipeline with the same parameters, I obtain exactly the same outputs at every stage down to floating-point accuracy, so I am confident that the new code is correct (or as correct as the initial one). Before merging into master, however, it would be good for the people who coded up some parts of the pipeline to check the new code briefly:
get_nz_stack in hsc_lss/cat_mapper.py, where I've implemented the pdf-stacking algorithm you wrote initially. I've modified it a bit, but I think it should give similar results (visually they are indistinguishable).
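For reviewers skimming the thread, a hypothetical sketch of what a pdf-stacking estimator of this kind typically looks like. This is not the actual get_nz_stack code; the function name, weight handling, and the sum()-based normalization convention discussed above are assumptions:

```python
import numpy as np

def stack_pdfs(pdfs, weights=None):
    """Stack per-galaxy photo-z PDFs into an N(z) estimate.

    pdfs    : array of shape (n_gal, n_z), each row p(z) on a common z grid.
    weights : optional per-galaxy weights, shape (n_gal,).
    """
    if weights is None:
        weights = np.ones(pdfs.shape[0])
    # Weighted sum of the per-galaxy PDFs over the galaxy axis.
    nz = np.einsum('g,gz->z', weights, pdfs)
    # Normalize with a plain sum(), matching the sum()-vs-trapz
    # discussion earlier in the thread.
    return nz / np.sum(nz)
```

A stack like this is insensitive to the overall PDF normalization of each galaxy only if the inputs are individually normalized first, which is one of the details where implementations of this step tend to differ.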