Closed: damonge closed this issue 6 years ago
Alright @humnaawan, this is how I'd compute the S/N for a given choice of Nbin:

(2*ell+1) Tr(S_ell . C_ell^-1 . S_ell . C_ell^-1)

(here S_ell is the element of S for multipole ell, and the same for C; ^-1 means matrix inversion, Tr means "trace", and . denotes matrix multiplication, i.e. the dot product). So the idea is to compute the quantity above for each ell and then sum over ells to get the S/N.
So a plot of this quantity as a function of Nbin should tell us how many bins we want to use.
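The per-ell quantity above can be sketched in numpy as follows (the names `ells`, `s_ells`, and `c_ells` are placeholders for per-multipole arrays of S_ell and C_ell, not names from the actual code):

```python
import numpy as np

def sn_squared(ells, s_ells, c_ells):
    """Sum of (2*ell+1) Tr(S C^-1 S C^-1) over multipoles.

    ells   : iterable of multipole values
    s_ells : iterable of (Nbin, Nbin) signal matrices S_ell
    c_ells : iterable of (Nbin, Nbin) total covariance matrices C_ell
    """
    sn2 = 0.0
    for ell, s, c in zip(ells, s_ells, c_ells):
        c_inv = np.linalg.inv(c)
        # (2*ell+1) weights for the number of m-modes per multipole
        sn2 += (2 * ell + 1) * np.trace(s @ c_inv @ s @ c_inv)
    return sn2
```

Repeating this for each candidate Nbin gives the curve of S/N versus number of bins discussed above.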
Hope this is not too convoluted. Let me know if you want me to clarify anything.
@damonge please see the notebook for an attempt to calculate the S/N for different Nbin. I was expecting the S/N to plateau rather quickly as I increase Nbin but it doesn't; the gain decreases with more bins. Here's the last output from the notebook:
I will try to re-check if I am implementing the methodology incorrectly; I have not found any bugs so far. In the meantime, a few notes/questions for you:
I am computing the area of the field from `WIDE_AEGIS_MaskedFraction.fits` and using the info from flatmaps, but the area I get is far less than 108/7 deg^2. Specifically, I get `fskb.dx*fskb.dy = 0.001`, which I assume is the area of each pixel in deg^2; then `len(mskfrac[mskfrac>0.5])*0.001` should give me the area in deg^2, but I get 1.2678. This would affect the number density and hence the shot-noise calculation, since in the notebook I am setting the field area to 108/7 deg^2 (based on the idea that the total WIDE area is 108 deg^2 and we have 7 WIDE fields).

Looks pretty reasonable so far to me. A couple of questions:
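As a quick sanity check of the numbers quoted above (the pixel count here is illustrative, back-solved from the 1.2678 deg^2 figure rather than read from the actual map):

```python
# Pixel area from the flat-map geometry, as quoted above (deg^2)
pix_area_deg2 = 0.001
# Illustrative number of pixels with masked fraction > 0.5; back-solved
# from the quoted 1.2678 deg^2, NOT taken from the actual map
n_good_pix = 1268

area_deg2 = n_good_pix * pix_area_deg2
expected_deg2 = 108.0 / 7  # assumed field area: total WIDE area / 7 fields

print(area_deg2, expected_deg2)  # ~1.27 vs ~15.43 deg^2, a factor of ~12 gap
```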
@egawiser there isn't any difficulty (as far as I can see right now) in combining all 7 fields. I just wanted to confirm before I set up the code for it.
Also, the way I've implemented David's outline, f_sky only comes in when estimating the number density in step 2.
This is great @humnaawan!

The total unmasked area (in steradians) is `np.sum(maskedfraction)*np.radians(fsk.dx)*np.radians(fsk.dy)`. The shot-noise contribution to the power spectrum will just be this number divided by the total number of clean objects in that redshift bin.

So, I would say, @humnaawan is extremely close to solving this issue once the following is addressed:
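A minimal sketch of that shot-noise recipe (the function and argument names are placeholders; `fsk.dx`/`fsk.dy` stand for the flat-map pixel sizes in degrees):

```python
import numpy as np

def shot_noise(maskedfraction, dx_deg, dy_deg, n_gal):
    """Flat shot-noise power spectrum for one redshift bin.

    maskedfraction : 2D array of per-pixel unmasked fractions
    dx_deg, dy_deg : pixel sizes in degrees (fsk.dx, fsk.dy)
    n_gal          : total number of clean objects in the bin
    """
    # Effective unmasked area in steradians
    area_sr = np.sum(maskedfraction) * np.radians(dx_deg) * np.radians(dy_deg)
    # Shot noise = area / number of objects = 1 / (number density per sr)
    return area_sr / n_gal
```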
(while writing this my mousepad betrayed me and I accidentally pressed "close and comment", sorry about that!)
A few other minor comments, now that I'm looking at @humnaawan 's notebook.
The notebook looks good otherwise. You're becoming a CCL expert!
@damonge @egawiser here's a short rundown of the changes that are implemented in the code since my last post, following your comments:
I have re-printed some representative results in this notebook, while all the actual output plots (and the sbatch outputs) are in /global/cscratch1/sd/awan/lsst_output/hsc_output/
. Here are some highlights:
We see that the results are largely the same across the two fields, with S/N starting to plateau for Nbin > ~7. The two algorithms (ephor_ab and franken_z) give similar results, and we see similar trends when using z_mode as the point estimator (please see Output[12] in the notebook for comparison plots).
So we have 0.15, 0.50, 0.76, 1.0, 1.5 as the bin edges for Nbin=4. We get the same bin edges from VVDS; see Output[13] in the notebook.
So we have 0.15, 0.47, 0.65, 0.86, 1.10, 1.5 as the bin edges for Nbin=5. We get the same bin edges from VVDS; see Output[14] in the notebook.
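Equal-number bin edges like the ones quoted above can be obtained from quantiles of the photo-z point estimates; a minimal sketch (the function name is illustrative, and `z` stands in for the z_best values of the clean sample):

```python
import numpy as np

def equal_number_edges(z, nbins, zmin=0.15, zmax=1.5):
    """Bin edges such that each bin holds roughly equal numbers of galaxies.

    z     : array of photo-z point estimates (e.g. ephor_ab z_best)
    nbins : number of tomographic bins
    """
    z_in = z[(z >= zmin) & (z <= zmax)]
    # Quantiles at 0, 1/nbins, 2/nbins, ..., 1 give equal-count edges
    edges = np.quantile(z_in, np.linspace(0, 1, nbins + 1))
    edges[0], edges[-1] = zmin, zmax  # pin the outer edges to the full range
    return edges
```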
As @damonge pointed out during our discussions today, doing the analysis for more than 5 bins would be rather computationally prohibitive, so the idea is to use 4-5 bins, with bin edges from the larger fields when using ephor_ab z_best data as point estimators. I'll create the files to use the bin edges for 4 and 5 bins in cat_sampler.py.
Please let me know if there are any concerns.
Awesome @humnaawan! Definitely no need to optimize anything further. We only need this to kind of guide our choice of bins, so this is perfect.
More than 5 bins is not necessarily prohibitive, I just doubt we'll get much better results, since we'll need to include nuisance parameters for any new bin (which is something that isn't quantified here). So I'd say, let's run with 4 or 5, and we can try something else later if we get ambitious.
Closing!
Given the issue of nuisance parameters and the modest S/N improvement above for N>4, I'd suggest N=4 as a baseline, with N=6 as a "stretch goal" to be tried if we'd like to modestly improve upon the N=4 results. I also note that N=4 would allow bin edges of 0.15, 0.50, 0.75, 1.0, 1.5 that look nicely rounded and would result in almost perfect equipartition of the number of galaxies per bin.
We need to decide what binning to use. Initial proposal is to explore the dependence of overall S/N on the number of bins Nbin with equal number of galaxies in each of them, and cut at a sensible value of Nbin such that no significant S/N is gained and such that it doesn't entail a ridiculous computational effort.
Tagging @humnaawan as the person who has taken the lead on this.