erykoff / redmapper

The redMaPPer Cluster Finder
Apache License 2.0

Optimising configuration for high-z cluster runs #61

Open jacobic opened 3 years ago

jacobic commented 3 years ago

Hi Eli,

I was just wondering if it would be possible to provide some advice about adjusting the config for optimising completeness at high-z > 0.8. I am currently using DR8 of the Legacy Imaging Surveys (grz) with WISE (w1) and running redmapper in scanning mode (maximising lnlike as a function of redshift) for X-ray selected clusters.

I have had excellent results when performing optical-only calibrations with grz up to redshifts of 0.85 (with multiple iterations over the full footprint), but now I am trying to push to higher redshifts by adding the w1 band.

The configuration described below extends up to a redshift of 1.2 and, although sub-optimal, has produced some promising cluster candidates at z > 0.9. At the moment the redshift distribution of the resulting cluster sample is not smooth for z > 0.9, and some redshift bins at high-z look very incomplete compared to others.

**I suspect that the problem is most likely related to my colour mode settings (especially `*_maxnodes`)**. Please let me know if you spot any settings which could be tweaked to improve performance.

Basic set up

refmag: z
zrange: [0.05, 1.20]
bands: ['g', 'r', 'z', 'w1']
mstar_survey: des
mstar_band: z03

Training set

Healsparse maps

## Colour modes

- Use all colours.
  - `calib_colormem_colormodes: [0, 1, 2]`
- Cut spectroscopic training at zbounds just before the colour models start to become flat.
  - `calib_colormem_zbounds: [0.35, 0.75]`
- Set maxnodes to the redshifts where the colour scatter starts to blow up due to being shallow/blue.
  - `calib_color_maxnodes: [0.6, 0.85, -1]`
  - `calib_covmat_maxnodes: [0.6, 0.85, -1]`

I think I need to tweak the settings above, as the **sigma for r-z and z-w1 colours reduces to zero before the redshift limit** of the calibration in the plots below. In particular, the **z-w1 sigma is not smooth**. This results in a **strange redshift distribution in the cluster sample for z > 0.9**. I suspect this is partially due to the above settings, as well as the lack of spectroscopic galaxies and the accuracy of the initial red sequence models at very high redshifts. Given these diagnostic plots, **what values of `calib_color_maxnodes` and `calib_covmat_maxnodes` do you recommend?**

## calib_colormem_sigint

`calib_colormem_sigint: [0.05, 0.03, 0.06]`. This does not actually appear to be used in the code; is that correct?

# Zreds

The iter1 zreds look OK for z < 0.9 but are far from perfect. The distribution is asymmetric about ztrue=zred, and the apparent gap at 1.0 < z < 1.1 is slightly worrying.

## wcen_cal_zrange

- I set the upper limit to be the maximum of z_range where scaleval=1.
- `wcen_cal_zrange: [0.05, 0.60]`
- **Do I need to increase the upper limit to > 0.6 to have accurate centring at high-z?**

## limmag_catalog

- `limmag_catalog: 24.0`
- 0.2L* = m*(z=1.2) + 1.75 = 23.75. **Do you think I should go deeper?**
- I override `limmag_hard` in the master catalogue table with `limmag_catalog` in the config file. I have confirmed this works at each step of the calibration.

## Initial red sequence models

- My initial `ezgal` red-sequence models look very similar to the default DES `redmapper` files for g-r and r-z; however, they start to deviate at very high-z. This comparison is shown in the figure below.
- I created a z-w1 model. **Do you think the level of accuracy is OK at high-z?** I assume that it should not matter given the `*_maxnodes` settings above. The w1-w2 model that I also created (not shown here) matches yours (created in #58) even more closely at all redshifts.

## Background setting

- `calib_make_full_bkg: False`
- For speed this has been turned off (for now).

## Minimum richness for computing z_lambda correction

- `calib_zlambda_minlambda: 7.0`
- The default is 20.0, but I lowered it because `redmapper` complains that some bins do not have enough spectra.
- This is likely to be caused by an underlying problem.

As soon as I have the right settings I will increase the size of the training footprint and increase the number of calibration iterations. I apologise for such a long report but thought it would be better to be verbose in order to speed up troubleshooting.

If you have any advice for me whatsoever I would be extremely grateful. Thanks again for all your hard work. I really appreciate it!

Cheers, Jacob

erykoff commented 3 years ago

Thank you for your detailed report. This is interesting, and very helpful. Hopefully some of these suggestions will prove to be useful. These answers follow your questions, and are not in order of importance.

  1. For the depth map, the raw depth map is using "reddened" magnitudes, but the depth map should be corrected for reddening to match the galaxy catalog. I doubt that higher resolution will make much difference. And fixing the reddening of the depth maps won't make a huge difference at high z because we're looking in the NIR where there is less reddening.
  2. Colormodes looks good. zbounds looks fine. Setting calib_color_maxnodes might not be necessary; this says that the calibration should fix the mean g-r color for all higher redshifts at the value at 0.6. This obviously isn't correct, since you still have signal on the mean g-r color at z>0.6. However, I am confused about your diagnostic plots, since they extend to z=1.2. Or was this run without those settings? On the other hand, setting calib_covmat_maxnodes to something slightly below the redshift where it collapses would make sense. This will fix the scatter to these values at higher redshift. But they might not be too important, because the photometric errors are getting large enough for the intrinsic scatter not to matter much. At the same time, given the wiggliness of the sigma plots, I would recommend setting calib_covmat_nodesize to something larger than the default 0.15. This will hurt a little at the filter transitions, but will smooth things out. You could also consider increasing calib_slope_nodesizes because the slope is looking a bit jumpy, especially in z-W1. The default calib_color_nodesizes looks fine; you could try increasing it as well, but I don't think it's a problem. I thought calib_colormem_sigint was being used as the first guess for the intrinsic width, but apparently I stopped using that. Huh!
  3. The features/outliers for zred are normal for the first iteration; this is a selection based on a guess of the color of the clusters. This will smooth itself out with further iterations. However, the lack of any galaxies at higher z means that it really did fail to grab any high-z galaxies, which is bad. The red training galaxy plots look fine as a function of redshift; it may just be that the node sizes need to be adjusted to keep the fit from getting wonky at higher z. Another parameter to look at is calib_corr_pcut, which is the membership probability cut of galaxies going into these diagnostics and the correction plots. If, due to photometric noise or some other reason, the probabilities are peaked at something below 0.9, then they're going to be missing from this plot and the rest of the calibration can go wonky.
  4. The initial red sequence model is probably fine, especially z-W1 which is tracking the spec galaxies fine.
  5. I'm not surprised you have to change calib_zlambda_minlambda. You can probably go down to 5 okay. And further iterations will really cut the outliers and the wiggles.

So what I would recommend is to play with the settings above until you're satisfied with (a) the zred plots (that they have what looks like a reasonable number of galaxies selected), and (b) the z_lambda plots (that they have a reasonable number of clusters to high z). Probably increasing the node size will help. And you shouldn't worry about outliers at this point; it's only a problem if they persist into the second iteration, which makes a big difference in the selection/modeling.
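
As a rough illustration of what the maxnodes and nodesize settings in point 2 do, here is a minimal sketch in plain Python. It is not redMaPPer's implementation: it assumes a model defined at evenly spaced nodes with linear interpolation between them, treats a negative maxnode as "no limit" (an assumption based on the `-1` entries earlier in the thread), and uses the hypothetical helpers `toy_nodes`, `interp_linear` and `eval_clamped` with made-up colour values.

```python
import math
from bisect import bisect_right


def toy_nodes(zlo, zhi, nodesize):
    """Evenly spaced node redshifts starting at zlo and reaching at least zhi."""
    n = math.ceil((zhi - zlo) / nodesize) + 1
    return [round(zlo + i * nodesize, 6) for i in range(n)]


def interp_linear(z, nodes, values):
    """Piecewise-linear interpolation, held constant outside the node range."""
    if z <= nodes[0]:
        return values[0]
    if z >= nodes[-1]:
        return values[-1]
    i = bisect_right(nodes, z) - 1
    frac = (z - nodes[i]) / (nodes[i + 1] - nodes[i])
    return values[i] + frac * (values[i + 1] - values[i])


def eval_clamped(z, nodes, values, maxnode=-1.0):
    """Evaluate the model, freezing it at its value at maxnode for z > maxnode.

    A negative maxnode is treated as "no clamp" (assumption, see lead-in).
    """
    if maxnode > 0:
        z = min(z, maxnode)
    return interp_linear(z, nodes, values)


# Made-up mean g-r colours at a handful of nodes, for illustration only.
nodes = [0.0, 0.3, 0.6, 0.9, 1.2]
gr_values = [0.5, 1.2, 1.7, 1.8, 1.6]

print(eval_clamped(1.0, nodes, gr_values, maxnode=0.6))   # frozen at the z=0.6 value
print(eval_clamped(1.0, nodes, gr_values, maxnode=-1.0))  # follows the nodes out to z=1.0

# A larger node size means fewer nodes over the same range, hence a smoother model.
print(len(toy_nodes(0.05, 1.20, 0.15)), len(toy_nodes(0.05, 1.20, 0.20)))
```

With a clamp at 0.6 the model ignores any colour evolution beyond that redshift, which is why it is a questionable choice when the colour still carries signal there.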

jacobic commented 3 years ago

Hi Eli,

Thanks so much for your detailed response. I learned a lot while implementing your recommendations and think I am definitely a few steps closer to achieving an accurate high-z calibration thanks to your help :)

I tried almost everything that you suggested (detailed at the end of this message) but I think the cause of the problem is the following:

Proposed solution

These are the default cuts

https://github.com/erykoff/redmapper/blob/0100eaf0a445c4eedcbb569b5bcc58c8ba47ac50/redmapper/configuration.py#L293-L294

https://github.com/erykoff/redmapper/blob/0100eaf0a445c4eedcbb569b5bcc58c8ba47ac50/redmapper/configuration.py#L264

and by default the pcut is applied to pcol to define `use`, while the pcol cut is used to define `coluse`

https://github.com/erykoff/redmapper/blob/0100eaf0a445c4eedcbb569b5bcc58c8ba47ac50/redmapper/calibration/redsequencecal.py#L66-L81

This means that the galaxies used to correct zreds will only be cut at pcol > 0.3, as `gals` depends on `use`.

https://github.com/erykoff/redmapper/blob/0100eaf0a445c4eedcbb569b5bcc58c8ba47ac50/redmapper/calibration/redsequencecal.py#L204-L205

because in the function above there is only minor outlier clipping and no further usage of any of the pcol / p cuts

https://github.com/erykoff/redmapper/blob/0100eaf0a445c4eedcbb569b5bcc58c8ba47ac50/redmapper/calibration/redsequencecal.py#L869-L871

https://github.com/erykoff/redmapper/blob/0100eaf0a445c4eedcbb569b5bcc58c8ba47ac50/redmapper/calibration/redsequencecal.py#L920-L927

The calib pcorr value is actually never used in the code (apart from in one redmagic script)

https://github.com/erykoff/redmapper/blob/0100eaf0a445c4eedcbb569b5bcc58c8ba47ac50/redmapper/configuration.py#L280

but in the plotting routine there is a 0.9 hardcoded

https://github.com/erykoff/redmapper/blob/0100eaf0a445c4eedcbb569b5bcc58c8ba47ac50/redmapper/calibration/redsequencecal.py#L1002

Explanation

Using a pcut of 0.3 rather than 0.7 or 0.9 will make a big difference at high-z because that is where the training set is most contaminated.

Since I opened this issue I have switched to a spectroscopic training set which samples the top 2000 brightest cluster galaxies from the literature in each redshift bin of width 0.05 across the whole redshift range.

This means there is more than enough signal and redmapper does not complain at all.
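
The per-bin selection described above could be sketched as follows. This is only an illustration of "keep the N brightest per redshift bin" in plain Python; `brightest_per_bin` is a hypothetical helper, the input format (pairs of spectroscopic redshift and reference-band magnitude) is an assumption, and the toy values are made up.

```python
from collections import defaultdict


def brightest_per_bin(galaxies, n_max=2000, dz=0.05):
    """Keep up to n_max brightest galaxies in each redshift bin of width dz.

    galaxies: iterable of (z_spec, refmag) pairs; a smaller refmag is brighter.
    """
    bins = defaultdict(list)
    for z, mag in galaxies:
        # The small epsilon guards against float round-off at bin edges.
        bins[int(z / dz + 1e-9)].append((z, mag))
    selected = []
    for index in sorted(bins):
        selected.extend(sorted(bins[index], key=lambda g: g[1])[:n_max])
    return selected


# Toy catalogue: three galaxies in the 0.80-0.85 bin, one in the 0.90-0.95 bin.
toy = [(0.81, 21.5), (0.82, 20.9), (0.84, 22.3), (0.91, 21.0)]
picked = brightest_per_bin(toy, n_max=2)
print(picked)
```

Bins that contain fewer than `n_max` galaxies are returned in full, so under-filled bins are easy to spot by counting the output per bin.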

As you can see from the histogram of the training set, at around z~0.8 completeness starts to drop off and we have < 2000 cluster galaxies per dz=0.05. This is where contamination starts to become a problem. This is what I believe causes the bias at high-z since the contaminating galaxies have low pcol yet are still included in the zred correction because of the potential bug described above.

The figure below shows what the iter_0 color mem file looks like for lambda > 10 for z-w1 at high redshift.

You can see that although there are many galaxies with high pcol, there are also a lot with pcol < 0.7 (and it actually turns over at 0.65).

To avoid these low probability galaxies being used in the zred correction and redsequence fitting I propose to force the pcol cut / p cut / p calib cut to something higher than 0.3 in the code. Hopefully this will solve the bias in zred and zlambda at high-z.
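
To see how much a tighter cut would thin out the sample, here is a toy sketch of the two selections discussed above. It is not the code in `redsequencecal.py`: the function `selection_masks` is hypothetical, the strict `>` comparison is an assumption, and the `pcol` values are made up. The default thresholds (0.3 and 0.7) are the ones quoted in this thread.

```python
def selection_masks(pcol, calib_pcut=0.3, calib_color_pcut=0.7):
    """Return (use, coluse) boolean lists for a list of pcol values."""
    use = [p > calib_pcut for p in pcol]
    coluse = [p > calib_color_pcut for p in pcol]
    return use, coluse


# Made-up membership probabilities, including several low-probability galaxies.
pcol = [0.95, 0.88, 0.65, 0.45, 0.35, 0.20]

counts = {}
for pcut in (0.3, 0.7, 0.9):
    use, _ = selection_masks(pcol, calib_pcut=pcut)
    counts[pcut] = sum(use)
    print(f"calib_pcut={pcut}: {counts[pcut]} of {len(pcol)} galaxies kept")
```

Running the same count on the real `pcol` column for the high-z bins would show directly how many training galaxies each threshold leaves.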

Please let me know what you think about my proposed solution and thanks again for such useful feedback!

Cheers, Jacob

P.S. Other things that I updated while trying to solve the problem:

jacobic commented 3 years ago

Hi Eli,

Here is a quick update with further debugging.

At first glance, modifying the cuts does not seem to improve things on the first iteration, and more conservative cuts do not result in better zred at high-z. Perhaps it will make a bigger difference during the second iteration (or later on in the first calibration)... or perhaps I am fundamentally misunderstanding something in my previous comment.

Experimenting with cuts

calib_color_pcut: 0.7, calib_pcut: 0.3 (default settings)

calib_color_pcut: 0.7, calib_pcut: 0.7

calib_color_pcut: 0.9, calib_pcut: 0.9

New solution?

I still suspect that the bias at high-z is related to the zred / zlambda corrections. The plot below uses your newly pushed zscan code and my "best" calibration so far (1 iter, maxnodes=-1, covmat_nodesize=0.2, improved training clusters etc.) to check the calibration performance. As you can see, there is a build-up of high-z clusters in a zscan for the SPT clusters from the 2500 deg^2 Bocquet+19 sample below. This is relatively unbiased as a function of z_lambda but very biased as a function of zlit (the literature redshift in this case).

As the build-up of clusters is at around z=0.85, I thought it could be because the zbounds value between r-z and z-w1 is too high (currently 0.8). This would produce a huge peak in the distribution of clusters at this transition point, because the colour models for r-z and z-w1 are both relatively flat at z=0.8.

It also looks strongly correlated with scaleval, which made me think that the extinction bug in the depth map (which is defined in the z-band) could be causing this problem, but since scaleval is related to redshift it is difficult to say.
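
One simple way to put numbers on this kind of pile-up is to bin the clusters by one redshift and take the median offset of the other in each bin. The sketch below does this in plain Python; `binned_median_offset` is a hypothetical helper, and the redshift values are invented to mimic a pile-up near z=0.85 rather than taken from any real catalogue.

```python
from collections import defaultdict
from statistics import median


def binned_median_offset(z_est, z_ref, dz=0.1):
    """Median of (z_est - z_ref) in bins of z_ref with width dz.

    Returns {bin lower edge: (median offset, number of clusters)}.
    """
    bins = defaultdict(list)
    for ze, zr in zip(z_est, z_ref):
        # The small epsilon guards against float round-off at bin edges.
        bins[int(zr / dz + 1e-9)].append(ze - zr)
    return {round(i * dz, 6): (median(v), len(v)) for i, v in sorted(bins.items())}


# Invented example: estimates track the reference at z~0.65 but saturate near 0.85.
z_lit = [0.62, 0.68, 0.95, 1.05, 1.15]
z_lambda = [0.63, 0.67, 0.86, 0.85, 0.84]

offsets = binned_median_offset(z_lambda, z_lit)
for edge, (med, n) in offsets.items():
    print(f"zlit bin starting at {edge:.1f}: median offset {med:+.3f} ({n} clusters)")
```

Swapping the two arguments gives the same statistic in bins of z_lambda, which is the comparison described above.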

Please let me know if you have any ideas and thanks for pushing the zscan code, it works like a charm!

Cheers, Jacob

jacobic commented 3 years ago

Hi Eli,

One last update from me before the weekend (sorry for so many comments).

None of these things improved things much...

It could simply be that the colours and uncertainties from the 4-year WISE forced photometry are not sufficiently accurate or deep when processed in the Legacy Imaging Surveys, so when the transition to z-W1 is made, things start to go pear-shaped.

To test this theory out I am matching CATWISE 2020 galaxies to the grz galaxies in Legacy DR8 which will hopefully result in more reliable colour information at high-z due to the increase in depth and the fact the photometry is not forced. This should make it easier to interpret the zred plots.

Have a great weekend!

Cheers, Jacob

erykoff commented 3 years ago

So I don't have any quick answers, and I'll have to look at the use of the different p cuts to make sure that things are doing what they're supposed to be doing and documented at least adequately. But I wanted to point out that the zscan mode is not magic; it's only as good as the red sequence model that's put in. So if the red sequence model is not converged properly, then zscan will end up with a pileup as you see.

One thing to look at is not the extinction variation (which isn't going to be that large) but the overall depth. Are you using z or W1 as the reference band? And how deep is the catalog in these bands, and how deep is it in terms of L* at z=0.8, 1.0, 1.2? Because if you're just reaching the tip of the luminosity function, things won't work as well. One thing that you can try, though, to normalize things is to change the reference luminosity cut from the default 0.2L* to something brighter (0.4 or 0.5L*) and see if that reduces problems where I think you might be hitting the filter transition and the z-band depth limit at the same time.
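
The depth question above comes down to simple magnitude arithmetic: a galaxy at a fraction f of L* is -2.5 log10(f) magnitudes fainter than m*. The sketch below applies this to the numbers quoted earlier in the thread. The value m*(z=1.2) = 22.0 is only what the quoted "m*(z=1.2) + 1.75 = 23.75" implies, not an independent measurement, and `mag_offset` and `limiting_mag` are hypothetical helper names.

```python
import math


def mag_offset(lum_fraction):
    """Magnitudes fainter than m* for a galaxy of luminosity lum_fraction * L*."""
    return -2.5 * math.log10(lum_fraction)


def limiting_mag(mstar, lum_fraction):
    """Magnitude corresponding to lum_fraction * L* given m* at that redshift."""
    return mstar + mag_offset(lum_fraction)


MSTAR_Z12 = 22.0       # implied by the thread: 23.75 - 1.75 (assumption)
LIMMAG_CATALOG = 24.0  # limmag_catalog from the configuration in this thread

for frac in (0.2, 0.4, 0.5):
    needed = limiting_mag(MSTAR_Z12, frac)
    margin = LIMMAG_CATALOG - needed
    print(f"{frac:.1f} L*: reaches m = {needed:.2f}, margin to limmag_catalog {margin:+.2f} mag")
```

The offset for 0.2L* is about 1.75 mag, matching the figure quoted in the thread, while 0.5L* needs only about 0.75 mag below m*. Whether the catalogue is actually complete to `limmag_catalog` at these redshifts is a separate question that this arithmetic does not answer.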

Another thing to look at is the actual errors in the photometric catalog. If these are largely over- or underestimated, that can lead to problems.