jacobic commented 3 years ago

Hi Eli,

I am trying to reproduce the DES Y1 redmapper catalogue as a way of validating the performance of my pipeline:

http://desdr-server.ncsa.illinois.edu/despublic/y1a1_files/redmapper/redmapper_y1a1_public_v6.4_catalog.fits.gz http://desdr-server.ncsa.illinois.edu/despublic/y1a1_files/redmapper/redmapper_y1a1_public_v6.4_members.fits.gz

However, I am finding that the richness of the optically selected clusters that I generate are consistently low relative to the public optically selected catalogues. On average lambda is about 20% smaller than the public values, although the redshifts have a good agreement. Looking at the magnitude distribution of the members of the matched clusters, it suggests that a significantly lower fraction of member galaxies are selected at all magnitudes in all bands. I have repeated the comparisons after each iteration of the calibration (1, 2 and 3) as well as for scanning mode runs on the optical centres of the public catalogues to see if that would makes a difference.

This could be for a number of reasons e.g.

configuration settings
treatment of the photometry (creation of the pixelised galaxy catalogues)
the training dataset
differences in code versions
differences in the depth maps and/or mask files

I have tried to eliminate as many of sources of discrepancy as possible however I am still finding a small differences in the richness value, which to me suggests that either there is a small (but fundamental) misunderstanding from my side, or that my pipeline likely has some room for improvement. I have created this issue in hope that you can point me in the right direction.

Photometry

Below is the details of the calibration and a summary of the tests that I have carried out:

I am using the DES Y1 MOF catalogue

http://desdr-server.ncsa.illinois.edu/despublic/y1a1_files/mof_catalogs/y1a1-gold-mof-badregion.fits

As well as the latest version of redmapper (which passes all unit tests)

Here is a summary of the way I pixelise the photometry and map the data to the redmapper dtype. I also clean the catalogue before it is pixelised in a way which resembles that is described in the papers using 'and '.join(exprs_clean). I also remove any sources with nan mags or errs or very unphysical values for galaxies (mag > 27 or mag_err > 1). Perhaps I am over cleaning the data?

# info for redmapper
survey_mode: des
bands: ['g', 'r', 'i', 'z']
nmag: 4
ref_ind: 3
ref_band: z
limmag_ref: 24.5
zeropoint: 30.0

#id and sky position
id: bigint(coadd_objects_id)
ra: double(ra)
dec: double(dec)

#magnitude info
refmag: 'float(%zeropoint% - 2.5*log10(mof_flux_%ref_band%))'
refmag_err: 'float(abs((-2.5 / ln(10)) * mof_fluxerr_%ref_band% / mof_flux_%ref_band%))'
mag: 'transform(array(mof_flux_g, mof_flux_r, mof_flux_i, mof_flux_z),  x -> float(%zeropoint% - 2.5*log10(x)))'
mag_err: 'zip_with(array(mof_flux_g, mof_flux_r, mof_flux_i, mof_flux_z), array(mof_fluxerr_g, mof_fluxerr_r, mof_fluxerr_i, mof_fluxerr_z), (x, y) -> float(abs((-2.5 / ln(10)) * y / x)))'

# how the catalogue is cleaned (combining these expressions with the and operator)
exprs_clean: ['modest_class == 1', 'flags_gold == 0', '(flags_badregion & ( 2 | 4 | 8 | 16 | 32 | 256)) == 0']

HealSparse Depth maps

I created depth maps at nside 4096 using your model (taken from the code with redmapper) and I expand to larger pixels when if the fit fails.

des_y1_mof_griz z depth 4096 2048 exptime $des_y1_mof_griz z depth 4096 2048 fracgood$

Comparing these depth maps with the public depth maps show a good agreement which with some scatter due to the fact that I have not recovered the depth maps using the recommended procedure in Rykoff+15, I have simply used the error model and expanded to larger pixels as necessary. Here the notation is 'rm' corresponds to the public version and 'em' corresponds to the reproduced version.

rm_em_depth_map_comparison1_des_y1_mof rm_em_depth_map_comparison2_des_y1_mof

Do you think this scatter or the way the depth map is created could cause cause lambda to biased 20% lower than expected? Perhaps I should not attempt to use a map which is too high resolution since at nside=2048 the maps are much smoother and have almost enough galaxies per pixel

HealSparse Mask

The healsparse mask I create is from a combination of the footprint and badmask files and I just use fracgood = 1 or 0 so some information is lost

hp_fp = hp.read_map('y1a1_gold_1.0.2_wide_footprint_4096.fits.gz', nest=True)
hp_bad = hp.read_map('y1a1_gold_1.0.3_wide_badmask_4096.fits.gz', nest=True)

#use same masking expression used to clean the catalogue
idx_mask = np.logical_and(hp_bad, ( 2 | 4 | 8 | 16 | 32 | 256)) == 0
hp_fp[np.logical_not(idx_mask)] = hp.UNSEEN
hp_fp[hp_fp <= 0] = hp.UNSEEN
hp_fp[hp_fp > 0] = 1

hsp_map = hsp.HealSparseMap(healpix_map=hp_fp, nside_coverage=2048, nest=True)
hsp_map.write('y1a1_gold.4096.footprint.badmask.fracgood.hsp.fit', clobber=True)

y1a1_gold_1 0 2_wide_footprint_4096 fits y1a1_gold_1 0 3_wide_badmask_4096 fits $y1a1_gold footprint badmask fracgood hsp$

Calibration

With the pixelised galaxy catalogues in hand I then run the calibration over the entire footprint

bkg_deepmode: True - I assumed this would make it as accurate as possible
specfile_train: /u/jacobic/redpipes/data/interim/training/spiders_dr16_archive.fits which corresponds to all the confirmed spectroscopic members available in the literature (e.g. for the members in the redmapper catalogue if p > 0.9 and they have a spectroscopic redshift they are added to the training set). Could this cause overfitting and therefore low richness values?

The full cal.yml file:

# This is an example calibration file, based on one used to calibrate SDSS DR8
# Please note that these settings are appropriate for a ~10000 deg^2 survey with
# ugriz bands, focused on lower redshift (z<0.6) clusters.
#
bands: ['g', 'r', 'i', 'z']

limmag_ref: 25.0
limmag_catalog: 25.0

# Galaxy file for input
galfile: /ptmp/jacobic/redpipes/data/interim/des_y1_mof_griz_z/des_y1_mof_griz_z.True.32.16._master_table.fit

# Path to put plots in
plotpath: 'plots'

# Galaxy catalog has truth information (as with simulated catalogs)
has_truth: False

# Healpixel over which to run calibration
# Set "hpix" and "nside" to 0 to run the full footprint
#hpix: 0
# Nside of this healpix
#nside: 0
# Should be set to 0 for calibration
border: 0.0

# Reference magnitude (band) name
# Recommend z-band for cluster-finding to higher redshift z>0.8)
refmag: z

# Redshift range [lo, hi]
zrange: [0.20, 0.90]

# Spectroscopic input catalog
specfile_train: /u/jacobic/redpipes/data/interim/training/spiders_dr16_archive.fits
specfile: /u/jacobic/redpipes/data/interim/training/spiders_dr16_archive.fits

# All files will start with this name
# If the name ends with "cal" it will be replaced with "run" for the runs
outbase: des_y1_mof_griz_z_cal

# Maximum chi-squared to consider a possible member.  Default is 20.0
chisq_max: 20.0

# L* threshold for computing richness.  Default is optimal 0.2
lval_reference: 0.2

# Name of the "survey" for the m*(z) file
# (see README.md for details)
mstar_survey: des
# Name of the "band" for the m*(z) file
mstar_band: z03

# Maximum weighted coverage area of a cluster to be masked before it
# is removed from the catalog.  Default is 0.2
max_maskfrac: 0.2

# Number of sample galaxies to compute mask region.  Default is 6000
maskgal_ngals: 6000
# Number of different samples of maskgal_ngals (reduces biases at high redshift)
maskgal_nsamples: 100
# Filetype mode for the geometric mask.  Default is 0
# Mode 3 is a healpix mask.  (only mode supported)
mask_mode: 3
# Name of the mask file.  See README.md for format.  Default is None
#maskfile: /u/jacobic/redpipes/data/external/des_sva/sva1_gold_r1.0_goodregions_04_n4096.hsp.fit
maskfile: /u/jacobic/redpipes/data/config/cal/des_y1_mof_griz_z_0.20_0.90_v0.5/y1a1_gold.4096.footprint.badmask.fracgood.hsp.fit

# Name of the depth file.  See README.md for format.  Default is None
depthfile: /u/jacobic/redpipes/data/config/cal/des_y1_mof_griz_z_0.20_0.90_v0.5/des_y1_mof_griz.z.depth.4096.2048.fit

# chi-squared binsize for background.  Default is 0.5
bkg_chisqbinsize: 0.5
# reference mag binsize for background.  Default is 0.2
bkg_refmagbinsize: 0.2
# redshift binsize for background.  Default is 0.02
bkg_zbinsize: 0.02
# redshift binsize for zred background.  Default is 0.01
bkg_zredbinsize: 0.01
# Compute background down to the magnitude limit?  Default is False
# This will be useful in the future when computing membership probabilities
# for non-members (not supported yet).  If turned on, background
# computation is slower (depending on depth).
bkg_deepmode: True

# Name of centering class for calibration first iteration.  Default is CenteringBCG
firstpass_centerclass: CenteringBCG
# Name of centering class for cluster runs.  Default is CenteringWcenZred
centerclass: CenteringWcenZred

# Number of iterations for calibration.  Default is 3
calib_niter: 3
# Number of cores on local machine for calibration (zred, background)
# Default is 1
calib_nproc: 32
# Number of cores on local machine for calibration cluster finder runs
# Default is 1
calib_run_nproc: 32

# Nsig for consistency with red sequence to be used in red-sequence calibration.  Default is 1.5
#  Make this too wide, and blue galaxies contaminate the red sequence width computation.
#  Make this too narrow, and there is not enough signal.  Empirically, 1.5 works well.
calib_color_nsig: 1.5
# Nsig for consistence with red sequence to be used as a training seed galaxy.  Default is 2.0
calib_redspec_nsig: 2.0

# Red-sequence color template file for initial guess of red sequence.
# This file does not need a full path if it is in the data/initcolors path in the package
# which will be searched first.  If it is a file not in the redmapper package, this
# should be a full path.
calib_redgal_template: bc03_colors_des.fit
# Redshift spline node spacing for red-sequence pivot magnitude
# Default is 0.1
calib_pivotmag_nodesize: 0.1
# Redshift spline node spacing for each color.  Must be array of length nmag - 1
# Recommended default is 0.05 for each color
calib_color_nodesizes: [0.05, 0.05, 0.05]
# Redshift spline node spacing for each slope.  Must be array of length nmag - 1
# Recommended default is 0.1 for each color slope
calib_slope_nodesizes: [0.1, 0.1, 0.1]
# Maximum redshift for spline to use in fit.
# Recommended is -1 (max redshift) for each color, unless a band is very shallow/blue
# (for SDSS in this sample, only consider u-g in detail at z<0.4)
calib_color_maxnodes: [-1, -1, -1]
#calib_color_minnodes: [-1, -1, -1]
# Maximum redshift for covariance spline to use in fit.
# Recommended is -1 (max redshift) for each color, unless a band is very shallow/blue
# (For SDSS in this sample, only consider u-g in detail in covmat at z<0.4)
calib_covmat_maxnodes: [-1, -1, -1]
#calib_covmat_minnodes: [-1, -1, -1]
# Redshift spline node spacing for covariance matrix.  Default is 0.15
calib_covmat_nodesize: 0.15
# Redshift spline node spacing for zred corrections.  Default is 0.05
calib_corr_nodesize: 0.05
# Redshift spline node spacing for zred slope corrections.  Default is 0.1
calib_corr_slope_nodesize: 0.1

# Use "pcol" (which excludes radial weight) in selecting galaxies to calibrate the
#  red sequence.  Default is True, which is strongly recommended.
calib_use_pcol: True
# Membership probability cut to compute zred corrections.  Default is 0.9
calib_corr_pcut: 0.9
# Membership probability cut for pivotmag and median color computation.  Default is 0.7
calib_color_pcut: 0.7

# Membership probability cut for use in red-sequence calibration.  Default is 0.3
calib_pcut: 0.3
# Minimum richness for a cluster to be considered a calibration cluster.  Default is 5.0
calib_minlambda: 5.0
# Smoothing kernel on redshifts from calibration clusters.  Default is 0.003
calib_smooth: 0.003

# Luminosity function filter alpha parameter.
calib_lumfunc_alpha: -1.0

# Zeroth iteration color training parameters

# Scale radius of radius/richness relation (r_lambda = r0 * (lambda/100)^beta)
# Default is 0.5 (h^-1 Mpc)
calib_colormem_r0: 0.5
# Power-law slope of radius/richness relation
# Default is 0.0
calib_colormem_beta: 0.0
# Smoothing kernel on redshifts from color calib clusters.  Default is 0.003
calib_colormem_smooth: 0.003
# Minimum richness to be used as a color-training cluster.  Default is 10.0
calib_colormem_minlambda: 10.0
# Color indices for training, redshift bounds, and assumed intrinsic scatter.

# Note that if you have griz and not ugriz this should be [0, 1, 2]
calib_colormem_colormodes: [0, 1, 2]
calib_colormem_zbounds: [0.35, 0.70]
calib_colormem_sigint: [0.05, 0.03, 0.03] #TODO: this is unused

# Cluster photo-z z_lambda parameters

# Redshift spline spacing for correcting cluster z_lambda.  Default is 0.04
calib_zlambda_nodesize: 0.06
# Redshift spline spacing for richness slope correction of cluster z_lambda.
# Default is 0.1
calib_zlambda_slope_nodesize: 0.1
# Minimum richness for computing z_lambda correction.  Default is 20.0
calib_zlambda_minlambda: 20.0
# Number of z_lambda_e sigma an outlier can be to be included in correction
# Default is 5.0
calib_zlambda_clean_nsig: 5.0
# Number of iterations for z_lambda correction algorithm.  Default is 3
calib_zlambda_correct_niter: 3

# Pivot richness for richness correction.  Default is 30.0
zlambda_pivot: 30.0
# Bin size for interpolating z_lambda correction.  Default is 0.002
zlambda_binsize: 0.002
# Tolerance for convergence when computing z_lambda.  Default is 0.0002
zlambda_tol: 0.0002
# Maximum number of iterations to converge to z_lambda.  Default is 20
zlambda_maxiter: 20
# Fraction of highest probability members to use to compute z_lambda.  Default is 0.7
zlambda_topfrac: 0.7
# Step size to fit a parabola to peak of z_lambda likelihood.  Default is 0.002
zlambda_parab_step: 0.002
# Epsilon size to compute change in richness as a function of redshift.  Default is 0.005
zlambda_epsilon: 0.005

# Centering parameters

# Pivot richness for wcen model.  Default is 30.0
wcen_pivot: 30.0
# Minumum richness to use in calibrating wcen model.  Default is 10.0
wcen_minlambda: 10.0
# Maximum richness to use in calibrating wcen model.  Default is 100.0
wcen_maxlambda: 100.0
# Softening radius (h^-1 Mpc) in computing wcen.  Default is 0.05
wcen_rsoft: 0.05
# Richness range for calibrating wcen model.
#  This should be within the volume-limited range of the catalog.
wcen_cal_zrange: [0.2, 0.7]

# Cluster finder: first pass

# r0 in radius-richness relation (see above).  Default is 0.5 (h^-1 Mpc)
firstpass_r0: 0.5
# beta in radius-richness relation.  Default is 0.0
firstpass_beta: 0.0
# Number of iterations in first pass.  Default is 2
firstpass_niter: 2
# Minimum richness to pass cluster candidate to next step.  Default is 3.0
firstpass_minlambda: 3.0

# Cluster finder: Likelihood pass

# r0 is radius-richness relation (see above).  Default is 1.0 (h^-1 Mpc)
#  Note that these should typically be the same as used in percolation
likelihoods_r0: 1.0
# beta in radius-richness relation.  Default is 0.2
likelihoods_beta: 0.2
# Should likelihood use the zred in computing likelihood.  Default is True
likelihoods_use_zred: True
# Minimum richness to pass cluster candidate to next step.  Default is 3.0
likelihoods_minlambda: 3.0

# Cluster finder: Percolation pass

# r0 is radius-richness relation (see above).  Default is 1.0 (h^-1 Mpc)
#  Note that these should typically be the same as used in likelihood
percolation_r0: 1.0
# beta in radius-richness relation.  Default is 0.2
percolation_beta: 0.2
# rmask_0 in rmask = rmask_0 * (lambda / 100) ^ rmask_beta * (z_lambda / rmask_zpivot)^rmask_gamma
#  relation.  This sets the radius to which member galaxies are masked in percolation.
#  This should be >= percolation_r0.  Default is 1.5
percolation_rmask_0: 1.5
# beta in rmask relation
percolation_rmask_beta: 0.2
# gamma in rmask relation
percolation_rmask_gamma: 0.0
# zpivot in rmask relation
percolation_rmask_zpivot: 0.3
# Mask galaxies down to this luminosity cut (fainter than richness computation).  Default is 0.1 (L*)
percolation_lmask: 0.1
# Number of iterations to compute richness/redshift in percoltation.  Default is 2
percolation_niter: 2
# Minimum richness to save cluster.  Default is 3.0
percolation_minlambda: 3.0
# Minimum probability of being a bcg for a galaxy to be considered a center.  Default is 0.5
percolation_pbcg_cut: 0.5
# Maximum number of possible centrals to record.  Default is 5
percolation_maxcen: 5

zscan_testfile: /u/jacobic/redpipes/data/external/redmapper/redmapper_y1a1_public_v6.4_catalog.fits

des_y1_mof_griz_z_cal_redgals_i-z

All other check plots that redmapper creates are contained in the zip file below for iterations 1-3

des_y1_mof_griz_z_cal_iter1-3_check_plots.zip

I did not spot anything unusual in the plots, the redshift bias and scatter + zred looks ok to me but perhaps are able to spot something fishy?

Tests

I did a separate optical cluster finding run based on iter 1 -3 and repeated each of the runs in scanning mode centred on the public optical catalogue as a catfile. I then crossmatched public and reproduced catalogues optical centres with a search radius of 0.05 deg.

The config file for each iteration is below: run_iter1-3.yml.zip

Number of matched clusters with 0.05 deg

The number of matches between the reproduced and public catalogues differ as function of the number of iterations and are sensitive to the run mode. Here is a quick summary of the number of matches within a search radius of 0.05 deg.

Optical only <- plots below are based on these clusters

Iter 1 4188

Iter2 4364

Iter 3 4352

Scanning mode on public centres

Iter1 5601

Iter2 5630

Iter3 5631

Photometric redshift discrepancy

There is a photometric redshift discrepancy which seems to get less smooth with more iterations (please not these are separate plots so there is not a common normalisation)

iter1 test_cat_iter1_z_lambda_bias iter2 test_cat_iter2_z_lambda_bias iter3 test_cat_iter3_z_lambda_bias

Richness discrepancy

The richness is more significantly different than the redshifts for the matched clusters

iter1 test_cat_iter1_lambda_ratio iter2 test_cat_iter2_lambda_ratio Iter3 test_cat_iter3_lambda_ratio

This can be explained by the magnitude distribution of the members of the matched clusters where we see less members at all magnitudes

iter1 test_iter1_mem_mag_auto iter2 test_iter2_mem_mag_auto iter3 test_iter3_mem_mag_auto

I thought this could perhaps be due to differences in the red-sequence, so I matched both public and reproduced member catalogues to the same specfile (all publicly available spectra) and the distribution in the redshift-colour plane is very similar (note that each figure a common normalisation between subfigures but not between the figures corresponding to other iterations). Left column is the reproduced catalogue and the right column is the public catalogue.

iter1 test_iter1_mem_redsequence iter2 test_iter2_mem_redsequence iter3 test_iter3_mem_redsequence

The zscan versions of all these plots look very similar to the optically selected ones shown above so I have not uploaded them, however I do get a lot more matches within 0.05 deg of the public clusters in zscan mode (presumably because they are already centred on probable cg's)

Summary

Please let me know if you have any advice about how to optimise this redmapper run on DES Y1. Of the following which do you think is the most likely to cause the discrepancy shown in the tests above?

configuration settings
treatment of the photometry (creation of the pixelised galaxy catalogues)
the training dataset
differences in code versions - did you use the IDL version for instance?
differences in the depth maps and/or mask files

I also compared DES Y1 /SPT clusters to Legacy Imaging DR8 redmapper clusters and they were biased low in richness by about the same about. I also did the same thing for DES-SVA reproduced vs DES SVA public redmapper catalogues and it low richness was also apparent. This makes me think it is not related to the treatment of the photometry or the masks since these things differ between surveys, however the depth map creation, configuration settings and training set are all very similar between the different runs i've tried so perhaps they are more likely to be a cause of the issue.

Once this catalogue is in close agreement with your public version I think the catalogues made via my pipeline will hopefully be more complete with lower-scatter in richness and more members per cluster which is essential for spectroscopic targeting.

Thanks for your amazing work on redmapper and sorry for such a long issue report

Have a nice weekend!

Jacob

Hoptune commented 3 years ago

Hi Jacob,

Just to add a bit more information. I was trying to do the same thing using SDSS DR8 data and experienced exactly the same thing when comparing with the official catalog. After carefully comparing mine result with the official catalog, I think it is due to current algorithm producing a narrower calibrated red sequence. I found that low lambda is connected with generally low $p_mem$ of almost all member galaxies, and that lower $p_mem$ is caused by lower $p_color$ since radial filter and luminosity filter are pretty robust.

I also compare the catalog produced by current redmapper with some spectroscopic data and found the current version can generally reduce outlier fraction. Hope this helps.

Best, Zhiwei

jacobic commented 3 years ago

Hi @Hoptune,

Thanks for your comment. I think that your experience with the SDSS catalogue might be slightly different though as when I look at the difference in the probability of membership for the members of the matched clusters they look similar, however, it appears that there are simply less members per cluster (which is consistent with the magnitude distributions above):

Iter1 test_mem_iter1_p_hist test_mem_iter1_p_diff_hist

Iter2

test_mem_iter2_p_hist test_mem_iter2_p_diff_hist

Iter3

test_mem_iter3_p_hist test_mem_iter3_p_diff_hist

Cheers, Jacob

erykoff commented 3 years ago

@jacobic Thank you for this comprehensive comparison report. A lot more than usually shows up in a github issue!

Anyway, @Hoptune is right that there are systematic differences here between the code that was run on DES Y1 (and SDSS DR8) and what is in this repo. All the fully published redmapper catalogs, actually, are from the old non-public IDL code, which is finicky to use (aside from being based on a proprietary non-free language, etc.)

There are a number of differences and (I believe) improvements from the old IDL code to the new Python code in this repo. But the one change that is most relevant to the systematic richness differences is in some assumptions about the red-sequence model. In the IDL code (and earlier versions of the Python code) the correlation coefficient in (intrinsic) residuals between different colors in the red-sequence model was a free parameter. It turns out this was not very stable. Furthermore, tests on sims (and even looking at spec data) shows that a galaxy that is intrinsically redder than the red sequence in one color is also going to be redder in other colors. That is, if the r-i color of a red galaxy is slightly redder than the mean color at a given redshift (after accounting for the color-magnitude slope) then it will also be redder in g-r and i-z. So the current code fixes these correlation coefficients at 0.9 (default) rather than fitting as a function of redshift. In practice, this tends to reduce outliers as well as reduce the richness.

There are other changes to the code as improvements have been made. So the IDL results are not going to be completely replicatable with the Python code, unfortunately.

I hope this helps!

erykoff / redmapper

lambda is biased low when trying to reproduce the DES Y1 redmapper run #78