Do sanity check of CatSim galaxy catalog.

cwwalter commented 8 years ago

Anže will do a sanity check on a CatSim-based galaxy catalog of 80 sq degrees to see if the angular clustering is reasonable. The catalog is created by @danielsf as outlined in #3.

slosar commented 8 years ago

Have a look at the QA notebook done on the full catalog (github nicely opens it)

https://github.com/DarkEnergyScienceCollaboration/SSim_DC1_Roadmap/blob/master/QA/CatSim_sanity.ipynb

Comments:

We need to stop moving files this big as ASCII. It is just idiotic. numpy.loadtxt crashes my computer with 16 GB of memory. I can load the file if I loop manually over the lines, and then I downsample by x4 to keep things manageable. We just need to use HDF5 for datasets of this size.
Total object count is now reasonable but not perfect, implying 2.9 billion in the gold sample for 20k sq degrees, which is 75% of what I was expecting.
redshift histogram looks sane
mag histograms look sane. One can see that a sharp cut-off was applied to rmag, which propagates into not-so-sharp cut-offs in the other bands. We need to recheck this at the output, where all histograms should be naturally cut off by the noise.
radial distance distribution from circle "boresight" looks good
In projected maps, just above correlation function, for signal, you can clearly see the repeating pattern of the tiled sim. Start by looking at four dots and then it becomes easy to see how everything is replicated. But we expected this.
Correlation function looks worse than I expected. Maybe there is a bug in my code. I note:
- x4 undersampling should not affect this, we are deep in the sample variance regime
- the trend with redshift is about right
- the small scale behaviour is about right power law with index ~ -1.8
- we seem to get into the noise very quickly, already at 20-30 Mpc. I would expect this to be better. The Millennium sim is 500 Mpc on a side.
- CMASS galaxies have xi*r^2 ~ 50 Mpc^2 at 50 Mpc. Assuming a bias ratio of around 2, you would expect the value here to be around 10; in fact it is about one. So the galaxies are not quite clustered enough, which is consistent with an overly noisy estimate of the corr func.
- the 80sq deg is a weird cone, but it is >100Mpc wide for most of the survey and very long. We should be able to see radial BAO, I would have thought.
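
The manual line loop with x4 downsampling mentioned in the first comment could look roughly like the sketch below. It assumes a whitespace-delimited file with `#` header lines; the function name `load_downsampled` and the `ra dec z` columns are hypothetical and not taken from the notebook.

```python
import io
import numpy as np

def load_downsampled(handle, step=4, comment="#"):
    """Read a whitespace-delimited ASCII table, keeping every `step`-th data row.

    Only the kept rows are ever held in memory, unlike numpy.loadtxt on the
    full file. The delimiter and comment character are assumptions.
    """
    rows = []
    n_data = 0
    for line in handle:
        line = line.strip()
        if not line or line.startswith(comment):
            continue  # skip blank lines and header comments
        if n_data % step == 0:
            rows.append([float(tok) for tok in line.split()])
        n_data += 1
    return np.asarray(rows, dtype=np.float64)

# Tiny in-memory stand-in for the real catalog file (hypothetical columns).
demo = io.StringIO(
    "# ra dec z\n" + "\n".join(f"{i} {i * 0.5} {i * 0.01}" for i in range(10))
)
sample = load_downsampled(demo, step=4)
print(sample.shape)  # rows 0, 4 and 8 are kept -> (3, 3)
```

For a real file one would pass `open(path)` instead of the `io.StringIO` stand-in. Taking every fourth row is a regular rather than random subsample, which is only safe if the file order is uncorrelated with position on the sky.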

So, someone should comment on whether the correlation function is worrying or whether it is just a Millennium sim limitation. Or I made a silly mistake.
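
The back-of-envelope numbers in the comment above can be reproduced as follows. This is only a restatement of the quoted figures, assuming the correlation function scales as the square of the bias; the variable names are illustrative.

```python
# Object count: 2.9 billion gold-sample galaxies extrapolated to 20k sq degrees.
n_gold = 2.9e9
area_sq_deg = 20_000.0
density_per_sq_deg = n_gold / area_sq_deg
density_per_sq_arcmin = density_per_sq_deg / 3600.0  # 1 sq deg = 3600 sq arcmin
n_expected = n_gold / 0.75  # "75% of what I was expecting"

# Clustering amplitude: xi scales as bias^2, so a bias ratio of ~2 relative
# to CMASS lowers xi*r^2 by a factor of ~4.
xi_r2_cmass = 50.0   # Mpc^2 at 50 Mpc, as quoted
bias_ratio = 2.0     # quoted as "around 2"
xi_r2_expected = xi_r2_cmass / bias_ratio**2
xi_r2_measured = 1.0  # "in fact it is about one"

print(f"{density_per_sq_arcmin:.1f} gold galaxies per sq arcmin")
print(f"{n_expected:.2e} galaxies expected over 20k sq degrees")
print(f"expected xi*r^2 ~ {xi_r2_expected}, measured ~ {xi_r2_measured}")
```

With these inputs the expected amplitude is 12.5 Mpc^2, in line with the "around 10" quoted, so the measured value of about one is roughly an order of magnitude low.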

cwwalter commented 8 years ago

Mentioning @egawiser and @danielsf since they are not watching the repo and might not see this issue.

cwwalter commented 8 years ago

Also letting @dkirkby know about this thread.

dkirkby commented 8 years ago

It looks like xi*r^2 is closer to -1 at 50Mpc in the CMASS redshift range (0.5-1). What does the xi*r^2 plot look like if you extend it beyond the BAO scale?

slosar commented 8 years ago

Pure noise. But note that at 200 Mpc you are approaching the Millennium box size, so the integral constraint is going to start to affect you, etc. Maybe you could also take a look to see if there is an obvious bug in how I calculate xi.

dkirkby commented 8 years ago

@fjaviersanchez is going to take a look

fjaviersanchez commented 8 years ago

So, I used TreeCorr with 25% of the sample, but I generated the random catalog differently, and read the data differently. It would be a very strange coincidence if we both made the same mistake. I am getting essentially the same results as @slosar. I also made this plot to check the magnitude and redshift distributions with 1% of the objects. Green histograms/points correspond to the gold sample and blue to the total sample:

[figure: test_catsim_pairplot]

My version of the notebook can be found here: https://github.com/fjaviersanchez/test_catsim/blob/master/Test_CatSim.ipynb

This is what I think is going on:

The total object count can be lower than expected due to cosmic variance. Maybe 75% of the average in such a patch is still fine -- This can be tested if we get a different patch.
The low clustering/noise at ~50 Mpc/h can be due to sample variance since for example at z=0.25, you need pairs separated by ~4.25 deg, and the circle has a ~5 deg radius and we both took 25% of the sample -- This also can be tested if we get a different patch.
The radial BAO can be smeared-out if you use the monopole. It should be visible if you compute the correlation function along the line-of-sight -- I will try to compute this to see whether the BAO signal is visible or not.
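
The angular-scale argument in the second point can be checked with a short calculation. The sketch below assumes a flat LCDM cosmology with Omega_m = 0.3 and uses the comoving distance in the small-angle limit; neither choice is stated in the thread, so the result only needs to land in the same range as the quoted ~4.25 deg.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s
OMEGA_M = 0.3        # assumed matter density; not specified in the thread

def comoving_distance(z, omega_m=OMEGA_M, n=2001):
    """Comoving distance in Mpc/h for flat LCDM, by trapezoidal integration."""
    zs = np.linspace(0.0, z, n)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zs) ** 3 + (1.0 - omega_m))
    integral = np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zs))
    return (C_KM_S / 100.0) * integral  # c / H0 with H0 = 100 h km/s/Mpc

def angle_deg(separation, z):
    """Angle in degrees subtended by a transverse comoving separation (Mpc/h)."""
    return np.degrees(separation / comoving_distance(z))

theta = angle_deg(50.0, 0.25)          # 50 Mpc/h pairs at z = 0.25
patch_radius = np.sqrt(80.0 / np.pi)   # radius of a flat 80 sq deg circle

print(f"50 Mpc/h at z=0.25 subtends {theta:.2f} deg")
print(f"an 80 sq deg circle has radius {patch_radius:.2f} deg")
```

Under these assumptions the separation angle is close to the patch radius itself, so only a small fraction of galaxy pairs in the circle can contribute at that scale.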

I am also computing the correlation function with some other code, but I don't expect much of a difference since TreeCorr is well tested.
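
As an independent cross-check of the kind described above, a brute-force estimator is easy to write for small subsamples. The sketch below uses the Landy-Szalay form with randoms drawn uniformly in a flat circular patch; whether either notebook uses exactly this estimator or random-generation scheme is an assumption, and the O(N^2) pair counting only works for a few thousand points.

```python
import numpy as np

def pair_counts(a, b, edges, auto=False):
    """Histogram of pair separations between point sets a and b (N x D arrays)."""
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    if auto:
        d = d[np.triu_indices(len(a), k=1)]  # count each unordered pair once
    else:
        d = d.ravel()
    counts, _ = np.histogram(d, bins=edges)
    return counts.astype(float)

def landy_szalay(data, rand, edges):
    """Landy-Szalay estimator (DD - 2 DR + RR) / RR with normalized counts."""
    nd, nr = len(data), len(rand)
    dd = pair_counts(data, data, edges, auto=True) / (nd * (nd - 1) / 2.0)
    rr = pair_counts(rand, rand, edges, auto=True) / (nr * (nr - 1) / 2.0)
    dr = pair_counts(data, rand, edges) / (nd * nr)
    return (dd - 2.0 * dr + rr) / rr

def uniform_in_disc(n, radius, rng):
    """Points uniform in area inside a flat disc (flat-sky approximation)."""
    r = radius * np.sqrt(rng.uniform(size=n))
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack([r * np.cos(phi), r * np.sin(phi)])

rng = np.random.default_rng(0)
data = uniform_in_disc(400, 5.0, rng)   # unclustered stand-in for the catalog
rand = uniform_in_disc(800, 5.0, rng)   # random catalog with the same footprint
edges = np.linspace(0.5, 4.0, 8)        # separation bins in degrees
xi = landy_szalay(data, rand, edges)
print(np.round(xi, 3))
```

Because the stand-in data here are unclustered, the estimate should scatter around zero; running the same function on a real subsample with a matching footprint gives a quick check against the TreeCorr result.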

slosar commented 8 years ago

@fjaviersanchez, this is great! I think this establishes that if there is a problem, it is in the catalog, but it is not even clear that there is a problem. True, at z=0.25 you don't have very much volume, but at z=1.0 you have a box some 200 Mpc wide and very long, so I would expect it to work a bit better. I suggest we move on with this but assume we won't be able to do much more than a couple of Mpc, even assuming perfect photo-zs...

cwwalter commented 8 years ago

I am going to wait two days until @slosar comes back from vacation, so my email isn't 'BLACK HOLED', before finalizing and asking him to close this. But my discussion with @fjaviersanchez leads me to believe that the differences we see are due to variance, and that the effects of tiling at the size of the box are in the noise of the correlation function.

@fjaviersanchez can you confirm this is your conclusion?

fjaviersanchez commented 8 years ago

@cwwalter Yes, that looks correct. Given the size of the patch, we are not sensitive to the repetition of the box when using the correlation function (at least the monopole).

cwwalter commented 8 years ago

@slosar I see evidence you are back from your summer sojourns. Please look at this thread and my proposal to close this issue.

slosar commented 8 years ago

Yes, I'm happy enough to close this issue, but let's keep in mind that correlation func is not really measurable beyond 20-30 Mpc.

LSSTDESC / SSim_DC1

Do sanity check of CatSim galaxy catalog. #4