damonge closed this issue 4 years ago.
Instructions to test the new faster_shear branch:
Check out the faster_shear branch.
Open the Makefile. On line 16 you'll see the line:
#DEFINEFLAGS += -D_USE_NEW_LENSING
Uncommenting that line switches on the new faster-but-approximate method of generating galaxy ellipticities.
I would suggest compiling two versions of the code, with and without this flag, to compare the methods (you'll need to rename the binaries so they don't get overwritten when running make).
Currently there are two free parameters (other than -D_USE_NEW_LENSING) controlling the new shear method. First, dr_shear controls the spacing, in comoving Mpc/h, between the different lensing planes used to interpolate the galaxy shears. I suggest testing something like 100 Mpc/h to begin with, and seeing how much larger we can make it (if at all). Second, an nside parameter controls the HEALPix resolution of the shear maps. Both parameters should be included within a shear section of the param file. This section must be included whenever any source has shear.
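For concreteness, a sketch of what such a shear section might look like (hypothetical values; the syntax follows the libconfig style of CoLoRe's param.cfg files, but check examples/shear_test for the authoritative version):

```
shear:
{
  # comoving spacing between lensing planes, in Mpc/h
  dr_shear = 100.;
  # HEALPix resolution of the shear maps
  nside = 256;
};
```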
To begin with, I've created a folder in examples/shear_test (pretty much a copy of cl_test) containing the parameters of a test simulation that should be possible to run on a laptop.
I would start by running two simulations, with the old and new methods, and then comparing the outputs of both directly in terms of e1 and e2 (e.g. make a histogram of e1_new - e1_old or something like that). If those don't look too crazy (or maybe even if they do), I'd then move on to computing the shear power spectrum of the sample. You can see how to compute these in the cl_test folder (see for instance cl_test.py).
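A minimal sketch of such a comparison, assuming the e1 arrays from the two runs have already been loaded and matched galaxy-by-galaxy (the reading of the CoLoRe outputs is not shown):

```python
import numpy as np

def ellipticity_diff_stats(e1_old, e1_new, nbins=50):
    """Histogram and summary statistics of the per-galaxy differences
    between two runs that contain identical sources."""
    diff = np.asarray(e1_new) - np.asarray(e1_old)
    counts, edges = np.histogram(diff, bins=nbins)
    return counts, edges, diff.mean(), diff.std()
```

If the new method works, the differences should be tightly peaked around zero rather than scattered at the level of the ellipticities themselves.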
Note that, in order to generate two simulations with exactly the same sources, you need to switch off OpenMP and use only MPI parallelization (otherwise OpenMP introduces extra randomness), and you need to run both simulations using the same number of nodes. Note that you don't need to recompile the code without OpenMP; just run export OMP_NUM_THREADS=1 before you start working.
Ultimately we want to be able to run things at LSST scale. The type of simulation we can expect there is illustrated in the LSST example folder. Running this requires ~1TB of memory, so you'd better run it at NERSC. We want to check that the new method runs significantly faster than the old one. To get an idea of how long the old one actually takes, it'd be good to run a simulation of this size with the old method and record the time (the output of CoLoRe already tells you this). You can do this even before you've fully tested the method, just so we have a benchmark to fight against to begin with ;-).
We've been working on this topic and found discrepancies between the results using NEW_LENSING and the "old" ones.
For example, we have the following scatter plot:
The blue dots represent the comparison of the "new" lensing (dr_shear=100) with the "old" one.
The orange dots represent the comparison of two "new" simulations: one with dr_shear=100 and the other with dr_shear=10.
We expect some correlation in both cases, but we clearly see none in the first case.
Regarding the power spectrum, we find similar results. The following plots show the ratio between the power spectra of two simulations.
The first is the ratio of two "new" simulations (dr_shear=100 and dr_shear=10):
The second is the ratio between one "new" simulation and the "old" one:
All the "new" simulation results appear to be consistent with each other, but when they are compared with the "old" simulations, discrepancies appear.
A more detailed explanation of the process followed to get these results can be found in the Notebook: /global/projecta/projectdirs/desi/users/cramirez/Compare_new_lensing.ipynb
At first I didn't understand the scatter plot: I thought it was e1 on the x-axis and e2 on the y-axis, but it's e1 in one sim vs e1 in the other sim on the two axes. As you say (and if the seed is the same) the plots should show some correlation (as they do when you compare different dr values).
I think this illustrates the problem better: "Original" e1 map; dr = 1, e1 map; dr = 100, e1 map; dr = 1000, e1 map.
You can see the healpixels' boundaries. I guess you have to crank dr_shear up even more (or rather, down)?
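Seeing pixel boundaries suggests the limiting factor here is the map resolution rather than the plane spacing. As a rough, generic check (standard HEALPix geometry, nothing CoLoRe-specific), the pixel scale for a given nside is:

```python
import numpy as np

def healpix_pixel_arcmin(nside):
    """Approximate HEALPix pixel side in arcmin, taken as the square root
    of the pixel area; all 12*nside^2 pixels have equal area."""
    area_sr = 4.0 * np.pi / (12 * nside**2)
    return np.degrees(np.sqrt(area_sr)) * 60.0
```

For nside=256 this gives roughly 13.7 arcmin, so any shear structure smaller than that is lost regardless of how small dr_shear is.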
Hmm, the structure is totally different:
But between dr=1 and dr=10 things seem to be consistent-ish as you showed:
OK, this is great work! So this is not working at all. There should be a limit where it works, however you're showing that even after increasing the precision the results stay the same. This tells me there's a bug in the code.
Let me look at the code a bit and I'll get back to you. I have a 14h flight tomorrow, so this will be my "fun" project.
Oh, question @cramirezpe: did you use the parameters in examples/shear_test as they are, or did you have to change anything (beyond dr_shear)?
@damonge, the only changes I made to the param.cfg given in examples/shear_test are the paths of the files (because I changed directories) and, obviously, dr_shear.
@cramirezpe @fjaviersanchez @andreufont OK, quick update: on the flight today I've decided to overhaul the way the code was doing this, and I'm almost done writing a new version that I think should work almost out of the box. I need 1-2 days to finish the last details, and I'll get back to you with new instructions.
OK @cramirezpe @fjaviersanchez @andreufont
Jetlag has done its magic, and I've managed to get this done sooner than expected.
I've implemented the new method in a branch called faster_shear_b. Please test it out there and I'll merge into faster_shear if all looks good.
Things to note:
Wow, that was fast! Cesar is already working on it, will come back to you soon.
With the new code the plots have improved significantly.
Now the scatter plots of both comparisons (new lensing with dr_shear=100 vs the old lensing, and new lensing with dr_shear=10 vs the old lensing) look like this:
For the power spectra, they look like this:
(Ratio between dr_shear=100 and old lensing):
(Ratio between dr_shear=10 and old lensing):
The difference in amplitude could be due to an error in normalization, as @andreufont pointed out.
@andreufont @fjaviersanchez also suggested multiplying each galaxy's e1 and e2 by 2 to solve the normalization problem.
The resulting structure map is:
the power spectrum ratio becomes:
and the scatter plot:
Apparently, introducing this factor of 2 solves the problem.
Wooo! Amazing!! This is really great. OK, so let's get a bit more quantitative: could we run say 100 simulations of this type (with both methods) and compute the ratio of the mean shear spectra with both methods? We'd like to see your second plot without a log-y axis.
Before that, though, let me fix this bug and merge everything into the original faster_shear branch.
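A sketch of the requested statistic, assuming the per-realization spectra have already been computed and stacked into arrays (names and shapes illustrative):

```python
import numpy as np

def mean_cl_ratio(cls_new, cls_old):
    """Ratio of the mean shear power spectra over many realizations.

    cls_new, cls_old: arrays of shape (n_sims, n_ell), one spectrum per
    simulation. Averaging before taking the ratio suppresses the
    realization noise that dominates a single-simulation comparison.
    """
    return np.mean(cls_new, axis=0) / np.mean(cls_old, axis=0)
```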
Hi @damonge - these are still tiny boxes, right? Is it better to do 100 small ones than a large one?
In any case, Cesar, you'll need to learn how to submit multiple SLURM scripts. I can help you find the instructions if you can't find them.
Yes, that's right. I first wanted to get a sense of how bad dr=100 is before asking @cramirezpe to run nightmarish boxes.
Sounds good. He is working on it now, we'll let you know once he has some results!
OK, let me know if I misjudged how much work this would be.
Hi @damonge. Don't worry, the script is already made and I'm working with the Notebook to obtain the results. Also, the simulations are very fast.
You said you wanted to merge everything into the original faster_shear branch. Do you want me to wait until then, or should I run the simulations with the faster_shear_b branch?
Hi @cramirezpe - you should write the script in a way that is very easy to update and re-run with new versions of the code. Then I would start with a suite of 10 or 100 simulations with the faster_shear_b branch, and take a look at the results.
Once you have the script, it should be easy to update once we have a newer version.
Ah, sorry, I never got around to merging things. I think the easiest thing will be to forget about faster_shear and just work with faster_shear_b. I'll delete faster_shear eventually.
Hi @damonge, here is the result of the ratio: The cross term E-B performed worse:
To make the simulations I replaced each seed in param.cfg with a random number. As before, for the "new" simulations I multiplied each galaxy's e1 and e2 by 2.
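One way to script the seed replacement (hypothetical helper; it assumes the seed appears in param.cfg as a line like seed = 1003, which may differ from the actual file):

```python
import re

def set_seed(param_text, seed):
    """Return a copy of a param.cfg string with its seed value replaced."""
    return re.sub(r'\bseed\s*=\s*\d+', f'seed = {seed}', param_text)
```

One would then write out one modified copy of the template per realization.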
Hmm, do you understand why the ratio for EE is now <0.5, whereas in your first test above it seemed close to 1?
I wouldn't worry about EB at all (at that point you're basically dividing zero by zero)
Actually, even better question: the delta_gs should be exactly the same in the two types of simulations, so we should understand why their ratio is not exactly one.
Did you use the same seed to generate both types of simulations?
As you guessed, I forgot to use the same seed for both types. I'll now rerun it that way and see if the results improve.
Hi, regarding the first question: the plots I presented this morning were produced with dr_shear=100. Now I've made the plot with dr_shear=10 and the result is essentially the same.
In these new plots I've taken the ratio for simulations with the same seed; the results are the following (I've used dr_shear=10):
This is close to one. The log plot, for comparison:
Since the problem with delta_gs is now solved, I'll concentrate on the other three fields. The following plots show multiple runs at the same time:
Hope it helps!
@cramirezpe, this is with the default nside=256, right? @damonge, would it make sense to make a couple of realizations with a smaller and a larger nside? I think the "increased" scatter in this plot:
is telling us that the total shear signal is "weaker", and we have to re-weight by the ratio of the rms of the ellipticities, i.e., calculate <e^2> for the original, calculate <e^2> for the dr_shear simulation, and re-weight the power spectra by that ratio.
@fjaviersanchez, the nside used is the default 256.
OK, great. I agree it'd make sense to see what happens if we increase the resolution. With the current implementation there's a limit where we should be able to recover the original shear. If reducing dr_shear doesn't do it, then it may be the angular resolution.
So, I'd suggest, if you have enough memory, run a few sims with dr=10 and nside=512 or 1024. I don't think you'd need to run more than 10 sims to see a bias.
Hi @damonge @fjaviersanchez,
I've done the runs with different nside and, after analysing them, here are the main results I've found.
First of all, I'd like to mention a couple of things in case there is something wrong with my configuration:
- In every param.cfg file there are two nside values: one inside the kappa section and the other in the shear one. I assumed that only the shear one should be changed (the other staying at 128).
- Even when running the old lensing method, the simulation asks for the shear section in param.cfg, which is something I don't understand. Is this normal behaviour?
Now, moving on to the simulations: I've run simulations for nshear=[128,256,512,1024] and dr_shear=10, 50 of them for each of the new and old lensing methods.
All of them are quite similar to each other. The execution time changes with nshear:
- nshear=128 simulations last about 5 seconds.
- nshear=256 simulations last about 15 seconds.
- nshear=512 simulations last about 100 seconds.
- nshear=1024 simulations last about 121 seconds.
These values are approximate, since they fluctuate. Now the plots. First I show the mean power spectrum ratio (between new and old lensing) for both nside=256 and nside=1024.
If we look at each type separately we have:
Finally, we can compare the ratio between each nside simulation (only the new lensing ones!) and the nside=1024 one. That yields the following plots:
If I make this last exercise with the old simulations I get a constant 1 (as expected).
Thanks a lot @cramirezpe! These are good tests. We have to figure out what's going on. Just a small request: could you please print the standard deviation of the ellipticity modulus for each nside and for the original? I.e., emod = np.sqrt(e1**2+e2**2) and then sigma_I = np.std(emod). You repeat this for each realization at a given nside and then take the mean and scatter of sigma_I over the realizations: np.mean(sigma_I) and np.std(sigma_I). And the same for the old method (if you just have one realization of each, you can just report the number that you get from that realization and that's it).
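The prescription above, written out as a small sketch:

```python
import numpy as np

def sigma_ellipticity(e1, e2):
    """Standard deviation of the ellipticity modulus for one realization."""
    emod = np.sqrt(np.asarray(e1)**2 + np.asarray(e2)**2)
    return np.std(emod)

def sigma_summary(realizations):
    """realizations: iterable of (e1, e2) array pairs, one per realization
    at a given nside. Returns (mean, std) of sigma_I across realizations."""
    sigmas = [sigma_ellipticity(e1, e2) for e1, e2 in realizations]
    return np.mean(sigmas), np.std(sigmas)
```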
If you point me to the files that you created at NERSC I can also try to do it 🙂
Hi @fjaviersanchez, I had a problem with NERSC and now we are waiting for more CPU hours to continue working. Sorry for the inconvenience.
No worries! I hope you can solve the issue.
Hi @damonge @fjaviersanchez
Running simulations for different parameters at NERSC, we found that changing some SLURM options yields different sources. We were concerned about the implications of this for our current work, so we investigated it in depth.
Apparently, if you run CoLoRe on a single core you obtain one structure, and if you run it on multiple cores you obtain another. Interestingly, this multicore structure does not depend on the parallelization method (MPI or OMP) used.
This is interesting because you stated before that we should use export OMP_NUM_THREADS=1. We are not sure if you said that as a precaution or because there is a real problem with OMP that may affect us.
We also found the debug terminal output of CoLoRe a little confusing when it prints Nodes and Threads, running at NERSC with MPI parallelization activated:
Hi @cramirezpe
Sorry if this wasn't clear: yes, Nodes means individual MPI tasks, and Threads are OMP threads. CoLoRe should always give the same output if you run with the same number of MPI tasks/threads/whatever and no OpenMP. If you run with OpenMP and more than one thread, then different runs will yield different simulations.
Happy for you to clarify this in a PR (and I'll merge it quickly).
What I was trying to say is that whether you use OMP or MPI, the sources are the same. This is something I was not expecting, given your requirement of export OMP_NUM_THREADS=1.
I can even run mixed simulations with, say, 4 MPI tasks and 2 OMP threads, and they will hold the same structure.
What does not work is to run part of the simulations with one combination of tasks and threads (say 4 MPI and 2 OMP) and the rest with a different combination.
Oooh, I see. Hmmm, I didn't expect that. I seem to remember the Poisson sampling was done in OMP loops, in which case the catalogs would be different, but maybe I'm wrong...
Just to clarify: using more than one OMP thread affects the random numbers, and you end up with different sources. However, you get the same sources in old/new simulations if you use the same number of threads.
Moreover, exactly the same thing seems to happen if you use MPI tasks instead.
In any case, it was a useful exercise, and we have learned about the parallelisation. We'll proceed with the tests next week!
Yep, got it. I'm still surprised by that, since OMP should allocate threads more or less randomly to different parts of the for loop. It may be that NERSC's OMP is more orderly than I thought.
Hi @damonge,
would you mind adding me as a contributor so I can start a branch and do the output thing?
Done!
Hi @fjaviersanchez
We've rerun the simulations, and here are the results you asked for. The standard deviation of the ellipticity for all the simulations using the "old" method is (using 10 simulations):
np.mean(sigma_I): 0.0002735282780548876
np.std(sigma_I): 6.004891646911408e-6

For the "new" method the result depends on the nside used (using 10 simulations for each nside):

| nside | np.mean(sigma_I) | np.std(sigma_I) |
|---|---|---|
| 128 | 0.0002658999875826471 | 5.816226898630879e-6 |
| 256 | 0.00026537082666360136 | 5.861836418750325e-6 |
| 512 | 0.00026523744850234814 | 5.86187439915547e-6 |
| 1024 | 0.0002652141424930916 | 5.86947374489586e-6 |
Regarding the output changes, I created a pull request to merge it with the master branch.
We are waiting for more instructions on how to continue with the simulations.
@damonge and all: if you divide mean(sigma_I)_new by mean(sigma_I)_old you get ~0.97, which is approximately the value of the ratio at large ell in the plot here:
I'm not sure why it's not constant (I guess those are windowing effects at large scales), but I think we can correct the decrease in power just by this "calibration" factor.
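Using the nside=128 numbers reported above, the ratio works out as:

```python
# Quick check of the calibration factor from the reported sigma_I values.
sigma_new = 0.0002658999875826471  # mean(sigma_I), new method, nside = 128
sigma_old = 0.0002735282780548876  # mean(sigma_I), old method
ratio = sigma_new / sigma_old      # ~0.97, matching the large-ell suppression
```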
@cramirezpe, can you plot the same ratio you had, i.e., Cls_new/Cls_old, and correct it by multiplying by mean(sigma_I)_old/mean(sigma_I)_new, to see if it's now closer to 1, please?
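I.e., something along these lines (sketch, names illustrative):

```python
import numpy as np

def corrected_cl_ratio(cls_new, cls_old, sigma_new, sigma_old):
    """C_ell ratio corrected by the rms-ellipticity calibration factor."""
    return (np.asarray(cls_new) / np.asarray(cls_old)) * (sigma_old / sigma_new)
```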
@cramirezpe @andreufont @fjaviersanchez OK, before we correct blindly, it'd be good to understand why we have that factor of 0.97. In principle, in the limit dr_shear -> 0 and nside -> infinity, we should recover the original result. What dr_shear were these generated with?
Would it be worth having a quick skype at some point to discuss this?
Hi,
@fjaviersanchez as soon as NERSC is available I'll make the plots.
@damonge, the dr_shear used in the last results is dr_shear=10.
I'll be glad to do a quick skype if you all are interested.
I could also do a Skype later this week. Thursday or Friday afternoon would work.
Weird to set these things up through github issues XD. How about Friday afternoon? I'd be free after 3pm UK time.
You are right, let's move to email!
@fjaviersanchez,
I attach here the plots of the ratio Cls_new/Cls_old corrected by the factor mean(sigma_I)_old/mean(sigma_I)_new.
I used 10 realisations, hence the appearance of more wiggles than in previous plots.
Currently the shear of each source is computed by calculating the mass integral along their line of sight. This makes the code significantly slower if you include shear. We'd like to implement a faster method. The proposal in #52 is to calculate convergence maps with good angular resolution at a set of redshifts with a moderate separation (e.g. Dz=0.1 or something like that), and then interpolate between them.