damonge / CoLoRe

CoLoRe - Cosmological Lofty Realizations
GNU General Public License v3.0

Faster shear #53

Closed. damonge closed this issue 4 years ago

damonge commented 4 years ago

Currently the shear of each source is computed by calculating the mass integral along their line of sight. This makes the code significantly slower if you include shear. We'd like to implement a faster method. The proposal in #52 is to calculate convergence maps with good angular resolution at a set of redshifts with a moderate separation (e.g. Dz=0.1 or something like that), and then interpolate between them.
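For intuition, here is a toy sketch of that interpolation idea in Python. This is an illustration only, not CoLoRe's actual implementation; all names and the linear-interpolation choice are assumptions.

import numpy as np
import healpy as hp

# Toy sketch: given convergence maps kappa_maps[i] sampled at comoving
# distances r_planes[i], estimate the convergence of a source at distance r
# and direction (theta, phi) by linear interpolation between the two
# neighbouring planes.
def kappa_at_source(kappa_maps, r_planes, r, theta, phi, nside):
    ipix = hp.ang2pix(nside, theta, phi)
    i = np.clip(np.searchsorted(r_planes, r), 1, len(r_planes) - 1)
    w = (r - r_planes[i - 1]) / (r_planes[i] - r_planes[i - 1])
    return (1 - w) * kappa_maps[i - 1][ipix] + w * kappa_maps[i][ipix]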

damonge commented 4 years ago

Instructions to test the new faster_shear branch:

Installation and compilation

Checkout the faster_shear branch. Open the Makefile. In line 16 you'll see the line:

#DEFINEFLAGS += -D_USE_NEW_LENSING

Uncommenting that line will switch on the new faster-but-approximate method of generating galaxy ellipticities. I would suggest compiling two versions of the code, with and without this flag, to compare the methods (you'll need to rename the binaries so they don't get overwritten when running make).

Free parameters

Currently there are two free parameters (other than -D_USE_NEW_LENSING) controlling the new shear method. First, dr_shear controls the spacing, in comoving Mpc/h, between the different lensing planes used to interpolate the galaxy shears. I suggest testing something like 100 Mpc/h to begin with, and see how much larger we can make it (if at all). Secondly, an nside parameter controls the HEALPix resolution of the shear maps. Both parameters should be included within a shear section of the param file. This section must be included whenever any source has shear.
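For orientation, the shear block in param.cfg might look something like the sketch below. The section name and the two parameters come from the description above; the exact syntax and any other keys are assumptions, so check the example param files in the branch.

shear:
{
  dr_shear = 100.;  # spacing between lensing planes, in Mpc/h (example value)
  nside = 512;      # HEALPix resolution of the shear maps (example value)
};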

Testing the method

Quick tests

To begin with, I've created a folder in examples/shear_test (pretty much a copy of cl_test) containing the parameters of a test simulation that should be possible to run on a laptop.

I would start by running two simulations, with the old and new methods, and then comparing the outputs of both directly in terms of e1 and e2 (e.g. make a histogram of e1_new - e1_old or something like that). If those don't look too crazy (or maybe even if they do), I'd then move on to computing the shear power spectrum of the sample. You can see how to compute these in the cl_test folder (see for instance cl_test.py).
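A minimal sketch of both checks in Python is below. It assumes each run writes a FITS catalog (named out_srcs_s1_0.fits here, with RA, DEC, E1, E2 columns); those file and column names are assumptions, so adapt them to your output and use cl_test.py as the reference analysis.

import numpy as np
import healpy as hp
import matplotlib.pyplot as plt
from astropy.io import fits

def read_cat(fname):
    # file/column names are placeholders; adapt to your CoLoRe output
    d = fits.open(fname)[1].data
    return [np.asarray(d[k], dtype=float) for k in ('RA', 'DEC', 'E1', 'E2')]

ra_o, dec_o, e1_o, e2_o = read_cat("old/out_srcs_s1_0.fits")
ra_n, dec_n, e1_n, e2_n = read_cat("new/out_srcs_s1_0.fits")

# 1) Object-by-object comparison (sources should be identical if the seed
#    matches and OpenMP is switched off).
plt.hist(e1_n - e1_o, bins=100, histtype='step', label='e1_new - e1_old')
plt.hist(e2_n - e2_o, bins=100, histtype='step', label='e2_new - e2_old')
plt.legend()
plt.show()

# 2) Shear power spectra: average e1/e2 into spin-2 HEALPix maps and run
#    anafast; the EE spectrum is element 1 of the output.
def shear_cls(ra, dec, e1, e2, nside=256):
    npix = hp.nside2npix(nside)
    ipix = hp.ang2pix(nside, np.radians(90. - dec), np.radians(ra))
    counts = np.bincount(ipix, minlength=npix)
    m1 = np.bincount(ipix, weights=e1, minlength=npix)
    m2 = np.bincount(ipix, weights=e2, minlength=npix)
    good = counts > 0
    m1[good] /= counts[good]
    m2[good] /= counts[good]
    return hp.anafast([np.zeros(npix), m1, m2], pol=True)  # TT,EE,BB,TE,EB,TB

cls_old = shear_cls(ra_o, dec_o, e1_o, e2_o)
cls_new = shear_cls(ra_n, dec_n, e1_n, e2_n)
plt.plot(cls_new[1] / cls_old[1], label='EE ratio (new/old)')
plt.xlabel('ell')
plt.legend()
plt.show()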

Note that, in order to generate two simulations with exactly the same sources in them, you need to switch off openmp and use only MPI parallelization (otherwise openmp introduces extra randomness), and you need to run both simulations using the same number of nodes. Note that you don't need to recompile the code without openmp, just run export OMP_NUM_THREADS=1 before you start working.

Final goal

Ultimately we want to be able to run things at LSST scale. The type of simulation we can expect there is illustrated in the LSST example folder. Running this requires ~1TB of memory, so you'd better run it at NERSC. We want to check that the new method runs significantly faster than the old one. To get an idea of how long the old one actually takes, it'd be good to run a simulation of this size with the old method and record the time (the output of CoLoRe already tells you this). You can do this even before you've fully tested the method, just so we have a benchmark to fight against to begin with ;-).

cramirezpe commented 4 years ago

We've been working on this topic and found discrepancies between the results using NEW_LENSING and the "old" ones.

For example, we have the following scatter plot: the blue dots represent the comparison of the "New" lensing (dr_shear=100) with the "Old" one. The orange dots represent the comparison of two "New" simulations, one with dr_shear=100 and the other with dr_shear=10.

scatter

We expect to have some correlation in both cases but we clearly see no correlation in the first case.

Regarding the power spectrum we find similar results. The following plots show the ratio of the power spectra of two simulations.

The first one is the ratio of two "new" simulations (for dr_shear=100 and for dr_shear=10):

powerspectrum2

The second one is the ratio between one "new" simulation and the "old" one: powerspectrum2

All the results of "new" simulations appear to be consistent with each other, but when they're compared with the "old" simulations, discrepancies appear.

A more detailed explanation of the process followed to get these results can be found in the Notebook: /global/projecta/projectdirs/desi/users/cramirez/Compare_new_lensing.ipynb

fjaviersanchez commented 4 years ago

At first I didn't understand the scatter plot: I thought it was e1 on the x-axis and e2 on the y-axis, but it's e1 in one sim vs e1 in the other sim on the two axes. As you say (and if the seed is the same), the plots should show some correlation (as they do when you compare different dr values).

I think that this illustrates the problem better. e1 maps for the "Original" run and for dr = 1, dr = 100 and dr = 1000: image image image image

You can see the HEALPix pixel boundaries. I guess you have to crank up dr_shear even more (or rather, crank it down)?

fjaviersanchez commented 4 years ago

Hmm, the structure is totally different:

image

But between dr=1 and dr=10 things seem to be consistent-ish as you showed:

image

damonge commented 4 years ago

OK, this is great work! So this is not working at all. There should be a limit in which it works; however, you're showing that even after increasing the precision the results stay the same. This tells me there's a bug in the code.

Let me look at the code a bit and I'll get back to you. I have a 14h flight tomorrow, so this will be my "fun" project.

damonge commented 4 years ago

Oh, question @cramirezpe : did you use the parameters in examples/shear_test as they are, or did you have to change anything (beyond dr_shear)?

cramirezpe commented 4 years ago

@damonge, the only changes I made to the param.cfg given in examples/shear_test are the file paths (because I changed directories) and, obviously, dr_shear.

damonge commented 4 years ago

@cramirezpe @fjaviersanchez @andreufont OK, quick update: on the flight today I've decided to overhaul the way the code was doing this, and I'm almost done writing a new version that I think should work almost out of the box. I need 1-2 days to finish the last details, and I'll get back to you with new instructions.

damonge commented 4 years ago

OK @cramirezpe @fjaviersanchez @andreufont Jetlag has done its magic, and I've managed to get this done sooner than expected. I've implemented the new method in a branch called faster_shear_b. Please test it out there and I'll merge into faster_shear if all looks good.

Things to note:

andreufont commented 4 years ago

Wow, that was fast! Cesar is already working on it, will come back to you soon.

cramirezpe commented 4 years ago

With the new code the plots have improved significantly.

Now the scatter plots for both comparisons (new lensing with dr_shear=100 vs the old lensing, and new lensing with dr_shear=10 vs the old lensing) look like this: dispersion

The power spectra look like this (ratio between dr_shear=100 and old lensing): power100vs

(For ratio between dr_shear=10 and old lensing): powerspectrumrel10

The difference in amplitude could be due to an error in normalization, as @andreufont pointed out.

cramirezpe commented 4 years ago

@andreufont @fjaviersanchez also suggested multiplying e1 and e2 by 2 for each galaxy to solve the normalization problem.

The resulting structure map is: download (1)

the power spectrum ratio becomes: download (2)

and the scatter plot: download (3)

Apparently, introducing this factor of 2 solves the problem.

damonge commented 4 years ago

Wooo! Amazing!! This is really great. OK, so let's get a bit more quantitative: could we run, say, 100 simulations of this type (with both methods) and compute the ratio of the mean shear spectra of the two methods? We'd like to see your second plot without a log y-axis.

Before that though, let me fix this bug and merge everything into the original faster_shear branch.
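In case it's useful, a tiny sketch of the mean-spectrum ratio requested above, assuming you already have one EE spectrum per seed for each method (e.g. from anafast):

import numpy as np

def ratio_of_mean_cls(cls_new_list, cls_old_list):
    # average the per-seed spectra for each method, then take the ratio
    return np.mean(cls_new_list, axis=0) / np.mean(cls_old_list, axis=0)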

andreufont commented 4 years ago

Hi @damonge - these are still tiny boxes, right? Is it better to do 100 small ones than one large one?

In any case, Cesar, you'll need to learn how to submit multiple SLURM scripts. I can help you find the instructions if you can't find them.

damonge commented 4 years ago

Yes, that's right. I first wanted to get a sense of how bad dr=100 is before asking @cramirezpe to run nightmarish boxes.

andreufont commented 4 years ago

Sounds good. He is working on it now, we'll let you know once he has some results!

damonge commented 4 years ago

OK, let me know if I misjudged how much work this would be.

cramirezpe commented 4 years ago

Hi @damonge. Don't worry, the script is already written and I'm working on the Notebook to obtain the results. Also, the simulations are very fast.

You said you wanted to merge everything into the original faster_shear branch. Do you want me to wait until then or do I run the simulations with the faster_shear_b branch?

andreufont commented 4 years ago

Hi @cramirezpe - you should write the script in a way that is very easy to update and re-run with new versions of the code. Then I would start with a suite of 10 or 100 simulations with the faster_shear_b branch, and take a look at the results.

Once you have the script, it should be easy to update once we have a newer version.

damonge commented 4 years ago

Ah, sorry, I never got around to merging things. I think the easiest thing will be to forget about faster_shear and just work with faster_shear_b. I'll delete faster_shear eventually.

cramirezpe commented 4 years ago

Hi @damonge, here is the result of the ratio: image The cross term E-B performs worse: image

To make the simulations I changed each seed in param.cfg to a random number. As before, for the new simulations I multiplied e1 and e2 of each galaxy by 2.

damonge commented 4 years ago

Hmm, do you understand why the ratio for EE is now <0.5, whereas in your first test above it seemed to be close to 1?

I wouldn't worry about EB at all (at that point you're basically dividing zero by zero)

damonge commented 4 years ago

Actually, an even better question: the delta_gs should be exactly the same in the two types of simulations, so we should understand why their ratio is not exactly one.

Did you use the same seed to generate both types of simulations?

cramirezpe commented 4 years ago

As you suspected, I forgot to set the same seed for both types. Now I'll rerun it that way and see if the results improve.

cramirezpe commented 4 years ago

Hi, regarding the first question: the plots I presented this morning were produced with dr_shear=100. Now I've made the plot with dr_shear=10 and the result is essentially the same.

In these new plots I've taken the ratio for simulations with the same seed; the results are the following (I've used dr_shear=10): image Which is close to one. The log plot to compare: image

Since the problem with delta_gs is now solved, I'll concentrate on the other three fields. The following plots show multiple runs at the same time: image

image

image

Hope it helps!

fjaviersanchez commented 4 years ago

@cramirezpe, this is with the default nside=256, right? @damonge, would it make sense to make a couple of realizations with a smaller and a larger nside? I think that basically the "increased" scatter in this plot: image is telling us that the total shear signal is "weaker" and we have to re-weight by the ratio of the rms of the ellipticities, i.e., calculate <e^2> for the original, calculate <e^2> for the dr_shear simulation and reweight the power spectra by that ratio.

cramirezpe commented 4 years ago

@fjaviersanchez, the nside used is the default 256.

damonge commented 4 years ago

OK, great. I agree it'd make sense to see what happens if we increase the resolution. With the current implementation there should be a limit in which we recover the original shear. If reducing dr_shear doesn't do it, then it may be the angular resolution.

So, if you have enough memory, I'd suggest running a few sims with dr=10 and nside=512 or 1024. I don't think you'd need to run more than 10 sims to see a bias.

cramirezpe commented 4 years ago

Hi @damonge @fjaviersanchez,

I've done the runs with different nside and, after analysing them, here are the main results I've found.

First of all I'd like to say a couple of things in case there is something wrong with my configuration:


Now, moving on to the simulations: I've run simulations for nside=[128,256,512,1024] and dr_shear=10, 50 of them for each of the new and old lensing methods. All of them are quite similar to each other. The execution time changes with nside:

If we look at each type separately we have: image image image

Finally, we can compare the ratio between each nside simulation (only the new lensing one!) and the nside=1024 one. That yields the following plots: image image image

If I do this last exercise with the old simulations I get a constant 1 (as expected).

fjaviersanchez commented 4 years ago

Thanks a lot @cramirezpe! These are good tests. We have to figure out what's going on. Just a small request: could you please print the standard deviation of the ellipticity modulus for each nside and for the original? I.e., emod = np.sqrt(e1**2+e2**2) and then sigma_I = np.std(emod). Repeat this for each realization at a given nside and report the mean and scatter of sigma_I over the realizations, np.mean(sigma_I) and np.std(sigma_I). And the same for the old method (if you just have one realization of each, you can just report the number you get from that realization and that's it).
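A sketch of that recipe, assuming each realization is a FITS catalog with E1/E2 columns (the paths and column names below are placeholders):

import numpy as np
from astropy.io import fits

def sigma_ellipticity(fname):
    d = fits.open(fname)[1].data
    emod = np.sqrt(np.asarray(d['E1'], dtype=float)**2 +
                   np.asarray(d['E2'], dtype=float)**2)  # ellipticity modulus
    return np.std(emod)

# one sigma_I per realization at a given nside, then its mean and scatter
fnames = ["new_nside512/seed%d/out_srcs_s1_0.fits" % i for i in range(10)]
sigma_I = np.array([sigma_ellipticity(f) for f in fnames])
print(np.mean(sigma_I), np.std(sigma_I))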

fjaviersanchez commented 4 years ago

If you point me to the files that you created at NERSC I can also try to do it 🙂

cramirezpe commented 4 years ago

Hi @fjaviersanchez, I had a problem with NERSC and now we are waiting for more CPU hours to continue working. Sorry for the inconvenience.

fjaviersanchez commented 4 years ago

No worries! I hope you can solve the issue.

cramirezpe commented 4 years ago

Hi @damonge @fjaviersanchez

Running simulations with different parameters at NERSC, we found that changing some Slurm options yields different sources. We were concerned about the implications of this for our current work, so we investigated it in depth.

Apparently, if you run CoLoRe on a single core you obtain one structure, and if you run it on multiple cores you obtain another. The interesting thing is that this multicore structure does not depend on the parallelization method (MPI or OMP) used.

It is interesting because you stated before that we should use export OMP_NUM_THREADS=1. We are not sure whether you said that as a precaution or because there is a real problem with OMP that may affect us.

We also found the debug terminal output of CoLoRe a little confusing when it prints Nodes and Threads, when used at NERSC with MPI parallelization activated:

damonge commented 4 years ago

Hi @cramirezpe

sorry if this wasn't clear, yes. Nodes means individual MPI tasks, and threads are OMP threads. CoLoRe should always give the same output if you run with the same number of MPI tasks/threads/whatever and no OpenMP. If you run with OpenMP and more than one thread, then different runs will yield different simulations.

damonge commented 4 years ago

Happy for you to clarify this in a PR (and I'll merge it quickly).

cramirezpe commented 4 years ago

What I was trying to say is that whether you use OMP or MPI, the sources are the same. This is something I was not expecting, given your requirement about export OMP_NUM_THREADS=1.

I can even run mixed simulations with, say, 4 MPI tasks and 2 OMP threads, and they still give the same structure.

What does not work is to run part of the simulations with one combination of threads (let's say 4 MPI and 2 OMP) and the rest with a different combination.

damonge commented 4 years ago

Oooh, I see. Hmmm, I didn't expect that. I seem to remember the Poisson sampling was done in OMP loops, in which case the catalogs would be different, but maybe I'm wrong...

andreufont commented 4 years ago

Just to clarify: using more than one OMP thread affects the random numbers and you end up with different sources. However, you get the same sources in old/new simulations if you use the same number of threads.

Moreover, it looks like exactly the same thing happens if you use MPI tasks instead.

In any case, it was a useful exercise, and we have learned about the parallelisation. We'll proceed with the tests next week!

damonge commented 4 years ago

Yep, got it. Still surprised about that, since OMP should allocate threads kind of randomly to different parts of the for loop. It may be that NERSC's OMP is more orderly than I thought.

cramirezpe commented 4 years ago

Hi @damonge,

would you mind adding me as a contributor so I can start a branch and do the output thing?

damonge commented 4 years ago

Done!

cramirezpe commented 4 years ago

Hi @fjaviersanchez

We've rerun the simulations and here are the results you were asking for. The standard deviation of the ellipticity for all the simulations using the "old" method is (using 10 simulations):

For the "new" method the result depends on the nside used (using 10 simulations for each nside):

np.mean(sigma_I), np.std(sigma_I):
nside = 128: 0.0002658999875826471, 5.816226898630879e-6
nside = 256: 0.00026537082666360136, 5.861836418750325e-6
nside = 512: 0.00026523744850234814, 5.86187439915547e-6
nside = 1024: 0.0002652141424930916, 5.86947374489586e-6

Regarding the output changes, I created a pull request to merge it with the master branch.

We are waiting for more instructions on how to continue with the simulations.

fjaviersanchez commented 4 years ago

@damonge and all, so if you divide mean(sigma_I)_new by mean(sigma_I)_old you get ~0.97, which is approximately the value of the ratio at large ell in the plot here:

image

I'm not sure why it's not constant (I guess those are windowing effects at large scales), but I think we can correct the decrease in power just by this "calibration" factor.

@cramirezpe, can you plot the same ratio as before, i.e., Cls_new/Cls_old, and correct it by multiplying by mean(sigma_I)_old/mean(sigma_I)_new, to see if it's now closer to 1, please?
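For concreteness, the requested correction amounts to something like this sketch, where cl_new/cl_old are the measured spectra and sigma_I_new/sigma_I_old the per-realization ellipticity scatters from above:

import numpy as np

def corrected_cl_ratio(cl_new, cl_old, sigma_I_new, sigma_I_old):
    # rescale the measured C_ell ratio by the ellipticity-rms calibration factor
    calib = np.mean(sigma_I_old) / np.mean(sigma_I_new)
    return np.asarray(cl_new) / np.asarray(cl_old) * calib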

damonge commented 4 years ago

@cramirezpe @andreufont @fjaviersanchez OK, before we correct blindly, it'd be good to understand where that factor of 0.97 comes from. In principle, in the limit dr_shear -> 0 and nside -> infinity, we should recover the original result. What dr_shear were these generated with?

Would it be worth having a quick skype at some point to discuss this?

cramirezpe commented 4 years ago

Hi,

@fjaviersanchez as soon as NERSC is available I'll make the plots. @damonge the dr_shear used in the last results is 10.

I'll be glad to do a quick skype if you all are interested.

andreufont commented 4 years ago

I could also do a Skype later this week. Thursday or Friday afternoon would work.



damonge commented 4 years ago

Weird to set these things up through github issues XD. How about Friday afternoon? I'd be free after 3pm UK time.

andreufont commented 4 years ago

You are right, let's move to email!

cramirezpe commented 4 years ago

@fjaviersanchez I attach here the plots of the ratio Cls_new/Cls_old corrected by the factor mean(sigma_I)_old/mean(sigma_I)_new.

I used 10 realisations, hence the appearance of more wiggles than in previous plots. image

image