barnabytprowe / great3-public

Public repository for the Third Gravitational Lensing Accuracy Testing Challenge
BSD 3-Clause "New" or "Revised" License

Bootstrapping #9

Open betajzhang opened 10 years ago

betajzhang commented 10 years ago

Hi Guys,

Is there a switch for doing bootstrapping in "presubmission.py"? I am dealing with variable shear catalogs. Thanks a lot in advance!

-Jun Zhang

barnabytprowe commented 10 years ago

Hi Jun!

Do you mean bootstrapping to estimate the uncertainties on aperture mass dispersion estimates? If so then I can confirm that there is no switch in presubmission.py to do that. We only evaluate metric scores using the calculated best estimate, and do not refer to the errors.

I know that some correlation function codes do provide a bootstrapping option. presubmission.py is built around @rmjarvis 's corr2 code, and a quick skim of the corr2 readme suggests that this is not supported via the input interface to that code either. (Please correct me if I'm wrong Mike.) Of course, you can simply call corr2 multiple times, giving it (with replacement) resampled input catalogues each time, to do the bootstrap estimate of uncertainties. But I think this is not something that we plan to release ourselves.
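A minimal sketch of that resample-and-rerun approach, in Python. The `statistic` callable here is a hypothetical stand-in for one full corr2 run on a resampled catalogue (corr2 itself is driven by a config file and input catalogue files, which is elided here):

```python
import numpy as np

def bootstrap_errors(catalog, statistic, n_boot=100, rng=None):
    """Bootstrap the uncertainty of `statistic` over rows of `catalog`.

    `statistic` stands in for one full corr2 run on a resampled
    catalogue; in practice you would write the resampled rows to a
    file and invoke corr2 on it each iteration.
    """
    rng = np.random.default_rng(rng)
    n = len(catalog)
    results = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # draw n rows with replacement
        results.append(statistic(catalog[idx]))
    results = np.asarray(results)
    # bootstrap best estimate and 1-sigma spread, per output bin
    return results.mean(axis=0), results.std(axis=0, ddof=1)

# toy stand-in for corr2: mean ellipticity of the resampled catalogue
cat = np.random.default_rng(0).normal(0.0, 0.3, size=(1000, 2))
mean, err = bootstrap_errors(cat, lambda c: c.mean(axis=0), n_boot=200, rng=1)
```

The spread of the bootstrap replicates then serves as the error estimate on each output bin.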

rmandelb commented 10 years ago

Bootstrapping is not supported in corr2, but it does output directly an estimate of the error on calculated quantities like the aperture mass.

betajzhang commented 10 years ago

Hi Barnaby,

Thank you so much for your quick response! Indeed, I am asking for the bootstrapping option not because of error estimates. I simply want to achieve a high score in the "variable shear" categories, i.e., a more precise prediction of the aperture mass in every angular bin.

Yesterday, I submitted my results to "control-ground-variable" (for which I got 67.8) and "real_galaxy_ground_variable" (I got 157.6). As you can see, the scores are not impressive at all. Even before the submission, I suspected that my score would not be high. This is because after running "presubmission.py" with my shear catalogs, the "M_E" and "M_B" results in the output file differ a lot between neighbouring logarithmic bins, and the aperture mass error is roughly of the same order of magnitude as the signal. Indeed, I was surprised by my "high" scores after submission. I later learned that bootstrapping could smooth the output curve of the correlation function/aperture mass as a function of distance. It seems that bootstrapping would at least make the outputs in neighbouring bins more similar. The underlying reason is complicated to explain here. But I am pretty much convinced. This is why I asked for this option.

I will work with my friend to see if it is possible to introduce bootstrapping into "presubmission.py". It should not be too difficult, right? By the way, I guess the jackknife would do a similar job. What do you think?

In any case, thanks a lot for your time!

Best, Jun


rmandelb commented 10 years ago

I’ve never known the bootstrap or jackknife to have the effect you mention, and I think that for GREAT3 it will definitely fail, because the galaxy distribution was specifically designed to cancel out shape noise if you use all the galaxies. If you bootstrap, then each of the bootstrap-resampled datasets will have lots of shape noise, so they should be far noisier than if you use the whole sample. This is effectively a statement that, because of how we chose the galaxy positions and orientations, they are not IID (independent and identically distributed), so the assumption behind the bootstrap/jackknife is wrong.
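A toy numpy demonstration of this point, assuming a simple pairing scheme (each galaxy paired with a 90-degree-rotated partner; this is an illustration, not the actual GREAT3 catalogue layout):

```python
import numpy as np

rng = np.random.default_rng(42)
n_pairs = 5000
e = rng.normal(0.0, 0.3, size=n_pairs)
sample = np.concatenate([e, -e])   # each galaxy paired with a rotated partner

# Using ALL galaxies, shape noise cancels by construction:
full_mean = sample.mean()          # essentially zero

# Resampling with replacement breaks the pairing, so each bootstrap
# replicate scatters like an unpaired sample (~0.3 / sqrt(10000)):
boot_means = np.array([
    sample[rng.integers(0, sample.size, size=sample.size)].mean()
    for _ in range(300)
])
```

The full-sample mean vanishes to machine precision, while the bootstrap replicates recover the full unpaired shape noise.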

betajzhang commented 10 years ago

Rachel, I have trouble understanding the following sentence: "for GREAT3 it will definitely fail because the galaxy distribution was specifically designed to cancel out shape noise if you use all the galaxies". I understand that the shape noise does cancel out for constant shear branches, but for variable shears the cancellation is quite difficult, unless every pair of orthogonal galaxies is located at the same physical position.

Regarding "bootstrapping/jackknife", maybe my definition/understanding is somewhat different from yours. -Jun


rmandelb commented 10 years ago

Sorry for the naive question but it’s not clear to me from this discussion: Have you read the appendix in the handbook that describes shape noise cancellation? We specifically set up the galaxy distribution so that the shape noise would be pure B mode and the lensing shear pure E mode. This works only if you use all the galaxies in your measurement.

rmjarvis commented 10 years ago

Bootstrapping is not supported in corr2, but it does output directly an estimate of the error on calculated quantities like the aperture mass.

That's right. corr2 does not have any bootstrapping option out of the box. Of course, you can implement it yourself by constructing multiple input catalogs as needed and combining the outputs.

Also, it's worth pointing out that the error estimates reported by corr2 are merely a simple propagation of the shape noise. They do not include sample variance. So in practice, when using it on real data, people often use the jackknife method to estimate the real errors, including sample variance. You could use bootstrap instead if you prefer.
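A generic delete-one jackknife sketch (this is not corr2's internal error estimate; on real survey data the delete-one unit would usually be a sky patch rather than a single galaxy):

```python
import numpy as np

def jackknife_error(values, statistic):
    """Delete-one jackknife error on `statistic` over `values`.

    Uses the standard jackknife variance: (n-1)/n * sum of squared
    deviations of the leave-one-out replicates from their mean.
    """
    values = np.asarray(values)
    n = len(values)
    reps = np.array([statistic(np.delete(values, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((reps - reps.mean()) ** 2))

# for the mean, this reproduces the usual standard error ~ sigma/sqrt(n)
data = np.random.default_rng(3).normal(1.0, 0.5, size=200)
err = jackknife_error(data, np.mean)
```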

betajzhang commented 10 years ago

I missed that part of the appendix. It is good to know. Thanks.

-Jun


rmandelb commented 10 years ago

No problem. And sorry for my rushed replies earlier - I was preparing to leave to teach my class, so I wrote in a hurry...

betajzhang commented 10 years ago

Hi Rachel,

I have read the appendix of the handbook. I now fully appreciate what you said. I understand that the intrinsic "galaxy ellipticities" are distributed in a very special way to generate a pure B-mode field. As a result, we should expect a measurement of the lensing E-mode without intrinsic shape noise. This makes sense! However (and unfortunately), my shear estimator does not use the galaxy ellipticity (though it still provides unbiased measurements of shear/shear-correlation), and consequently, the "B-mode of my shear estimator" could more easily leak into the E-mode, adding a lot of shape noise to the final results of the M_E. I think this is why my scores are not high for variable shears. There is little I can do at this point, as my shear estimator and the galaxy ellipticity do not have a one-to-one correspondence (their difference is an unknown function of the galaxy size and morphology). I will talk more about the details at the final meeting.

By the way, just curious: when you arrange the spatial distribution of the intrinsic galaxy ellipticities (to generate a pure B-mode field), what is the definition of ellipticity, particularly for real galaxies of irregular shape? I suppose it is defined by the second brightness moments, as mentioned earlier in the GREAT3 handbook, right?

Thanks a lot for your time!

-Jun


rmandelb commented 10 years ago

Interesting. This sounds like a case where if you want to test your variable shear recovery, you will need simulations that don’t have any special tricks to make the shape noise purely B mode, and that have a really large simulation volume instead so you can use the volume to beat down the noise.

To answer your question about the B-mode noise field: we use the parametric fits to the galaxy light profile, and then calculate the second moments of that model as you guessed. This is not perfect, but tests seemed to indicate that it works quite well.

betajzhang commented 10 years ago

Hi Rachel,

Yes, indeed. I just want to bring to your attention that recovering shear and recovering ellipticity could be two different things.

-Jun


barnabytprowe commented 10 years ago

I just want to bring to your attention that recovering shear and recovering ellipticity could be two different things.

Haha, well I don't think that was ever doubted, or questioned, by anyone. But to avoid simulation datasets of size > 10TB we had to play these B-mode games for this challenge. It sounds to me like the constant shear branches are a much fairer test of your method, Jun. The constant shear results are also typically less noisy, so that is the best place to restrict yourself to, if anywhere.

And as only your top 5 branch leaderboard scores count towards the grand total, with 10 constant shear branches to apply your method to, you still have plenty of opportunity to gather points!

betajzhang commented 10 years ago

Hi Barnaby,

Actually, the constant shear branches, with bright (high-SNR) galaxies of orthogonal shapes, are not very fair tests of shear recovery either, for two reasons:

  1. This is never the case in real surveys;
  2. Shear recovery methods typically have smaller errors on bright galaxies.

I fully understand that the current arrangement is meant to limit the data volume in the challenge. I have already tuned myself into challenge mode. The above suggestion is just some of my thoughts that might be useful for the future. No offence here.

-Jun


barnabytprowe commented 10 years ago

Hi Jun,

  1. It is never the case in real surveys;

You are referring to the constant shear field here? That is right, but I think this is unrealistic rather than unfair, and there are important differences between those descriptions. What aspect of the constant shear simulations (shared with STEP1, STEP2 and GREAT08) do you think makes them an unfair test?

On the science side, there are many lensing science applications, such as cluster lensing or galaxy-galaxy lensing, in which you do expect an approximately constant magnitude of shear in annuli from a central foreground lens. We believe we can handle rotations, so measurement methods which perform well in the GREAT3 constant shear branches should perform well in these applications too. There are also many methods for cosmic shear measurement for which a constant shear field in simulations is no barrier to success. As far as I know, the main problem for cosmic shear methods in constant shear sims is those that need to assume a zero-mean shear prior.

On the technical side, there are strong reasons for having a constant shear field if you are testing shear recovery in a challenge such as this. The ease of cancelling shape noise uncertainty is one, but allowing submissions with small tables of mean results per field, rather than large catalogues, is also a big bonus if you have to run a server to process uploads and score calculation automatically. However, in my opinion the main advantage of this type of simulation is the sensitivity to bias - it turns out you can get much better constraints on m and c-type biases when a simple shear field is applied, because your estimates of m and c can be made to be a linear combination of the individual, noisy shear estimates. My work on constructing metrics for GREAT3 has led me to conclude that for a more complex, cosmological-type shear field with zero ensemble mean, information on biases can only be derived from 2nd order moments of the shear field (e.g. 2-point statistics), and this impacts the final signal to noise of the experiment significantly.
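As a toy illustration of that m/c fit (this is a sketch of the general idea, not the GREAT3 metric code; the injected bias values are made up): with a known constant shear per field, the biases follow from a straight-line fit g_obs = (1 + m) g_true + c across fields.

```python
import numpy as np

rng = np.random.default_rng(7)
g_true = rng.uniform(-0.05, 0.05, size=200)   # one constant shear per field
m_in, c_in = 0.02, 1e-3                        # hypothetical injected biases
noise = rng.normal(0.0, 1e-4, size=g_true.size)
g_obs = (1 + m_in) * g_true + c_in + noise     # per-field mean shear estimates

# linear fit: slope = 1 + m, intercept = c
slope, intercept = np.polyfit(g_true, g_obs, 1)
m_est, c_est = slope - 1.0, intercept
```

Because the fit is linear in the per-field shear estimates, the constraints on m and c tighten directly with the number of fields, which is much harder to achieve from 2-point statistics of a zero-mean field.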

So I concede that the constant shear field is unrealistic, but it was concluded that these aspects of unrealism were well-motivated in terms of optimized experimental design. Barring the small subset of methods which need to assume a prior (and would normally reasonably choose this to be zero mean) I do not think constant shear simulations are unfair. But perhaps you could clarify what you meant if I've misunderstood?

  1. Shear recovery methods typically have less errors on bright galaxies.

Now I'm definitely not sure I know what you mean. The distribution of galaxy fluxes comes directly from the COSMOS galaxy images that we use as a training set, and so like real data is dominated by fainter objects. You can see a plot of the SNR distribution reported by SExtractor as FLUX_AUTO/FLUXERR_AUTO in a GREAT3 simulation image on Figure 1 (page 9) of this pdf: https://dl.dropboxusercontent.com/u/495780/GREAT3/Metrics/Why_Update_Metrics.pdf

That plot comes from running SExtractor on a variable shear branch image, but the SNR distributions are the same (we tested this, and I can show more plots from constant branches if you'd like to see them). Based on my experience with HST, CFHTLenS and DES data, I think that's a pretty realistic set of galaxy SNRs from which to try and estimate the shear - there are relatively few bright galaxies.

There are some slight caveats of course (the way we selected these galaxies is in the Handbook if you need more detail): we put in place a size cut to remove the very smallest galaxies relative to the PSF, but this again was for reasons of experimental design. Some methods cannot produce a shear estimate in these conditions. We figured that if we had too many objects for which some methods failed totally (i.e. could not return a shear at all), then the shape noise cancellation schemes we'd put in place would start to break down, and this would ruin the sensitivity in our tests. Quite a lot of thought was put into doing this in a way which caused minimal impact. Another caveat is the way that the size distribution was modified in order to match a deeper sample more typical of lensing sources: again, all details are in the handbook. And, of course, in real life the few bright galaxies will have a different, typically weaker shear due to being closer in redshift: this, and redshift in general, is a subtlety that we left out!

But the caveats and unrealistic aspects of the galaxy size and flux distribution I described above do not seem to be what you meant.

Have I missed something? Were you perhaps referring to something more specific as 'the constant shear branches with bright (high SNR) galaxies of orthogonal shapes'?

The above suggestion is just some of my thoughts that might be useful for the future. No offence here.

None taken. Whoever takes on the mantle of a future challenge will find much in GREAT3 they can learn from, or learn to avoid!

But we have been very careful to ensure that the challenge, although often unrealistic, is unfair in as few ways as possible. Wherever we were forced into a simplification that is potentially unfair, we have tried very hard to ensure this happened only when it was strongly motivated by the goals of the challenge.

These goals were to start to explore the impact of realistic galaxy morphology, multiepoch imaging, and PSF uncertainty on shear estimates for cosmology and astrophysics. We put a lot into trying to achieve just those goals. Future challenges will have (and ought to have) a different, new set of goals, which will doubtless require different decisions to be made when the time comes.

betajzhang commented 10 years ago

Hi Barnaby,

Thanks a lot for your response! I think you've misunderstood my main point in the last email. I will explain to you during the final meeting, because there is something I don't want to discuss openly before the closure of the challenge. Let me take back the word "unfair" (indeed I borrowed it from your last email :)). I am now working on the variable PSF branches, and hopefully will get some decent scores.

Best, Jun


barnabytprowe commented 10 years ago

Hi Jun,

OK, it's fine if you don't want to discuss how I misunderstood you! (I clearly do.)

But this is a public forum: I did want to make those points about the constant shear branches and (especially) about the distribution of galaxy SNRs in the simulations. It is important that people are not misled into thinking the constant shear branches contain only bright, high SNR galaxies. And I think it's also important to understand that an unrealistic constant shear field simulation can yet give a lot of useful information about shear measurement algorithms for real data.

betajzhang commented 10 years ago

Hi Barnaby,

Certainly! I don't want to mislead people either. The constant shear branches certainly do not contain only bright galaxies. Indeed, all I wanted to say previously about the constant-shear branches is that each image contains "a few" pairs of very bright galaxies of "orthogonal shapes". This is not realistic, because for two real galaxies, even if they have very similar shapes, their shears cannot possibly be the same. That is what my first point meant in my second-to-last email. Secondly, do you agree that one can very accurately recover the shear with "only one pair" of very bright galaxies in each image? For most shear measurement methods, I don't think it is very difficult to deal with such bright sources. This is my second point. Of course, in any case, it is non-trivial to achieve a score higher than others'.

Hopefully, this answers your questions.

Best, Jun


barnabytprowe commented 10 years ago

Hi Jun, yes, and I didn't mention it on this list but I confess after a little thought I already understood what you meant from a previous post! Could I ask: would you mind making some submissions that use all galaxies too? It would be most useful scientifically to see how those fare.

betajzhang commented 10 years ago

Hi Barnaby,

Sure. Indeed, my submissions so far use all the galaxies. The most successful submissions give the bright pairs much larger weights than the faint ones, while the faint ones do help reduce some noise effects. I can later submit a few results with only the bright pairs. I will let you know.

Best, Jun


barnabytprowe commented 10 years ago

Ah, right I understand! Yes. Well, if it's not too much trouble it would also be very interesting to see some normally weighted submissions, i.e. the sorts of weights you would naturally use in real data which would take intrinsic shape dispersion into account. And it would even be great to have some unweighted submissions too for comparison. They might not be GREAT3 Q-value optimal, but it would be very useful for our scientific analysis after the challenge! No need to flag these up as different, just so long as after the fact we can reconstruct scores as a function of weighting scheme.

betajzhang commented 10 years ago

Hi Barnaby,

Certainly. I feel it necessary to do as you suggested. I am currently working on the variable_PSF and multiepoch branches, and I am not sure I will have time to focus specifically on the weighting issues before the end of the challenge. I am certainly willing to share all my experience with the GREAT3 data sometime soon, likely at the final meeting and after. I look forward to seeing you all in May.

Best, Jun


barnabytprowe commented 10 years ago

Hi Jun,

It is of course up to you what you submit, but I think I should just be clear about what we plan to do in the period after the challenge end date, and when writing the paper.

To summarise, the rules of the challenge are quite broad and there are likely a number of ways to gain an advantage beyond what would be possible in real data. However, when we write the paper we will be taking an entirely critical approach to our dataset (the submissions), and therefore carefully constructing our comparisons to ensure they are relevant to the problem of weak lensing shape measurement (not necessarily to the problem of winning GREAT3). We would like to see submissions that could be included in those comparisons from all teams, as ultimately we're all interested in measuring real shears!

betajzhang commented 10 years ago

I fully agree! -Jun


betajzhang commented 10 years ago

Hi Barnaby,

I have a quick/naive question regarding the variable-PSF branches.

Is the following true?

Field 1 includes images 000, 001, ..., 019;

Field 2 includes images 020, 021, ..., 039;

...

Field 10 includes images 180, 021, ..., 199;

Thanks a lot!

Best, Jun

rmandelb commented 10 years ago

Yes, you are correct.

barnabytprowe commented 10 years ago

Hi Jun,

Yes, although it should be:

Field 1 includes images 000, 001, ..., 019;

Field 2 includes images 020, 021, ..., 039;

...

Field 10 includes images 180, 181, ..., 199;

If you want help with the bookkeeping, a lot of the basic hard work is done in the example script psf_models.py that you can get on the repo here, so you could copy that. See: https://github.com/barnabytprowe/great3-public/wiki/Public-software-associated-with-the-challenge
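The bookkeeping above boils down to: with 20 images per field, image index i (000–199) belongs to field i // 20 (0-indexed). A trivial sketch, with a function name of my own choosing; psf_models.py handles this (and much more) properly:

```python
def field_of_image(image_index, images_per_field=20):
    """Return the 0-indexed field containing a given image index.

    With 20 images per field: images 000-019 -> field 0,
    images 020-039 -> field 1, ..., images 180-199 -> field 9.
    """
    return image_index // images_per_field
```

Note the fields here are numbered from 0, whereas the thread above numbers them from 1; add 1 if you prefer that convention.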

betajzhang commented 10 years ago

Hi Barnaby and Rachel,

In the second paragraph on page 12 of the GREAT3 paper, there is a sentence that says: "this underlying PSF model is the same for all subfields within the same field, modulo a position shift (which is provided; participants do not have to estimate it)". Could you please tell me what "modulo" means exactly?

Currently, I directly use the position information of the stars provided in the star_catalog, without using any other subfield information. I feel that this is enough, but not sure if I am missing something. Thanks a lot for your time!

Best, Jun


barnabytprowe commented 10 years ago

Hi Jun,

I think this discussion covers the meaning intended: http://english.stackexchange.com/questions/70018/meaning-of-modulo-the-fact

In the context of "this underlying PSF model is the same for all subfields within the same field, modulo a position shift (which is provided; participants do not have to estimate it)"

...here "modulo a position shift" means "although a position shift is included, which is a small difference you will need to take into account". The position shift being referred to is the offset of each subfield.

If you want to see this being handled in action, check out the psf_models.py example script (available in this repository and described here: https://github.com/barnabytprowe/great3-public/wiki/Public-software-associated-with-the-challenge).

rmandelb commented 10 years ago

In terms of the consequences for using the catalogs: also note that since the galaxy and star catalogs for the variable PSF branches include the true position on the sky, you can safely use those without worry.

This warning was simply meant as a reminder that one cannot use the galaxy (x, y) positions within the subfield to get the true position on the sky, since you need to add the per-subfield offset.
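In code, applying the per-subfield offset amounts to a simple addition; this sketch uses illustrative array and parameter names, not the actual GREAT3 catalog column names:

```python
import numpy as np

def true_sky_position(x_sub, y_sub, offset_x, offset_y):
    """Convert within-subfield (x, y) positions to true sky positions
    by adding the provided per-subfield offset."""
    x_sky = np.asarray(x_sub, dtype=float) + offset_x
    y_sky = np.asarray(y_sub, dtype=float) + offset_y
    return x_sky, y_sky

# Two galaxies in a subfield whose offset within the field is (10.0, 20.0).
x_sky, y_sky = true_sky_position([1.0, 2.0], [3.0, 4.0], 10.0, 20.0)
```

This is only needed if you start from within-subfield positions; as noted above, the variable PSF catalogs already provide the true sky positions directly.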

betajzhang commented 10 years ago

I understand better now. Thank you both!

Best, Jun
