Closed rmjarvis closed 8 years ago
It seems like these could be enabled with a flag exclude_marginalmaybe? We already have exclude_fail and exclude_bad which presumably remove the more extreme bad objects.
An alternate way to do this might be to have a single flag called something like exclusion_level
which has several options: "marginal", "bad", "failures" (or whatever). I think that in general there will be little use for combinations like exclude_marginal=True
and exclude_fail=False
, so having a single flag with multiple options could be the simplest way to cover all the cases that people might use in reality.
However...
Really there are three classes of cuts that we used in GREAT3:
(1) Cuts to identify failures with galaxy selection and/or postage stamp generation.
(2) Cuts to identify issues with the parametric fits (including the tests currently in exclude_bad
and exclude_fail
).
(3) SNR and resolution cuts that were used for GREAT3.
I guess that if people are only using the HST images, they might want to impose cuts just on (1), not (1) and (2). Very often the failures in (2) are problematic precisely because there's an issue with the postage stamp generation (so they would fail using (1) as well), but there are also some failure modes for the Sersic fits that show up in (2).
So perhaps there should just be two cut flags: exclude_ps
(for postage-stamp generation failures), and exclude_fits
(for failures or signatures of badness in the parametric fits, currently split into two flags), plus perhaps separate options for S/N or size cuts.
Note, I re-thought my above comment and edited on GitHub, so please read there instead of the e-mail notification.
I like the idea of a single exclusion_level
kwarg that takes different values. Presumably the parametric fits are always (at least presumed to be) bad when there are postage stamp problems, right? So I think the ps
and fits
exclusions probably still qualify as multiple levels for a single parameter. Even if not, we can still have exclusion_level
s that include 'none', 'bad_images', 'bad_fits', 'all_bad', 'all_marginal'. That feels like a nice way to specify this that probably matches how people will be thinking about what they want excluded.
Rachel, would you be up for taking a first stab at trying to code this up?
I'm having trouble following the code that does this in the great3 scripts. Especially the calculations related to the simulation noise level and seeing. Also, there seem to be a lot of magic numbers in here, and I can't easily tell which ones are just heuristic (X because anything smaller is probably junk) and which are Great3-specific (X because we chose this other value to have a range from Ymin to Ymax, which implies X here). So if you could code up the easy ones and then point out which ones will be tricky to implement this way, that would be very helpful. :)
Hi all - I had the same general problems as Mike when trying to make the great3 scripts more configurable in terms of, e.g., setting the min galaxy S/N. Having things done as Mike describes would be really great!
Mike, what are your time constraints for getting this code in place (if any)? I am traveling this week and seriously swamped from now until the end of the semester, so if you want this code in place really soon, then it's probably best if someone else does it (although I can answer questions about any lines of code you find confusing in great3-public scripts). If any time in the next ~6 weeks is fine, then I can definitely do it after classes end.
The ~6 week time scale would be perfect. Thanks!
I think this will be useful to have in place for DESC DC1 sims, if that helps give you a relevant time frame. Not anything I need before year end. (We're also interested in this for DES sims, but I think the time frame is similar.)
OK, in that case I assigned myself. I will do my usual annoying thing that I do every December (write code over winter break and make a PR when there's nobody around to look at it).
Hi @rmjarvis and @beckermr - I'm doing that annoying thing I do where I try to finish all my coding during winter break since there isn't enough time during the semester. I sat down and went over this stuff carefully (GREAT3 simulation scripts, current COSMOSCatalog) to try to come up with a preliminary design that addresses the questions of which cuts should be included, in what way, and how to get the information needed to include them. Here is my proposal, which I realize you might not have time to read for a while, but I wanted to get it down while I'm thinking about it. It introduces some questions / concerns about compatibility between new versions of GalSim and old catalogs. Sorry this is so long, but I wanted to be clear.
There are a bunch of selection criteria in the GREAT3 simulation scripts, which do the following:
to_use
flag cut: eliminates galaxies for which we don't have an estimate of the shape or apparent resolution with a particular PSF, parametric fit failures.dmag
cut: this eliminates galaxies for which the apparent magnitudes of the COSMOS galaxy and the parametric fits differs by some amount (we used 1). Often these were because of blend issues or masking issues, though in some cases they represent actual issues with the parametric fits.flux_frac
cut: this eliminates galaxies for which (in a preprocessing step) we have determined that the GREAT3-size postage stamp will, for a given PSF, include too little of the galaxy flux. We don't want to chop off the edges of galaxy light profiles.approx_sn_gal
cut: again, in the spirit of inclusiveness, we did some pre computations to ensure that for a given noise variance and PSF, the total S/N of the light profile with an optimal S/N estimator would be within some range.noise_max_var
cut: basically making sure we had a valid estimate of the noise variance at which the S/N would be 20, so we could make cut (5).do_meas
cut: require a valid fit. Seemingly redundant with (1).original_sn
cut: require that in the initial postage stamp, the S/N of the detection was >20. This helps get rid of masking failures and detection issues when cutting out the postage stamps.noise_min_var
cut: we use pre computations of the minimum noise variance that could be achieved after simulating and then noise whitening, in order to eliminate those that have a noise variance that is higher than the target one.mask_cond
cut: this cut is meant to eliminate postage stamps that have masking and deblending failures of various sorts.exclude_fail
flag: covers most of (1) and (7).exclude_bad
flag: certain cuts on the fits to eliminate suspicious ones (could have sky estimation issues), not included in any of (1)-(11).In my view, it would be very hard to implement (3)-(6), (8), or (10) in GalSim. They generally depend on things like the target PSF, pixel scale, and/or noise variance. In GREAT3 we implemented them by pre-computing various quantities for all galaxies, using the fact that we had a limited set of target PSFs, pixel scales, and noise variances. In GalSim, when allowing people complete freedom in these aspects of the simulations, I don't think it'll be possible to implement them as a pre-selection. We could provide an option to analyze a simulated image to check whether it seems to pass some of these cuts after the fact, but that's the only way i can think to do this. And once the sim is already made, this seems less useful. Thoughts / objections?
exclusion_level
Assuming we're going with an exclusion_level
keyword to determine the level of selection criteria to impose, here are my thoughts on the options:
None
: no description needed :)bad_ps
: I think (9) and (11) are the least ambiguous of the options to be used to cut out only things that have a bad postage stamp. The other cuts are mixtures of possible badness in fits vs. postage stamps.bad_fits
: Should carry out the current exclude_fail
cuts.marginal
: Should carry out all cuts in bad_ps
and bad_fits
, plus (2) and the current exclude_bad
cuts.I admit the line between "marginal" and "bad" is a bit blurry, so I'm welcome to alternative suggestions.
In thinking about implementation I reminded myself why I didn't do (2), (9), or (11) in the first place: they depend on additional selection information in files we distributed in GREAT3, NOT in the GREAT3 real galaxy catalog or the parametric galaxy model catalogs. A current GalSim user literally does not have the information to do these cuts.
So that leads me to conclude that we need to either
(a) distribute a new file that can enable a user to do (2), (9), and (11), or
(b) include the information for (9) and (11) in the RealGalaxyCatalog, and for (2) in the parametric models catalog. (I suggest this split because the first two have to do with the real images and the latter has to do with the fit.)
Initially (b) appealed to me the most, because once we start requiring tons of different files for different purposes, it seems really unwieldy. We could include the information in new versions of the files on the GREAT3 server. But then it does mean that anyone who already has those catalogs won't be able to use these cuts. We'd need to detect that issue and raise an exception telling them to download new versions, which is not very nice. So then I thought okay, we have to do (a). We'll still have to tell people they need to download something new (which could be added to the tar balls as well, so people who do a new download of the COSMOS catalog get them automatically) but at least the old functionality with the old catalogs is preserved.
I'm not loving the conclusions I'm coming to here, as the whole COSMOS catalog class usage seems to be getting more and more unwieldy. Alternate ideas would be welcome, or perhaps you could just persuade me to dislike these solutions a bit less. :)
Possible implementations for the tough ones:
As for the how to distribute extra information question, I think it would be fine to add more files to the tarball that we can use for this. We could even have the galsim_download_cosmos script check for these new ones separately and only download them if they already have the current files.
For 3) above, would it not be the case that in most applications that mimic real data, one would pick a stamp size that contains most of the object's flux instead of cutting objects that are too extended for a given stamp? Output formats like MEDS would be able to support the variable stamp sizes easily and it might be useful to have this option.
Certainly there are different use cases. We should enable both.
Or in fact a third option
The last is what we did for the end to end sims. I rejected objects that required stamp sizes larger than 128x128 IIRC. Not too many that big, and if you try to render them, they dominate the running time.
Yes this sounds great. The last detail is to choose FFT-friendly sizes, like powers of 2 with a few small prime factors like 3 or 5.
I just realized this is probably already in the MEDS code in galsim.
If you let GalSim pick a stamp size for you, it will choose 2^n or 3x2^n. But currently the galsim meds option requires a designated stamp size, so there would still be a bit of work to enable the variable stamp size based on the object.
The current automatic stamp size choice is rather conservative, erring on the side of larger stamp sizes than are actually necessary. And the target flux fraction is 99.5% (tunable using GSParams folding_threshold). So we should try to make this a bit more flexible to easily allow for other choices of flux_frac.
@rmandelb I think a good project for you on this branch would be to propose a version of cgc.yaml that would include all of the cuts that Great3 did. I'm happy to work on implementing the back-end code that will make it all work.
You can use the new top-level stamp field if you want for some of them. e.g.
stamp:
min_flux_frac : 0.99
might be an appropriate place to put the minimum flux fraction falling in each stamp.
OK, lots of discussion on here. Let me try to make a single summary response:
Three: This was mainly an issue in GREAT3 because we had decided a priori on postage stamp sizes. I like the "you decide on min_flux_frac
and it chooses a stamp size, but rejects those that would require one that is too large" idea. Do I understand correctly that you're willing to try to implement that?
Four: Resolution was calculated based on apparent size with respect to the PSF, which depends primarily on half-light radius and sersic index (and very weakly but non-negligibly on ellipticity, which is of course not good). In principle, I could try to see if the resolution cut corresponds to a simple cut in the HLR vs. sersic n plane, and implement the cut in GalSim based on the HLR and n values. Let me play around with this.
Five: I like your S/N idea. We might want to give some thought to S/N estimators though (after our realization midway through GREAT3 that realistic estimators deviate a lot from the ideal).
Eight: That was for ellipticity estimation failures only (e.g., non-convergence of adaptive moments) and affected a tiny fraction of objects. I think if we ignore it in GalSim, it's not a big deal.
Ten: sounds good.
@rmjarvis , to address your comment:
As for the how to distribute extra information question, I think it would be fine to add more files to the tarball that we can use for this. We could even have the galsim_download_cosmos script check for these new ones separately and only download them if they already have the current files.
OK, but it sounds like we might be converging towards a solution that doesn't require it (if we can do (4) using HLR and n).
And your suggestion:
I think a good project for you on this branch would be to propose a version of cgc.yaml that would include all of the cuts that Great3 did.
Sure, I am up for that.
Looks like you were bitten by the Mardown start-numbering-at-1 bug "feature". :)
Do I understand correctly that you're willing to try to implement that?
Yes. I think it shouldn't be too hard. We've already got the basics in there with folding_threshold in GSParams and the automatic stamp size based on that. And the skeleton of the code to reject objects that fail some criteria is already there with the retry_failures stuff. I'm thinking that the main thing will be to make retrying failures the default with some largish number of retries (like 20 or 30 just to prevent infinite loops). Then things like having the automatic stamp size be larger than the maximum would just be a kind of failure that would trigger a retry.
Resolution was calculated based on apparent size with respect to the PSF
By size, I guess you don't mean hlr. Was this a moment-based size? T=Ixx+Iyy maybe? If so, we can do this the same way by letting the user set a minimum T to allow. Helpful if we could also tell COSMOSCatalog a min_hlr on construction to limit the number of retries required. i.e. find some hlr below which everything would be a resolution failure, so we can cut them out from the start and not have to build the stamp and check if it's ok. Likewise with the S/N -- helpful if we could set some min_flux below which everything would have too low a measured S/N.
OK, but it sounds like we might be converging towards a solution that doesn't require it (if we can do (4) using HLR and n).
I thought the extra info was for (2),(9),(11). Not (4). Am I missing something?
Looks like you were bitten by the Mardown start-numbering-at-1 bug "feature". :)
Curses! I fixed it by editing it so it's words instead of numbers. Sigh.
Yes. I think it shouldn't be too hard. We've already got the basics in there with folding_threshold in GSParams and the automatic stamp size based on that.
Yes, that's what I was thinking.
By size, I guess you don't mean hlr. Was this a moment-based size? T=Ixx+Iyy maybe?
Yes, compared to the same trace for the PSF image. That's the tricky bit - you can't calculate it from the image postage stamp alone.
So for GREAT3 I tabulated these values for the galaxies for a single space-based PSF and 5 Kolmogorov FWHM values, then interpolated between them. For GalSim I was thinking of trying to make a fitting function for those resolution values as a function of the intrinsic HLR, Sersic n, and PSF size. Unless you have other ideas that get around the fact that a PSF image is needed?
Helpful if we could also tell COSMOSCatalog a min_hlr on construction to limit the number of retries required. i.e. find some hlr below which everything would be a resolution failure, so we can cut them out from the start and not have to build the stamp and check if it's ok.
Well if people want to have something like a min_resolution
cut, couldn't GalSim use the fitting function for the resolutions to decide which things to cut in advance?
Unless you have other ideas that get around the fact that a PSF image is needed?
Issue #308 comes to mind for this.
Well if people want to have something like a min_resolution cut, couldn't GalSim use the fitting function for the resolutions to decide which things to cut in advance?
I'm not sure how useful that fitting function would be in the fully general case. We want to allow all sorts of PSF and Galaxy profiles. Not just Sersics. Better to define what the resolution cut means and then implement that as a potential stamp failure I think. In which case, it would be helpful for the specific case of the Great3 scripts to also have a min_hlr specification to cut out most of the objects that would fail before the stamp creation stage.
Issue #308 comes to mind for this.
So are you thinking of using a resolution defined in terms of pre-seeing light profile? (which I think you were using then) Or something that mimics the GREAT3 resolution, which is based on the post-seeing apparent size, and would involve defining the function such that it draws whatever images it needs to get the resolution if it can't compute it analytically? I'm not sure how GREAT3-like you are trying to get.
I just mean the ability to calculate a hlr or Irr for any arbitrary profile. Then we can have a post-seeing resolution test as one of these stamp failure tests to mimic what Great3 did.
I'd like to enable doing something as close as possible to what Great3 actually did, since this is one real-life use case. Obviously, not everyone will want to make the same choices, but it seems like we should at least enable these simulations in a fairly straightforward way.
OK. If we want to do this in generic a way as possible, then I think #308 is the way to go.
Are you willing to / interested in implementing #308, and I will (a) put together a routine that uses that functionality to construct a resolution factor, and (b) work on the yaml file?
To summarize my current plans based on what I originally proposed and the iteration that has happened since then, and track progress (note, updated based on Mike's response):
exclusion_level
keyword that works as described in https://github.com/GalSim-developers/GalSim/issues/693#issuecomment-167173980Please let me know if you think there is something I'm missing or gotten wrong.
Here is what I think Mike volunteered to do on this branch:
exclusion_level
.)I'm unable to complete this during this week due to family plans, but next week things will still be pretty quiet around here, and I plan to finish then.
Sounds great! I can also do the modifications to galsim_download_cosmos if you want, since I wrote that in the first place.
Okay, glad we agree.
I can also do the modifications to galsim_download_cosmos if you want
Sold! I'll edit the comment and put that on your list. :)
@rmjarvis - I wanted to give you the info you need to make the modifications to galsim_download_cosmos. Thanks to Joe (who helped get these publicly available on the GREAT3 server), there are new files up on the GREAT3 server:
Please let me know if there's anything else you would need to do
@rmjarvis and @beckermr , you are welcome to try out the code for adding more selection criteria on the COSMOSCatalog class to match some of the GREAT3 cuts. It is included as of this commit: https://github.com/GalSim-developers/GalSim/commit/d20593ee71ecad6abf121d4fff58d93238ca5d28 . Suggestions for changes to make are welcome.
Finally: I want to work on the modifications to cgc.yaml. I believe that the two things that need to be done there are
(a) make sure that the COSMOSCatalog has the right exclusion_level
(done here: https://github.com/GalSim-developers/GalSim/commit/f028d825af94d876cec6a1b8fc43c1baed672d94 )
(b) put in something that corresponds to cuts 3=flux fraction cut, 4=resolution cut, 5=approximate S/N cut. (As I think we agreed, the noise variance after whitening issue -- number 10 on the list -- is handled by the "choose another object" code in config; and the "has a valid ellipticity estimate for B-mode shape noise cancellation" -- number 8 -- eliminated very few objects, so I think we should just ditch this cut.)
For (3), I like what you suggested in https://github.com/GalSim-developers/GalSim/issues/693#issuecomment-167373178 but I am not sure that I did this right: see https://github.com/GalSim-developers/GalSim/commit/ad416743ffc77a3d2a5dede1aec58440ab41e9f5 (I couldn't find any examples that used the new top-level stamp field, so I basically plonked it in there at random)
For (4), how were you thinking of implementing this? For GREAT3, we had a very specific trace-based resolution definition. But I can imagine people using the trace or determinant radius or area, either for pre-seeing or post-seeing radii. In light of all the possible combinations, I was wondering if the resolution cuts could/should be implemented using eval to define the exact expression? I didn't do anything for this yet, since I am curious what you were thinking of doing.
For (5), there is a question of how to define the S/N. One could use an idealized S/N estimator (which is overly optimistic by a factor of ~2), a S/N estimator that is based on the flux in an elliptical Gaussian (also overly optimistic, but not as bad, and it's more robust than the former in the case of input images that were noisy to begin with). For GREAT3 we made one particular choice, but I think it would be nice to enable multiple options. What do you think?
For 4 and 5, it might be most appropriate to just define a custom type that would do the calculation that Great3 did. Then people could see how to modify this to do something different, but we wouldn't be weighted down by lots of complicated options in the config definitions. The config processing could just look for a stamp.reject
field and parse it to see if the stamp should be rejected. So:
modules:
- great3_reject # We would define the code to decide whether to reject a stamp in great3_reject.py
...
stamp:
reject: # Expecting a bool value
type: Great3Reject # A custom type that would do the appropriate checks and return whether the drawn stamp should be rejected.
OK... so I'd just define the Great3Reject
type in great3_reject.py
in examples/great3/ ? And the idea is that this class would be initialized with all the basic numbers that define the cuts, and then when it's called it would take a stamp as an input and return a boolean value?
The resolution cut would also need to know what the PSF looks like. How would that work?
Rachel, @esheldon pointed out to me that the exclude_fail
check, now bad_fits
, removes galaxies with a bad bulgefit status as well as a bad sersicfit status. But when we make a parametric galaxy, I think we start with B+D and revert to the Sersic if B+D was bad.
So do we want to keep this check in the constructor? Are objects with bad bulgefit status suspect regardless of whether we could switch to the Sersic fit? Or maybe this check should be moved to the marginal
exclusion level now?
OK... so I'd just define the Great3Reject type in great3_reject.py in examples/great3/ ? And the idea is that this class would be initialized with all the basic numbers that define the cuts, and then when it's called it would take a stamp as an input and return a boolean value?
You can see an example custom module that uses hsm to measure the shape of a drawn object here. I use it in a DES example script for the output truth values of the PSF shape. It's a bit gratuitous, but it's intended to serve as an example of the kind of thing we're doing here with writing custom types outside of GalSim.
Here is a strawman of the code in great3_reject.py:
def Great3Reject(config, base, value_type):
"""See if a drawn postage stamp should be rejected based on Great3 cuts.
"""
stamp = base['current_stamp']
gal = base['gal']['current_val']
psf = base['psf']['current_val']
reject = False
# Check the flux
flux = numpy.sum(stamp.array)
if flux < 0.99 * gal.flux:
reject = True
# ... Other checks
return reject
galsim.config.RegisterValueType('Great3Reject', Great3Reject, [ bool ])
The resolution cut would also need to know what the PSF looks like. How would that work?
I guess this could be incorporated into the above too. Unless we can know whether we want to cut it before the stamp is drawn.
So do we want to keep this check in the constructor? Are objects with bad bulgefit status suspect regardless of whether we could switch to the Sersic fit?
Good question.
First of all, for context, there are 56062 objects in the catalog. Of these, 550 have a bad bulge fit status, 4 have bad values for both sersicfit and bulge fit, and none have a bad sersicfit status but a good bulge fit status. Basically the bulge fit one is more likely to fail to converge at low S/N because it has more degrees of freedom.
My suggestion: let's change bad_fits
it to exclude those that have a bad status for both, while marginal
excludes those that have a bad status for either. Do you agree? If so, I will make that change.
Re: the GREAT3Reject code, thanks for the example and the straw man code. I think I can take it from here. But -
(1) I thought that the 99% of flux thing was going to be taken care of using min_flux_frac
(which you had suggested could be included in the stamp field), not using this code? Or do you think once you're letting config reject objects using some user-specified routine, we should make users specify this in their own code? My take was that people might indeed want to do this in config without specifying any other unusual cuts, so it's worth having in config.
(2) And yes, we can definitely incorporate the resolution cut into the Great3Reject code. I hadn't realized it would be able to easily access the PSF image, but once it can, this is easy to do.
I think that makes sense. Then bad_fits
means that we can't make a parametric galaxy, and marginal means something might be screwy, but not catastrophic.
I thought that the 99% of flux thing was going to be taken care of using min_flux_frac (which you had suggested could be included in the stamp field), not using this code?
Sure. I think that's one we can implement directly. So no need to include it as part of Great3Reject.
I hadn't realized it would be able to easily access the PSF image
config['psf']
is actually the profile, not the image. We don't normally render the psf image. But we can have this code render it and stash it somewhere. Then only rerender it if the psf changes. e.g.
if 'cached_psf_image` in base and psf is base['cached_psf']:
psf_image = base['cached_psf_image']
else:
psf_image = psf.drawImage()
base['cached_psf_image'] = psf_image
base['cached_psf'] = psf
The cuts Mike introduced remove about 20% of the galaxies that make it into great3 in the rgc branch
My choices of max_hlr and max_flux in the cgc.yaml file were pretty arbitrary. So, I wouldn't be at all surprised if it cuts out way too many galaxies now.
Once the work on this branch is done, we should be able to have a selection that is closer to what Great3 did, and more importantly, easy to modify to something slightly different if we want.
I just pushed the changes we discussed yesterday. I'm sure that I did things in a very silly way in Great3Reject; I wasn't sure what was the best way to access certain information. Can you check it out and see what you think?
OK. Back to you Rachel. I haven't done galsim_download_cosmos yet. And I made it just a warning if the user doesn't have that file rather than an error. So it skips the checks that require it if it's not there.
But I think the reject stuff is ready for you to play with a bit. galsim cgc.yaml should work now. The following will give you a reasonable-looking output that will Ctrl-C break nicely:
galsim -n200 -j1 cgc.yaml image.nproc=1 output.nproc=1 output.noclobber=False
You'll see that there are a lot of rejections from the resolution cut. So worth double checking if this is implemented correctly. If it is, then we should probably try to put some kind of pre-cut using the min_hlr and/or min_flux options (which I've turned off now) so that most objects aren't being rejected.
Also, we need a version of real_galaxy_galsim_selection.fits.gz for the unit tests. We don't want to force people to download the full version just to run unit tests, so we should have a smaller version appropriate to use with real_galaxy_catalog_example.fits in GalSim/examples/data.
I don't have time to do this at the moment (will try tomorrow) but perhaps I can address the rejection issue anyway. First of all:
You'll see that there are a lot of rejections from the resolution cut.
Is it the resolution cut specifically, or is it both cuts in Great3Reject? (there's an S/N cut too)
I happen to have some notes on what fraction of the parent sample gets used as a function of PSF FWHM in CGC in GREAT3: for FWHM of ~0.6, ~0.7, and ~0.8, we use 17k, 13k, and 10k galaxies out of the original 56k. Is this rejection rate inconsistent with what you are seeing with this script?
If it seems inconsistent, then I'll dig into the script to see what's going wrong. If it's consistent, then I will try to come up with some cut we can use to pre-select objects.
No, that could be right. Most galaxies are rejected several times before getting something that works. So if we are expecting only a ~20% success rate, that's probably fine.
Did you happen to record the hlr and flux of the ones that you did use? If so, we can precut on the min/max of these and then hopefully there will be fewer failures after drawing the stamps, so less inefficient.
OK. For the record, the S/N cut is pretty important for this as well, not just the resolution cut.
I have a test version of the great3 sim script that I run when I wanted to quickly debug things. I just checked and was able to reproduce the numbers I gave above. I could hack it to print out the kind of information you are requesting. Will try to do it in the next few days (sorry, a bit hectic trying to finish a few things before leaving for Taiwan on Sunday, but it might be a good thing to do on the plane...)
Also, we need a version of real_galaxy_galsim_selection.fits.gz for the unit tests. We don't want to force people to download the full version just to run unit tests, so we should have a smaller version appropriate to use with real_galaxy_catalog_example.fits in GalSim/examples/data.
Since the full tarball has an unzipped version of this file, ok if we just use that? I can have galsim_download_cosmos unzip this file after downloading when if puts it is the share directory. Then we can remove the double check in scene.py for the .fits.gz and then .fits. Probably easier to just have the single check for real_galaxy_galsim_selection.fits. Sound ok?
Rachel, I'm seeing some differences between the things in the tarball currently available here and the files that I currently have in my share/galsim/COSMOS_23.5_training_sample directory. Aside from the new selection file that is, which I thought was going to be the only change.
So it seems like maybe you packaged up the wrong files to go along with the new selection catalog? Maybe an older version?
You are absolutely right. Ugh, that is the bad thing about saving the old and new version in directories with nearly the same name.
I am pretty tired now and don't want to screw up (again) so I will fix in the morning. Thanks for finding this.
No rush. :)
OK, I just fixed it. @joezuntz and I have this complicated process that involves me making the file available to him and then he puts it on the server, so hopefully he'll post on here when it's replaced.
To address other things:
1)
Since the full tarball has an unzipped version of this file, ok if we just use that? I can have galsim_download_cosmos unzip this file after downloading when if puts it is the share directory. Then we can remove the double check in scene.py for the .fits.gz and then .fits. Probably easier to just have the single check for real_galaxy_galsim_selection.fits. Sound ok?
I'm fine with that.
2) I believe my outstanding action items are (a) make a small selection file available in examples/data/ to use for unit tests, and (b) suggest reasonable pre-selection criteria for cgc.yaml. I plan to do that while I am traveling this weekend, will keep you posted. Please let me know if there's anything that I'm missing.
OK. For the record, the S/N cut is pretty important for this as well, not just the resolution cut.
Here are some stats. I let one file run to completion. So 5000 total objects tested. (10,000 drawn, but half are just 90 degree rotations of the other half, so only 5000 were tested for rejection.)
So resolution is the biggest category of rejections. But SNR > 100 is probably an important one too, since I bet these are larger objects that take more time for their FFTs.
And a total of 40771 stamps were drawn with a success rate of 5000/40771 = 12% which sounds compatible with your numbers above, since this file (the first one) happened to have a particularly large PSF fwhm = 0.97 arcsec.
I believe my outstanding action items are (a) make a small selection file available in examples/data/ to use for unit tests, and (b) suggest reasonable pre-selection criteria for cgc.yaml. I plan to do that while I am traveling this weekend, will keep you posted. Please let me know if there's anything that I'm missing.
I think that's right. The above numbers are the reason why I think we should have some kind of helpful pre-selection to reduce the number of rejections.
The COSMOS galaxy selection for Great3 included a lot of different cuts from the full catalog. The code where they do this is here. We now have a convenient class in GalSim,
COSMOSCatalog
that can select galaxies from the same catalog. However, it is currently difficult to match the same cuts that were used in Great3.There are some cuts made in the
COSMOSCatalog
constructor (here), but not the same ones, and certainly not nearly as many. It would be nice if we could at least enable doing all the cuts that Great3 made with a few (some optional, some default) parameters in either the construction of theCOSMOSCatalog
class or when selecting individual galaxies from it (or a combination of both).Most of the Great3 cuts are billed as basic goodness checks. e.g. dmag between real and parametric galaxies, fraction of flux in mask, minimum resolution, etc. It seems like these could be enabled with a flag
exclude_marginal
maybe? We already haveexclude_fail
andexclude_bad
which presumably remove the more extreme bad objects. I don't have a strong opinion about whether this should default to True, but I'd like to at least make it easy to apply this if desired. Or, if there are different classes of exclusions here, it might make sense to have more than one kwarg to give people more control if they want.Some cuts are specific to the Great3 simulation, so probably ought not be the default. But I think they can be enabled with some parameters like
sn_min
,sn_max
that the user can provide as appropriate for the simulation they are making. I'm not sure which others besides the S/N ones are of this type.It's also possible that some cuts are more appropriately done in the
makeGalaxy
function. This wasn't how Great3 did it, but it might make sense ifindex is None
to allow things likemax_hlr
,min_hlr
,max_flux
,min_flux
, etc. at this point. The code could get an object at random, and then if it fails any of these checks, pull again.Pinging @rmandelb @beckermr who have already begun some discussion of this re commit d931338.