GalSim-developers / GalSim

The modular galaxy image simulation toolkit. Documentation:
http://galsim-developers.github.io/GalSim/
Other
223 stars 105 forks source link

Add Great3 cuts (at least optionally) to COSMOSCatalog class #693

Closed rmjarvis closed 8 years ago

rmjarvis commented 8 years ago

The COSMOS galaxy selection for Great3 included a lot of different cuts from the full catalog. The code where they do this is here. We now have a convenient class in GalSim, COSMOSCatalog that can select galaxies from the same catalog. However, it is currently difficult to match the same cuts that were used in Great3.

There are some cuts made in the COSMOSCatalog constructor (here), but not the same ones, and certainly not nearly as many. It would be nice if we could at least enable doing all the cuts that Great3 made with a few (some optional, some default) parameters in either the construction of the COSMOSCatalog class or when selecting individual galaxies from it (or a combination of both).

Most of the Great3 cuts are billed as basic goodness checks. e.g. dmag between real and parametric galaxies, fraction of flux in mask, minimum resolution, etc. It seems like these could be enabled with a flag exclude_marginalmaybe? We already have exclude_fail and exclude_bad which presumably remove the more extreme bad objects. I don't have a strong opinion about whether this should default to True, but I'd like to at least make it easy to apply this if desired. Or, if there are different classes of exclusions here, it might make sense to have more than one kwarg to give people more control if they want.

Some cuts are specific to the Great3 simulation, so probably ought not be the default. But I think they can be enabled with some parameters like sn_min, sn_max that the user can provide as appropriate for the simulation they are making. I'm not sure which others besides the S/N ones are of this type.

It's also possible that some cuts are more appropriately done in the makeGalaxy function. This wasn't how Great3 did it, but it might make sense if index is None to allow things like max_hlr, min_hlr, max_flux, min_flux, etc. at this point. The code could get an object at random, and then if it fails any of these checks, pull again.

Pinging @rmandelb @beckermr who have already begun some discussion of this re commit d931338.

rmandelb commented 8 years ago

It seems like these could be enabled with a flag exclude_marginalmaybe? We already have exclude_fail and exclude_bad which presumably remove the more extreme bad objects.

An alternate way to do this might be to have a single flag called something like exclusion_level which has several options: "marginal", "bad", "failures" (or whatever). I think that in general there will be little use for combinations like exclude_marginal=True and exclude_fail=False, so having a single flag with multiple options could be the simplest way to cover all the cases that people might use in reality.

However... Really there are three classes of cuts that we used in GREAT3: (1) Cuts to identify failures with galaxy selection and/or postage stamp generation. (2) Cuts to identify issues with the parametric fits (including the tests currently in exclude_bad and exclude_fail). (3) SNR and resolution cuts that were used for GREAT3.

I guess that if people are only using the HST images, they might want to impose cuts just on (1), not (1) and (2). Very often the failures in (2) are problematic precisely because there's an issue with the postage stamp generation (so they would fail using (1) as well), but there are also some failure modes for the Sersic fits that show up in (2).

So perhaps there should just be two cut flags: exclude_ps (for postage-stamp generation failures), and exclude_fits (for failures or signatures of badness in the parametric fits, currently split into two flags), plus perhaps separate options for S/N or size cuts.

rmandelb commented 8 years ago

Note, I re-thought my above comment and edited on GitHub, so please read there instead of the e-mail notification.

rmjarvis commented 8 years ago

I like the idea of a single exclusion_level kwarg that takes different values. Presumably the parametric fits are always (at least presumed to be) bad when there are postage stamp problems, right? So I think the ps and fits exclusions probably still qualify as multiple levels for a single parameter. Even if not, we can still have exclusion_levels that include 'none', 'bad_images', 'bad_fits', 'all_bad', 'all_marginal'. That feels like a nice way to specify this that probably matches how people will be thinking about what they want excluded.

Rachel, would you be up for taking a first stab at trying to code this up?

I'm having trouble following the code that does this in the great3 scripts. Especially the calculations related to the simulation noise level and seeing. Also, there seem to be a lot of magic numbers in here, and I can't easily tell which ones are just heuristic (X because anything smaller is probably junk) and which are Great3-specific (X because we chose this other value to have a range from Ymin to Ymax, which implies X here). So if you could code up the easy ones and then point out which ones will be tricky to implement this way, that would be very helpful. :)

beckermr commented 8 years ago

Hi all - I had the same general problems as Mike when trying to make the great3 scripts more configurable in terms of, e.g., setting the min galaxy S/N. Having things done as Mike describes would be really great!

rmandelb commented 8 years ago

Mike, what are your time constraints for getting this code in place (if any)? I am traveling this week and seriously swamped from now until the end of the semester, so if you want this code in place really soon, then it's probably best if someone else does it (although I can answer questions about any lines of code you find confusing in great3-public scripts). If any time in the next ~6 weeks is fine, then I can definitely do it after classes end.

rmjarvis commented 8 years ago

The ~6 week time scale would be perfect. Thanks!

I think this will be useful to have in place for DESC DC1 sims, if that helps give you a relevant time frame. Not anything I need before year end. (We're also interested in this for DES sims, but I think the time frame is similar.)

rmandelb commented 8 years ago

OK, in that case I assigned myself. I will do my usual annoying thing that I do every December (write code over winter break and make a PR when there's nobody around to look at it).

rmandelb commented 8 years ago

Hi @rmjarvis and @beckermr - I'm doing that annoying thing I do where I try to finish all my coding during winter break since there isn't enough time during the semester. I sat down and went over this stuff carefully (GREAT3 simulation scripts, current COSMOSCatalog) to try to come up with a preliminary design that addresses the questions of which cuts should be included, in what way, and how to get the information needed to include them. Here is my proposal, which I realize you might not have time to read for a while, but I wanted to get it down while I'm thinking about it. It introduces some questions / concerns about compatibility between new versions of GalSim and old catalogs. Sorry this is so long, but I wanted to be clear.

Overview of selection criteria in the GREAT3 simulation scripts

There are a bunch of selection criteria in the GREAT3 simulation scripts, which do the following:

  1. The to_use flag cut: eliminates galaxies for which we don't have an estimate of the shape or apparent resolution with a particular PSF, parametric fit failures.
  2. The dmag cut: this eliminates galaxies for which the apparent magnitudes of the COSMOS galaxy and the parametric fits differs by some amount (we used 1). Often these were because of blend issues or masking issues, though in some cases they represent actual issues with the parametric fits.
  3. The flux_frac cut: this eliminates galaxies for which (in a preprocessing step) we have determined that the GREAT3-size postage stamp will, for a given PSF, include too little of the galaxy flux. We don't want to chop off the edges of galaxy light profiles.
  4. The resolution cut: this eliminates galaxies for which the light profile, for a given PSF and pixel size, will be too small compared to the PSF. GREAT3 was aiming to be inclusive, i.e. most methods would choose to use most galaxies (so we can take advantage of shape noise cancellation) and that dictated some simple cuts like these.
  5. The approx_sn_gal cut: again, in the spirit of inclusiveness, we did some pre computations to ensure that for a given noise variance and PSF, the total S/N of the light profile with an optimal S/N estimator would be within some range.
  6. The noise_max_var cut: basically making sure we had a valid estimate of the noise variance at which the S/N would be 20, so we could make cut (5).
  7. The do_meas cut: require a valid fit. Seemingly redundant with (1).
  8. The total ellipticity cut: during pre computations, we had to be able to estimate the shape of the galaxy (trivial for single Sersic models, less so for bulge+disks) in order to carry out B-mode shape noise cancellation in variable shear branches.
  9. The original_sn cut: require that in the initial postage stamp, the S/N of the detection was >20. This helps get rid of masking failures and detection issues when cutting out the postage stamps.
  10. The noise_min_var cut: we use pre computations of the minimum noise variance that could be achieved after simulating and then noise whitening, in order to eliminate those that have a noise variance that is higher than the target one.
  11. The mask_cond cut: this cut is meant to eliminate postage stamps that have masking and deblending failures of various sorts.

Cuts that are currently in GalSim

Cuts that would be very hard to implement in GalSim in general

In my view, it would be very hard to implement (3)-(6), (8), or (10) in GalSim. They generally depend on things like the target PSF, pixel scale, and/or noise variance. In GREAT3 we implemented them by pre-computing various quantities for all galaxies, using the fact that we had a limited set of target PSFs, pixel scales, and noise variances. In GalSim, when allowing people complete freedom in these aspects of the simulations, I don't think it'll be possible to implement them as a pre-selection. We could provide an option to analyze a simulated image to check whether it seems to pass some of these cuts after the fact, but that's the only way i can think to do this. And once the sim is already made, this seems less useful. Thoughts / objections?

Options for exclusion_level

Assuming we're going with an exclusion_level keyword to determine the level of selection criteria to impose, here are my thoughts on the options:

I admit the line between "marginal" and "bad" is a bit blurry, so I'm welcome to alternative suggestions.

Implementation

In thinking about implementation I reminded myself why I didn't do (2), (9), or (11) in the first place: they depend on additional selection information in files we distributed in GREAT3, NOT in the GREAT3 real galaxy catalog or the parametric galaxy model catalogs. A current GalSim user literally does not have the information to do these cuts.

So that leads me to conclude that we need to either

(a) distribute a new file that can enable a user to do (2), (9), and (11), or

(b) include the information for (9) and (11) in the RealGalaxyCatalog, and for (2) in the parametric models catalog. (I suggest this split because the first two have to do with the real images and the latter has to do with the fit.)

Initially (b) appealed to me the most, because once we start requiring tons of different files for different purposes, it seems really unwieldy. We could include the information in new versions of the files on the GREAT3 server. But then it does mean that anyone who already has those catalogs won't be able to use these cuts. We'd need to detect that issue and raise an exception telling them to download new versions, which is not very nice. So then I thought okay, we have to do (a). We'll still have to tell people they need to download something new (which could be added to the tar balls as well, so people who do a new download of the COSMOS catalog get them automatically) but at least the old functionality with the old catalogs is preserved.

I'm not loving the conclusions I'm coming to here, as the whole COSMOS catalog class usage seems to be getting more and more unwieldy. Alternate ideas would be welcome, or perhaps you could just persuade me to dislike these solutions a bit less. :)

rmjarvis commented 8 years ago

Possible implementations for the tough ones:

  1. Do this after stamp creation, but before noise is added. We can check the total flux on the stamp and make sure it is close enough to the nominal flux. If not, we reject the object and draw a new one from the COSMOSCatalog.
  2. There is already a min_hlr flag in COSMOSCatalog. Can we determine approximately what this should be for each great3 branch? Or is this just not really a good match to how the resolution was computed?
  3. I think for most sims, the min_flux will be appropriate to use for this. It seems like you would typically want this to be consistent across a range of PSFs, rather than having the min flux vary when the PSF changes. However, we could also do the same trick as for 3, where we calculate the S/N of the drawn stamp and let the user impose a minimum value, rejecting objects that have a lower S/N.
  4. I don't quite follow this one. Is it that the ellipticity estimate can fail, so you remove those objects? Or if the ellipticity is too large, you reject it?
  5. We already raise an exception if the whitening adds more noise than requested for the final noise variance. So we could fold that into the "select another option" style I'm advocating for 3 and 5.
rmjarvis commented 8 years ago

As for the how to distribute extra information question, I think it would be fine to add more files to the tarball that we can use for this. We could even have the galsim_download_cosmos script check for these new ones separately and only download them if they already have the current files.

beckermr commented 8 years ago

For 3) above, would it not be the case that in most applications that mimic real data, one would pick a stamp size that contains most of the object's flux instead of cutting objects that are too extended for a given stamp? Output formats like MEDS would be able to support the variable stamp sizes easily and it might be useful to have this option.

rmjarvis commented 8 years ago

Certainly there are different use cases. We should enable both.

  1. Reject objects with too much flux outside of the given stamp size
  2. Choose a stamp size appropriate for each object.

Or in fact a third option

  1. Choose an appropriate stamp size up to some maximum and then reject objects with too much flux outside of that size.

The last is what we did for the end to end sims. I rejected objects that required stamp sizes larger than 128x128 IIRC. Not too many that big, and if you try to render them, they dominate the running time.

beckermr commented 8 years ago

Yes this sounds great. The last detail is to choose FFT-friendly sizes, like powers of 2 with a few small prime factors like 3 or 5.

beckermr commented 8 years ago

I just realized this is probably already in the MEDS code in galsim.

rmjarvis commented 8 years ago

If you let GalSim pick a stamp size for you, it will choose 2^n or 3x2^n. But currently the galsim meds option requires a designated stamp size, so there would still be a bit of work to enable the variable stamp size based on the object.

The current automatic stamp size choice is rather conservative, erring on the side of larger stamp sizes than are actually necessary. And the target flux fraction is 99.5% (tunable using GSParams folding_threshold). So we should try to make this a bit more flexible to easily allow for other choices of flux_frac.

rmjarvis commented 8 years ago

@rmandelb I think a good project for you on this branch would be to propose a version of cgc.yaml that would include all of the cuts that Great3 did. I'm happy to work on implementing the back-end code that will make it all work.

You can use the new top-level stamp field if you want for some of them. e.g.

stamp:
    min_flux_frac : 0.99

might be an appropriate place to put the minimum flux fraction falling in each stamp.

rmandelb commented 8 years ago

OK, lots of discussion on here. Let me try to make a single summary response:

Three: This was mainly an issue in GREAT3 because we had decided a priori on postage stamp sizes. I like the "you decide on min_flux_frac and it chooses a stamp size, but rejects those that would require one that is too large" idea. Do I understand correctly that you're willing to try to implement that?

Four: Resolution was calculated based on apparent size with respect to the PSF, which depends primarily on half-light radius and sersic index (and very weakly but non-negligibly on ellipticity, which is of course not good). In principle, I could try to see if the resolution cut corresponds to a simple cut in the HLR vs. sersic n plane, and implement the cut in GalSim based on the HLR and n values. Let me play around with this.

Five: I like your S/N idea. We might want to give some thought to S/N estimators though (after our realization midway through GREAT3 that realistic estimators deviate a lot from the ideal).

Eight: That was for ellipticity estimation failures only (e.g., non-convergence of adaptive moments) and affected a tiny fraction of objects. I think if we ignore it in GalSim, it's not a big deal.

Ten: sounds good.

@rmjarvis , to address your comment:

As for the how to distribute extra information question, I think it would be fine to add more files to the tarball that we can use for this. We could even have the galsim_download_cosmos script check for these new ones separately and only download them if they already have the current files.

OK, but it sounds like we might be converging towards a solution that doesn't require it (if we can do (4) using HLR and n).

And your suggestion:

I think a good project for you on this branch would be to propose a version of cgc.yaml that would include all of the cuts that Great3 did.

Sure, I am up for that.

rmjarvis commented 8 years ago

Looks like you were bitten by the Mardown start-numbering-at-1 bug "feature". :)

Do I understand correctly that you're willing to try to implement that?

Yes. I think it shouldn't be too hard. We've already got the basics in there with folding_threshold in GSParams and the automatic stamp size based on that. And the skeleton of the code to reject objects that fail some criteria is already there with the retry_failures stuff. I'm thinking that the main thing will be to make retrying failures the default with some largish number of retries (like 20 or 30 just to prevent infinite loops). Then things like having the automatic stamp size be larger than the maximum would just be a kind of failure that would trigger a retry.

Resolution was calculated based on apparent size with respect to the PSF

By size, I guess you don't mean hlr. Was this a moment-based size? T=Ixx+Iyy maybe? If so, we can do this the same way by letting the user set a minimum T to allow. Helpful if we could also tell COSMOSCatalog a min_hlr on construction to limit the number of retries required. i.e. find some hlr below which everything would be a resolution failure, so we can cut them out from the start and not have to build the stamp and check if it's ok. Likewise with the S/N -- helpful if we could set some min_flux below which everything would have too low a measured S/N.

OK, but it sounds like we might be converging towards a solution that doesn't require it (if we can do (4) using HLR and n).

I thought the extra info was for (2),(9),(11). Not (4). Am I missing something?

rmandelb commented 8 years ago

Looks like you were bitten by the Mardown start-numbering-at-1 bug "feature". :)

Curses! I fixed it by editing it so it's words instead of numbers. Sigh.

Yes. I think it shouldn't be too hard. We've already got the basics in there with folding_threshold in GSParams and the automatic stamp size based on that.

Yes, that's what I was thinking.

By size, I guess you don't mean hlr. Was this a moment-based size? T=Ixx+Iyy maybe?

Yes, compared to the same trace for the PSF image. That's the tricky bit - you can't calculate it from the image postage stamp alone.

So for GREAT3 I tabulated these values for the galaxies for a single space-based PSF and 5 Kolmogorov FWHM values, then interpolated between them. For GalSim I was thinking of trying to make a fitting function for those resolution values as a function of the intrinsic HLR, Sersic n, and PSF size. Unless you have other ideas that get around the fact that a PSF image is needed?

Helpful if we could also tell COSMOSCatalog a min_hlr on construction to limit the number of retries required. i.e. find some hlr below which everything would be a resolution failure, so we can cut them out from the start and not have to build the stamp and check if it's ok.

Well if people want to have something like a min_resolution cut, couldn't GalSim use the fitting function for the resolutions to decide which things to cut in advance?

rmjarvis commented 8 years ago

Unless you have other ideas that get around the fact that a PSF image is needed?

Issue #308 comes to mind for this.

Well if people want to have something like a min_resolution cut, couldn't GalSim use the fitting function for the resolutions to decide which things to cut in advance?

I'm not sure how useful that fitting function would be in the fully general case. We want to allow all sorts of PSF and Galaxy profiles. Not just Sersics. Better to define what the resolution cut means and then implement that as a potential stamp failure I think. In which case, it would be helpful for the specific case of the Great3 scripts to also have a min_hlr specification to cut out most of the objects that would fail before the stamp creation stage.

rmandelb commented 8 years ago

Issue #308 comes to mind for this.

So are you thinking of using a resolution defined in terms of pre-seeing light profile? (which I think you were using then) Or something that mimics the GREAT3 resolution, which is based on the post-seeing apparent size, and would involve defining the function such that it draws whatever images it needs to get the resolution if it can't compute it analytically? I'm not sure how GREAT3-like you are trying to get.

rmjarvis commented 8 years ago

I just mean the ability to calculate a hlr or Irr for any arbitrary profile. Then we can have a post-seeing resolution test as one of these stamp failure tests to mimic what Great3 did.

I'd like to enable doing something as close as possible to what Great3 actually did, since this is one real-life use case. Obviously, not everyone will want to make the same choices, but it seems like we should at least enable these simulations in a fairly straightforward way.

rmandelb commented 8 years ago

OK. If we want to do this in generic a way as possible, then I think #308 is the way to go.

Are you willing to / interested in implementing #308, and I will (a) put together a routine that uses that functionality to construct a resolution factor, and (b) work on the yaml file?

rmandelb commented 8 years ago

To summarize my current plans based on what I originally proposed and the iteration that has happened since then, and track progress (note, updated based on Mike's response):

Please let me know if you think there is something I'm missing or gotten wrong.

Here is what I think Mike volunteered to do on this branch:

I'm unable to complete this during this week due to family plans, but next week things will still be pretty quiet around here, and I plan to finish then.

rmjarvis commented 8 years ago

Sounds great! I can also do the modifications to galsim_download_cosmos if you want, since I wrote that in the first place.

rmandelb commented 8 years ago

Okay, glad we agree.

I can also do the modifications to galsim_download_cosmos if you want

Sold! I'll edit the comment and put that on your list. :)

rmandelb commented 8 years ago

@rmjarvis - I wanted to give you the info you need to make the modifications to galsim_download_cosmos. Thanks to Joe (who helped get these publicly available on the GREAT3 server), there are new files up on the GREAT3 server:

Please let me know if there's anything else you would need to do

rmandelb commented 8 years ago

@rmjarvis and @beckermr , you are welcome to try out the code for adding more selection criteria on the COSMOSCatalog class to match some of the GREAT3 cuts. It is included as of this commit: https://github.com/GalSim-developers/GalSim/commit/d20593ee71ecad6abf121d4fff58d93238ca5d28 . Suggestions for changes to make are welcome.

rmandelb commented 8 years ago

Finally: I want to work on the modifications to cgc.yaml. I believe that the two things that need to be done there are

(a) make sure that the COSMOSCatalog has the right exclusion_level (done here: https://github.com/GalSim-developers/GalSim/commit/f028d825af94d876cec6a1b8fc43c1baed672d94 )

(b) put in something that corresponds to cuts 3=flux fraction cut, 4=resolution cut, 5=approximate S/N cut. (As I think we agreed, the noise variance after whitening issue -- number 10 on the list -- is handled by the "choose another object" code in config; and the "has a valid ellipticity estimate for B-mode shape noise cancellation" -- number 8 -- eliminated very few objects, so I think we should just ditch this cut.)

For (3), I like what you suggested in https://github.com/GalSim-developers/GalSim/issues/693#issuecomment-167373178 but I am not sure that I did this right: see https://github.com/GalSim-developers/GalSim/commit/ad416743ffc77a3d2a5dede1aec58440ab41e9f5 (I couldn't find any examples that used the new top-level stamp field, so I basically plonked it in there at random)

For (4), how were you thinking of implementing this? For GREAT3, we had a very specific trace-based resolution definition. But I can imagine people using the trace or determinant radius or area, either for pre-seeing or post-seeing radii. In light of all the possible combinations, I was wondering if the resolution cuts could/should be implemented using eval to define the exact expression? I didn't do anything for this yet, since I am curious what you were thinking of doing.

For (5), there is a question of how to define the S/N. One could use an idealized S/N estimator (which is overly optimistic by a factor of ~2), a S/N estimator that is based on the flux in an elliptical Gaussian (also overly optimistic, but not as bad, and it's more robust than the former in the case of input images that were noisy to begin with). For GREAT3 we made one particular choice, but I think it would be nice to enable multiple options. What do you think?

rmjarvis commented 8 years ago

For 4 and 5, it might be most appropriate to just define a custom type that would do the calculation that Great3 did. Then people could see how to modify this to do something different, but we wouldn't be weighted down by lots of complicated options in the config definitions. The config processing could just look for a stamp.reject field and parse it to see if the stamp should be rejected. So:

modules:
   - great3_reject  # We would define the code to decide whether to reject a stamp in great3_reject.py

...

stamp:
    reject:  # Expecting a bool value
        type: Great3Reject   # A custom type that would do the appropriate checks and return whether the drawn stamp should be rejected.
rmandelb commented 8 years ago

OK... so I'd just define the Great3Reject type in great3_reject.py in examples/great3/ ? And the idea is that this class would be initialized with all the basic numbers that define the cuts, and then when it's called it would take a stamp as an input and return a boolean value?

The resolution cut would also need to know what the PSF looks like. How would that work?

rmjarvis commented 8 years ago

Rachel, @esheldon pointed out to me that the exclude_fail check, now bad_fits, removes galaxies with a bad bulgefit status as well as a bad sersicfit status. But when we make a parametric galaxy, I think we start with B+D and revert to the Sersic if B+D was bad.

So do we want to keep this check in the constructor? Are objects with bad bulgefit status suspect regardless of whether we could switch to the Sersic fit? Or maybe this check should be moved to the marginal exclusion level now?

rmjarvis commented 8 years ago

OK... so I'd just define the Great3Reject type in great3_reject.py in examples/great3/ ? And the idea is that this class would be initialized with all the basic numbers that define the cuts, and then when it's called it would take a stamp as an input and return a boolean value?

You can see an example custom module that uses hsm to measure the shape of a drawn object here. I use it in a DES example script for the output truth values of the PSF shape. It's a bit gratuitous, but it's intended to serve as an example of the kind of thing we're doing here with writing custom types outside of GalSim.

Here is a strawman of the code in great3_reject.py:

def Great3Reject(config, base, value_type):
    """See if a drawn postage stamp should be rejected based on Great3 cuts.
    """
    stamp = base['current_stamp']
    gal = base['gal']['current_val']
    psf = base['psf']['current_val']

    reject = False

    # Check the flux
    flux = numpy.sum(stamp.array)
    if flux < 0.99 * gal.flux:
        reject = True

     # ... Other checks

    return reject

galsim.config.RegisterValueType('Great3Reject', Great3Reject, [ bool ])

The resolution cut would also need to know what the PSF looks like. How would that work?

I guess this could be incorporated into the above too. Unless we can know whether we want to cut it before the stamp is drawn.

rmandelb commented 8 years ago

So do we want to keep this check in the constructor? Are objects with bad bulgefit status suspect regardless of whether we could switch to the Sersic fit?

Good question.

First of all, for context, there are 56062 objects in the catalog. Of these, 550 have a bad bulge fit status, 4 have bad values for both sersicfit and bulge fit, and none have a bad sersicfit status but a good bulge fit status. Basically the bulge fit one is more likely to fail to converge at low S/N because it has more degrees of freedom.

My suggestion: let's change bad_fits it to exclude those that have a bad status for both, while marginal excludes those that have a bad status for either. Do you agree? If so, I will make that change.

rmandelb commented 8 years ago

Re: the GREAT3Reject code, thanks for the example and the straw man code. I think I can take it from here. But -

(1) I thought that the 99% of flux thing was going to be taken care of using min_flux_frac (which you had suggested could be included in the stamp field), not using this code? Or do you think once you're letting config reject objects using some user-specified routine, we should make users specify this in their own code? My take was that people might indeed want to do this in config without specifying any other unusual cuts, so it's worth having in config.

(2) And yes, we can definitely incorporate the resolution cut into the Great3Reject code. I hadn't realized it would be able to easily access the PSF image, but once it can, this is easy to do.

rmjarvis commented 8 years ago

I think that makes sense. Then bad_fits means that we can't make a parametric galaxy, and marginal means something might be screwy, but not catastrophic.

rmjarvis commented 8 years ago

I thought that the 99% of flux thing was going to be taken care of using min_flux_frac (which you had suggested could be included in the stamp field), not using this code?

Sure. I think that's one we can implement directly. So no need to include it as part of Great3Reject.

I hadn't realized it would be able to easily access the PSF image

config['psf'] is actually the profile, not the image. We don't normally render the psf image. But we can have this code render it and stash it somewhere. Then only rerender it if the psf changes. e.g.

if 'cached_psf_image` in base and psf is base['cached_psf']:
    psf_image = base['cached_psf_image']
else:
    psf_image = psf.drawImage()
    base['cached_psf_image'] = psf_image
    base['cached_psf'] = psf
esheldon commented 8 years ago

The cuts Mike introduced remove about 20% of the galaxies that make it into great3 in the rgc branch

rmjarvis commented 8 years ago

My choices of max_hlr and max_flux in the cgc.yaml file were pretty arbitrary. So, I wouldn't be at all surprised if it cuts out way too many galaxies now.

Once the work on this branch is done, we should be able to have a selection that is closer to what Great3 did, and more importantly, easy to modify to something slightly different if we want.

rmandelb commented 8 years ago

I just pushed the changes we discussed yesterday. I'm sure that I did things in a very silly way in Great3Reject; I wasn't sure what was the best way to access certain information. Can you check it out and see what you think?

rmjarvis commented 8 years ago

OK. Back to you Rachel. I haven't done galsim_download_cosmos yet. And I made it just a warning if the user doesn't have that file rather than an error. So it skips the checks that require it if it's not there.

But I think the reject stuff is ready for you to play with a bit. galsim cgc.yaml should work now. The following will give you a reasonable-looking output that will Ctrl-C break nicely:

galsim -n200 -j1 cgc.yaml image.nproc=1 output.nproc=1 output.noclobber=False

You'll see that there are a lot of rejections from the resolution cut. So worth double checking if this is implemented correctly. If it is, then we should probably try to put some kind of pre-cut using the min_hlr and/or min_flux options (which I've turned off now) so that most objects aren't being rejected.

Also, we need a version of real_galaxy_galsim_selection.fits.gz for the unit tests. We don't want to force people to download the full version just to run unit tests, so we should have a smaller version appropriate to use with real_galaxy_catalog_example.fits in GalSim/examples/data.

rmandelb commented 8 years ago

I don't have time to do this at the moment (will try tomorrow) but perhaps I can address the rejection issue anyway. First of all:

You'll see that there are a lot of rejections from the resolution cut.

Is it the resolution cut specifically, or is it both cuts in Great3Reject? (there's an S/N cut too)

I happen to have some notes on what fraction of the parent sample gets used as a function of PSF FWHM in CGC in GREAT3: for FWHM of ~0.6, ~0.7, and ~0.8, we use 17k, 13k, and 10k galaxies out of the original 56k. Is this rejection rate inconsistent with what you are seeing with this script?

If it seems inconsistent, then I'll dig into the script to see what's going wrong. If it's consistent, then I will try to come up with some cut we can use to pre-select objects.

rmjarvis commented 8 years ago

No, that could be right. Most galaxies are rejected several times before getting something that works. So if we are expecting only a ~20% success rate, that's probably fine.

Did you happen to record the hlr and flux of the ones that you did use? If so, we can precut on the min/max of these and then hopefully there will be fewer failures after drawing the stamps, so less inefficient.

rmandelb commented 8 years ago

OK. For the record, the S/N cut is pretty important for this as well, not just the resolution cut.

I have a test version of the great3 sim script that I run when I wanted to quickly debug things. I just checked and was able to reproduce the numbers I gave above. I could hack it to print out the kind of information you are requesting. Will try to do it in the next few days (sorry, a bit hectic trying to finish a few things before leaving for Taiwan on Sunday, but it might be a good thing to do on the plane...)

rmjarvis commented 8 years ago

Also, we need a version of real_galaxy_galsim_selection.fits.gz for the unit tests. We don't want to force people to download the full version just to run unit tests, so we should have a smaller version appropriate to use with real_galaxy_catalog_example.fits in GalSim/examples/data.

Since the full tarball has an unzipped version of this file, ok if we just use that? I can have galsim_download_cosmos unzip this file after downloading when if puts it is the share directory. Then we can remove the double check in scene.py for the .fits.gz and then .fits. Probably easier to just have the single check for real_galaxy_galsim_selection.fits. Sound ok?

rmjarvis commented 8 years ago

Rachel, I'm seeing some differences between the things in the tarball currently available here and the files that I currently have in my share/galsim/COSMOS_23.5_training_sample directory. Aside from the new selection file that is, which I thought was going to be the only change.

So it seems like maybe you packaged up the wrong files to go along with the new selection catalog? Maybe an older version?

rmandelb commented 8 years ago

You are absolutely right. Ugh, that is the bad thing about saving the old and new version in directories with nearly the same name.

I am pretty tired now and don't want to screw up (again) so I will fix in the morning. Thanks for finding this.

rmjarvis commented 8 years ago

No rush. :)

rmandelb commented 8 years ago

OK, I just fixed it. @joezuntz and I have this complicated process that involves me making the file available to him and then he puts it on the server, so hopefully he'll post on here when it's replaced.

To address other things:

1)

Since the full tarball has an unzipped version of this file, ok if we just use that? I can have galsim_download_cosmos unzip this file after downloading when if puts it is the share directory. Then we can remove the double check in scene.py for the .fits.gz and then .fits. Probably easier to just have the single check for real_galaxy_galsim_selection.fits. Sound ok?

I'm fine with that.

2) I believe my outstanding action items are (a) make a small selection file available in examples/data/ to use for unit tests, and (b) suggest reasonable pre-selection criteria for cgc.yaml. I plan to do that while I am traveling this weekend, will keep you posted. Please let me know if there's anything that I'm missing.

rmjarvis commented 8 years ago

OK. For the record, the S/N cut is pretty important for this as well, not just the resolution cut.

Here are some stats. I let one file run to completion. So 5000 total objects tested. (10,000 drawn, but half are just 90 degree rotations of the other half, so only 5000 were tested for rejection.)

So resolution is the biggest category of rejections. But SNR > 100 is probably an important one too, since I bet these are larger objects that take more time for their FFTs.

And a total of 40771 stamps were drawn with a success rate of 5000/40771 = 12% which sounds compatible with your numbers above, since this file (the first one) happened to have a particularly large PSF fwhm = 0.97 arcsec.

I believe my outstanding action items are (a) make a small selection file available in examples/data/ to use for unit tests, and (b) suggest reasonable pre-selection criteria for cgc.yaml. I plan to do that while I am traveling this weekend, will keep you posted. Please let me know if there's anything that I'm missing.

I think that's right. The above numbers are the reason why I think we should have some kind of helpful pre-selection to reduce the number of rejections.