Closed: dkirkby closed this pull request 7 years ago.
I have implemented a simple benchmark in a new command-line script, quickfiberloss, that calculates fiberloss fractions (vs. wavelength) for 5K targets randomly placed on the focal plane. The basic flow is:
simulator = specsim.simulator.Simulator(args.config)
for i in range(num_targets):
    fiberloss = specsim.fiberloss.calculate_fiber_acceptance_fraction(
        focal_x[i], focal_y[i], simulator.simulated['wavelength'],
        simulator.source, simulator.atmosphere, simulator.instrument)
I get the following timing results for the loop (i.e., not including the simulator start up time):
In other words, this is surprisingly fast, with a total time for 5K targets of ~70ms!
Those timing results were completely wrong because the code was still using the fiberloss tables, instead of galsim. The updated numbers look a lot more plausible:
The total time for 5K targets is now ~230 s (about 3 min 50 s).
This is without any caching of things that do not change between targets (the atmospheric PSF and the anti-aliased fiber aperture, for example), so the next step is to add such caching.
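As a rough sketch of the caching idea (the class and the `expensive_psf` stand-in below are illustrative, not the actual specsim API), quantities that are constant across targets can be computed once and reused:

```python
# Hypothetical sketch: cache per-wavelength quantities that do not change
# between targets, so the expensive setup runs once instead of 5K times.

class FiberlossCache:
    """Memoize values that are constant across targets."""
    def __init__(self):
        self._cache = {}

    def get(self, key, factory):
        # Compute once on the first request, then reuse the stored value.
        if key not in self._cache:
            self._cache[key] = factory()
        return self._cache[key]

cache = FiberlossCache()
calls = []

def expensive_psf(wlen):
    # Stand-in for building an atmospheric PSF model at this wavelength.
    calls.append(wlen)
    return wlen * 2

# The first target computes; later targets hit the cache.
for target in range(3):
    psf = cache.get(500.0, lambda: expensive_psf(500.0))

print(len(calls))  # the expensive factory ran only once
```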
The profile is instructive:
ncalls tottime percall cumtime percall filename:lineno(function)
11003 11.441 0.001 20.700 0.002 astropy/units/core.py:1212(filter_units)
11000 11.206 0.001 13.754 0.001 galsim/base.py:757(drawImage)
42977468/42977467 5.847 0.000 6.900 0.000 {isinstance}
1166300 1.847 0.000 2.598 0.000 astropy/units/core.py:1186(has_bases_in_common)
168248 1.508 0.000 10.626 0.000 astropy/units/quantity.py:291(__array_prepare__)
5581525 1.505 0.000 1.505 0.000 {method 'match' of '_sre.SRE_Pattern' objects}
250 1.346 0.005 3.970 0.016 astropy/io/ascii/core.py:683(process_lines)
1000 1.326 0.001 64.245 0.064 specsim/fiberloss.py:18(calculate_fiber_acceptance_fraction)
366521 1.279 0.000 3.765 0.000 astropy/units/quantity.py:567(_new_view)
2412895/2409763 1.194 0.000 1.336 0.000 {getattr}
It looks like we are spending more time in astropy.units than in galsim.base.
...
Hardcode the following values:
# Lookup the RMS blur in focal-plane microns.
##blur_rms = instrument.get_blur_rms(wlen, angle_r).to(u.um).value
blur_rms = 10.0
...
# Lookup the seeing FWHM in arcsecs.
##seeing_fwhm = atmosphere.get_seeing_fwhm(wlen).to(u.arcsec).value
seeing_fwhm = 1.1
and repeat the profile:
ncalls tottime percall cumtime percall filename:lineno(function)
11000 10.614 0.001 12.991 0.001 galsim/base.py:757(drawImage)
5581525 1.691 0.000 1.691 0.000 {method 'match' of '_sre.SRE_Pattern' objects}
250 1.493 0.006 4.409 0.018 astropy/io/ascii/core.py:683(process_lines)
3275381 1.285 0.000 1.883 0.000 astropy/io/ascii/core.py:698(<genexpr>)
1000 1.171 0.001 33.100 0.033 specsim/fiberloss.py:18(calculate_fiber_acceptance_fraction)
135248 1.143 0.000 8.634 0.000 astropy/units/quantity.py:291(__array_prepare__)
3 1.023 0.341 1.538 0.513 specsim/camera.py:68(__init__)
2397468/2397467 0.883 0.000 1.773 0.000 {isinstance}
267521 0.876 0.000 2.518 0.000 astropy/units/quantity.py:567(_new_view)
1840895/1837763 0.863 0.000 0.983 0.000 {getattr}
This confirms that the astropy slowdown is mostly coming from those two calls.
Instructions for profiling a console-script entry point within ipython:
import cProfile
import specsim.quickfiberloss
cProfile.run('specsim.quickfiberloss.main(["-n", "1000"])', 'profile.out')
To examine the profile results saved in profile.out:
import pstats
p = pstats.Stats('profile.out')
p.sort_stats('time').print_stats(10)
After the latest commits, the profile is:
ncalls tottime percall cumtime percall filename:lineno(function)
11000 10.589 0.001 12.892 0.001 galsim/base.py:757(drawImage)
11000 10.544 0.001 18.898 0.002 astropy/units/core.py:1212(filter_units)
41524631 4.744 0.000 4.990 0.000 {isinstance}
1166000 1.662 0.000 2.329 0.000 astropy/units/core.py:1186(has_bases_in_common)
84841 0.682 0.000 0.682 0.000 {method 'reduce' of 'numpy.ufunc' objects}
1166824/1160767 0.659 0.000 1.064 0.000 astropy/units/core.py:2097(decompose)
1166000 0.576 0.000 2.905 0.000 astropy/units/core.py:1195(has_bases_in_common_with_equiv)
1000 0.572 0.001 1.205 0.001 specsim/fiberloss.py:35(__init__)
1075686/1020527 0.536 0.000 1.141 0.000 astropy/units/core.py:1934(decompose)
57000 0.527 0.000 0.925 0.000 galsim/transform.py:107(__init__)
so we are still dominated by astropy units.
Checklist for closing this PR:
- fix the build_docs -w test failure
- update Simulator.simulate() to accept optional per-fiber arrays of (x,y), fiberloss and SED and return 2D output arrays (see #6 for details)

I think I have fixed the last travis problems and I am ready to merge now. I decided to push the new "batch simulate" method into a separate PR since there is already a lot to review here. I am assigning @sbailey, but anyone else is welcome to jump in.
Updated readthedocs for this PR are at http://specsim.readthedocs.io/en/multiplesources/, including:
- the quickfiberloss benchmarking script
- the specsim.fiberloss module

Did you want to finish the unchecked punchlist item of "Update Simulator.simulate() to accept optional per-fiber arrays of (x,y), fiberloss and SED and return 2D output arrays. See #6 for details." before closing this, or do you want to merge this and then continue with simulating multiple sources at different focal plane locations in a separate branch/PR?
I don't see how to combine the fiberloss calculations with a spectral simulation here, even for a single target. Could you add a section to the Users Guide with the actual python code to simulate a spectrum of flux vs. wavelength for a r=2 arcsec disk galaxy at x=100,y=50 on the DESI focal plane, and an example of how to override/replace the default fiberloss calculation with a different fiberloss vs. wavelength?
i.e. this code looks fine to merge, though I don't yet know how to use it to accomplish the original goal of this branch / PR. I'm not sure if that is because this PR lays the groundwork but the final integration is still to be done, or whether I'm just not reading the right part of the documentation/code, or whether that connection isn't documented yet.
You didn't miss anything. I decided to do the final connection in a new PR since it will probably break the public API and there are a lot of changes here already.
To simulate a single fiber with this code, you just need to change instrument.fiberloss.method from table to galsim (either by editing the yaml file or programmatically). But the new API will bypass the single-fiber mode to calculate output arrays for all fibers with one call.
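For the yaml route, the change might look like the fragment below (the nesting shown is an assumption based on the attribute path instrument.fiberloss.method mentioned above; check the actual specsim configuration file for the exact layout):

```yaml
# Hypothetical excerpt of the instrument configuration: switch the
# fiberloss calculation from precomputed tables to on-the-fly galsim.
instrument:
  fiberloss:
    method: galsim   # was: table
```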
If using method=galsim, how does the user specify where on the focal plane the object is?
For consideration: is it necessary to change the public specsim API to support multiple fibers simultaneously? Are there big performance benefits from reusing calculations? We could just wrap the existing single spectrum API in a multi-spectrum API on the desisim side. I think updating the specsim API could be useful, but I don't think it has to happen that way.
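The wrapping idea suggested here can be sketched as follows; `simulate_one` is a hypothetical stand-in for the existing single-fiber call, not the real specsim entry point:

```python
import numpy as np

def simulate_one(x, y):
    # Stand-in for the existing single-spectrum API: return a 1D
    # "spectrum" that depends on the fiber position.
    return np.full(5, x + y)

def simulate_many(xs, ys):
    # Wrap the single-spectrum call in a batch API, stacking per-fiber
    # 1D outputs into a 2D (num_fibers, num_wavelengths) array.
    return np.stack([simulate_one(x, y) for x, y in zip(xs, ys)])

out = simulate_many([1.0, 2.0], [0.5, 0.5])
print(out.shape)  # (2, 5)
```

This gets 2D outputs without changing the specsim public API, at the cost of forgoing any reuse of per-wavelength calculations across fibers.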
Documentation comment doesn't have to be a blocking factor for this PR, but it still applies: specsim provides great details on what it does, but lacks the big picture example of how to put the pieces together to do a simulation (unless using the command line quickspecsim). You may be so familiar with it that it is obvious to you, but I have to poke under the hood in quickspecsim or quickgen to figure that out.
I am merging now and will update the docs as part of the next PR after consultation with Stephen.
The scipy docs web server has been down for ~2 days now, so this may cause travis failures (since the docs build uses intersphinx).
This PR is to address #6 and, in particular, to look at the feasibility of calculating fiberloss fractions for a whole focal plane on the fly.