LSSTDESC / SSim_DC1

Configuration, production, validation specifications and tools for the DC1 Data Set.
4 stars 2 forks source link

Make sure simulated catalogs cover a large enough area #23

Closed egawiser closed 7 years ago

egawiser commented 7 years ago

DC1 plans to simulate coverage of 4 contiguous LSST fields (hexagons) on the sky, with dithers allowing the central pointing to fall anywhere within one of those hexagons. A quick geometrical simulation is needed to determine the total footprint size and aspect ratio; in the SRM and related discussions, this was estimated to be 90 square degrees, but it's not a square. If this has not been done yet, it would be helpful to determine when it is needed by, what the most useful format is (do we need to specify which 4 fields in ra,dec?), and who is willing to undertake this.

humnaawan commented 7 years ago

@johnrpeterson Could you comment on whether the full PhoSim simulation can happen on user-specified CCDs (i.e., put another way, does PhoSim allow simulating partial focal planes)? If we are to request coverage simulation with a dithered strategy, we'd need to track ~10 fields as dithering will move the fields adjacent the 4 contiguous fields in/out of the specified survey area.

cwwalter commented 7 years ago

@humnaawan I think you can use the sensor command line option to do what you want:

 -s SENSOR, --sensor=SENSOR
                        sensor chip specification (e.g., all, R22_S11,
                        "R22_S11|R22_S12")
humnaawan commented 7 years ago

Hi @cwwalter, thanks! So this means that for the DC1 simulation, we can select the area that would be covered by 4 undithered contiguous FOVs and then specify the sensors that will come in/out with a dithered strategy. @egawiser thoughts?

cwwalter commented 7 years ago

I think our plan was just to specify an area larger enough to cover everything including any dithers if we specify full focal plane visits. Is there any reason not to do that?

PhoSim will be smart enough to ignore sources that are off the sensors (or include scattered light from them if they are bright enough).

egawiser commented 7 years ago

@cwwalter for catalogs we agree completely; we should specify an area large enough to cover the 4 contiguous hexagonal fields chosen be observed (plus some buffer for dithering). However, as @SimonKrughoff pointed out, if we just draw OpSim visits from 4 hexagonal fields and dither them, we'll end up with a highly nonuniform simulation (tapering strongly away from the center). Instead we should draw OpSim visits from the ~10 hexagonal fields that sometimes dither into that area and only keep the data in that area. @humnaawan's concern is how to turn off the many CCDs from those visits that fall outside that area to avoid a huge waste of CPU time for PhoSim (which we obviously couldn't afford) - it looks like that sensor command line option will do the trick - thank you! We'll need a bit of guidance in how to use that command, and then we'll need to code something in MAF that decides which CCDs to turn on for a given visit. Perhaps @yoachim can help us since he has the focal plane geometry coded up in MAF.

danielsf commented 7 years ago

@egawiser The code to convert from RA, Dec to position on the camera resides in the sims_coordUtils package, so you shouldn't need to go all the way to MAF. What do you want the user experience to look like? You give the code a pointing RA, Dec and it tells you which chip is inside the DC1 catalog area? Also: are you using a conda install or an eups distrib install of the sims stack (unfortunately, because of difficulty in getting conda versions released, the code I am envisioning will run much faster on eups distrib installed stacks than conda installed stacks).

humnaawan commented 7 years ago

Hi @danielsf, yes I was able to get sims_coordUtils package to work (after some help with @yoachim at the AAS meeting), although I haven't found a way read in OpSim data outside of MAF -- needed since we need observation pointing, expMJD and rotSkyPos to get the chips from RA, Dec -- and not have a dependency on MAF.

danielsf commented 7 years ago

sims_utils defines a class ObservationMetaData which is meant to carry around all of the data from a pointing. There is a class ObservationMetaDataGenerator which can be imported from sims.catutils.utils that connects to an Opsim sqlite database and allows you to query it for pointings in a certain region on the sky, returning ObservationMetaData corresponding to those OpSim pointings.

What exactly are you trying to do?

humnaawan commented 7 years ago

Oh ok. So the idea was that since parts of a given FOV might fall into the DC1 region, we might save computing time if we consider which chips fall into our region of interest, not the FOVs.

I didn't know about ObservationMetaDataGenerator, so I was importing ObservationMetaData and manually finding the pointingRA, Dec (which, for the dithered case, are ditheredRA, ditheredDec) and the corresponding rotSkyPos and expMJD. To do that, I am using the MAF db and MAFUtils classes to get OpSim database with my sqlconstraint. Things seem to be working but I'll have to look into ObservationMetaDataGenerator to see if it simplifies things.

danielsf commented 7 years ago

ObservationMetaDataGenerator is pretty specialized for catalog-generation, so it might not actually be what you want (i.e. it does not know about dithering). It sounds like what you are doing is the right thing to do.

humnaawan commented 7 years ago

Great, thanks!

jchiang87 commented 7 years ago

Using the astro_lsst_01_1000_sqlite.db file provided by @rhiannonlynne at https://github.com/lsst/sims_operations/pull/55#issuecomment-272330385 and the list of fieldIds provided by @humnaawan at https://github.com/lsst/sims_operations/pull/55#issuecomment-272559922, I get the following for the r-band visits:

 fieldId   # visits    total visits
   1212       204          204
   1220       202          406
   1234       203          609
   1305       203          812
   1323       202          1014
   1333       202          1216
   1365       204          1420
   1413       204          1624
   1431       204          1828
   1447       204          2032
   1464       203          2235
   1506       204          2439
   1542       202          2641
   1564       202          2843
   1568       202          3045

The third column is the running total of visits, and the final number, 3045, is about 2.5 times the original proposal of 150 visits/field * 8 fields = 1200 visits. Will we reduce these numbers for the actual production runs? If not, then any time or resource estimates based on 1200 visits will, of course, be larger by that 2.5 factor.

egawiser commented 7 years ago

Good point - it looks like 200 visits is a better estimate of the number of r-band visits than 150. But I have good news: we don't want to simulate all CCDs for all 15 of those fields. We only want to simulate those that, after dithering, cover a fixed region of 4 contiguous (hexagonal) fields. So while there will be some overhead for CCDs that only partially cover that region, this should be closer to 200 visits/field * 4 fields = 800 visits. We'll need to estimate that "overhead" but should be able to fit into the original resource allocation.

jchiang87 commented 7 years ago

Ok, the imsim runs are done for individual sensors, so just simulating the desired set of CCDs per visit is not a problem. The story may be different for the PhoSim runs.

Can someone provide the code to determine the sensors to simulate for a given visit? Also, @cwwalter suggested we start the imsim runs this week without dithering if that isn't finalized by Friday, so that you'd have a full dataset with analysis by the LSST Stack available in advance of the DESC meeting. I assume that whatever code identifies the subset of CCDs to simulate can also be used for the case without dithers.

egawiser commented 7 years ago

For the no-dither case, there would just be 4 fieldIDs for which to simulate all CCDs.

humnaawan commented 7 years ago

Hi @jchiang87 is there a reason you arent using minion_1016 to find the number of total visits? The fieldIDs I have are based on the current baseline cadence. I don't think there should be any difference since the database Lynne generated just adds additional columns, but I am just wondering if I'm missing something here.

Also, I do have the code for finding the chips that come into play. Just give me an hour or so to make the code and notebook available on GitHub.

jchiang87 commented 7 years ago

@danielsf pointed me at Lynne's db file, since it included the dithers, so I understood that to be the one to use. FWIW, for minion_1016, I get similar results

 fieldId   # visits    total visits
   1212       202          202
   1220       203          405
   1234       202          607
   1305       203          810
   1323       202          1012
   1333       201          1213
   1365       202          1415
   1413       202          1617
   1431       202          1819
   1447       202          2021
   1464       202          2223
   1506       202          2425
   1542       203          2628
   1564       202          2830
   1568       202          3032

The visit numbers corresponding to those fieldIds, however, do differ between minion_1016 and Lynne's file, so I'm a little concerned that the two files are not interchangeable in that regard. I haven't had time to check....

I guess for the non-dither case, minion_1016 should be used, whereas for the dithered case Lynne's file (or its successor) should be used.

rhiannonlynne commented 7 years ago

Just to be clear, minion_1016 and astro_lsst_01_1000 are similar but will not be the same regarding number of visits, etc. I do not have the mysql database for minion_1016, so I couldn't run @humnaawan's script on that run. The fieldIDs should be the same for fields pointed at the same location in the sky. Now that the script has been debugged and it runs, perhaps @mareuter could run it on minion_1016?

humnaawan commented 7 years ago

@rhiannonlynne could we wait by the end of today for a re-run? I just have to check a few more things with the astro_lsst_01_1000 file.

mareuter commented 7 years ago

@rhiannonlynne @humnaawan I am setup to run on minion_1016, but I will wait for the go-ahead to do so. If the repository that I need to run needs to be updated, please let me know.

cwwalter commented 7 years ago

Good point - it looks like 200 visits is a better estimate of the number of r-band visits than 150.

@danielsf @rhiannonlynne and I were just at the project simulation weekly meeting. I asked OpSim expert (Kem) why the visit number was > 200.

His response is that they modified the OpSim run to be very conservative and return more visits than were actually needed to meet SRD requirements. Visits were stopping on some fields after they reached the requested limits as the survey continues and Zeljko requested that they make a change. It sounds like roughly the criteria is "every field must have the minimum number of visits as would be inferred by SRD requirements, but all fields must reach the requirement."

If we want to limit the number of visits, Kem advises we should just take the first X that we want. Do we have an independent calculation of the number of visits needed?

(added: I can't remember what source we used 150 from)

humnaawan commented 7 years ago

@jchiang87 sorry for the delay but all the code is now available here. findDC1Chips.py contains the code for finding the chips, but it requires inputs that are produced by findDC1Regions.py

While you can modify the code for your purposes, I ran it for the region I referenced to before. In the iPython notebook in the repo, you can see the output of findDC1Chips in output[6]. Are these input-able to imSim?

I still have to update the READMe with a summary of what I am doing, but let me know if there are any concerns.

egawiser commented 7 years ago

@cwwalter I just checked, and the v2 Science Book notes 230 visits in r leading to a co-added point source depth of 27.7 (versus the science requirement of 27.5). I know that the predicted depths have dropped a bit recently due to more realistic modelling of sky brightness and point source detection limits. It's possible that 150 visits per band just came from taking 1000 total visits and dividing by 6 filters, but we should check the latest version of the SRD.
Added: v4 of the Ivezic et al. overview paper notes 184 visits in r leading to a co-added point source depth of 27.5, citing the SRD as of 2014, so that's significantly more recent.

tonytyson commented 7 years ago

correct

On 1/18/17 2:28 PM, Eric Gawiser wrote:

@cwwalter https://github.com/cwwalter I just checked, and the v2 Science Book notes 230 visits in r leading to a co-added point source depth of 27.7 (versus the science requirement of 27.5). I know that the predicted depths have dropped a bit recently due to more realistic modelling of sky brightness and point source detection limits. It's possible that 150 visits per band just came from taking 1000 total visits and dividing by 6 filters, but we should check the latest version of the SRD.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/LSSTDESC/SSim_DC1_Roadmap/issues/23#issuecomment-273622164, or mute the thread https://github.com/notifications/unsubscribe-auth/ACA6Yt-q5s4auE8k429YbI1uYuMrdZRIks5rTpH9gaJpZM4KZ46l.

danielsf commented 7 years ago

@humnaawan @jchiang87 FYI: chipNameFromRaDec() (which findDC1Chips.py uses) can be a little slow on the LSST camera. It is as general as possible and allows for telescopes with prisms that allow the same (RA, Dec) pair to fall on more than one chip. This means that, even after it finds a match, it keeps searching, just in case there is another match. chipNameFromRaDecLSST() is optimized for the LSST camera and is about a factor of 6 or 7 faster. Unfortunately, chipnameFromRaDecLSST() is new enough that it is only available from the eups distrib installed stack versions. I just wanted you to be aware of this.

humnaawan commented 7 years ago

@danielsf, thanks! chipNameFromRaDec() isn't prohibitively slow (thankfully) -- the run took ~3hrs on my machine -- but it definitely helps to know about chipNameFromRaDecLSST().

jchiang87 commented 7 years ago

Hi @humnaawan

Thanks for the pointer to your code. I'm afraid it's not clear to me how to use it. Perhaps I'm not understanding what we want to do, so let me try to summarize:

There is a ~40 deg^2 region on the sky for which we want to simulate LSST observations at the 10 year depth. That region is nominally covered by 4 hexagons as shown in your diagram at https://github.com/lsst/sims_operations/pull/55#issuecomment-272554076

We have also identified ~3000 visits (15 fields) in minion_1016 that would overlap with our target region once dithering is included. For any given visit, some subset of the chips in the focal plane overlap with the target region; and to save simulation time, we want to run imSim only for those chips.

So for a given visit, I think need a function that returns that subset of chips, i.e.,

def chips_to_simulate(ditheredRA, ditheredDec, target_region):
    ...
    return list_of_chips

Would this function be easy to implement using your code?

I'm thinking it's probably sufficient to model the target region as 4 circles on the sky (the green circles in your diagram), and the region covered by each chip as the smallest circle that encloses the chip. Using lsst.sims.coordUtils.raDecFromPixelCoords to get the coordinates of the center of each chip, one can then decide to include a chip in the simulation just based on the angular separation of the chip center to the centers of each of the four target region circles. I expect this to be ~10% too inclusive, which wouldn't be that big a deal in added overhead for the simulations. If we want (or need) to be more parsimonious, we can do something special for the edge cases, perhaps similar to what you are doing in findDC1Chips.py, but just looping over pixels defining the edge of the target region.

humnaawan commented 7 years ago

Hi @jchiang87 your description of our goal is right: we want to find all the chips that come into our target region at any point during the 10yr survey.

findDC1Chips does return the list of chips that overlap with the region. I havent used lsst.sims.coordUtils.raDecFromPixelCoords but the way findDC1Chips is set up is that it takes the position of a given HEALPix pixel (in terms of RA, Dec; given by pixRA, pixDec) and checks which chip has observed it. Note that for this to happen, you need to setup an ObservationMetaData object which requires pointingRA, pointingDec, rotSkyPos and expMJD to account for the telescope position, so the process isn't as simple as having ditheredRA, Dec and the region of your interest. You can read more about chipNameFromRaDec here and also see the test notebook that demonstrates a way of it using it.

I can try to see if I can create a higher level function analog of findDC1Chips later today that takes care of finding all the necessary things aside from user-specified dither strategy and target region. For now though, the ipython notebook in the repo contains the chips that fall into to the target region that we currently have.

cwwalter commented 7 years ago

@humnaawan Independent of the sensor discussion:

could we wait by the end of today for a re-run? I just have to check a few more things with the astro_lsst_01_1000 file.

Did this happen?

cwwalter commented 7 years ago

For now though, the ipython notebook in the repo contains the chips that fall into to the target region that we currently have.

Is this good enough for today @jchiang87?

cwwalter commented 7 years ago

@egawiser says:

Added: v4 of the Ivezic et al. overview paper notes 184 visits in r leading to a co-added point source depth of 27.5, citing the SRD as of 2014, so that's significantly more recent.

And from the previous work by @jchiang87:

The third column is the running total of visits, and the final number, 3045, is about 2.5 times the original proposal of 150 visits/field * 8 fields = 1200 visits. Will we reduce these numbers for the actual production runs? If not, then any time or resource estimates based on 1200 visits will, of course, be larger by that 2.5 factor.

That was seemly roughly about 200 visits to to about 16 pointings. If I have all the factors correct now this 16 pointings corresponded to dithering inside of 8 fields to fully cover at depth the 4 fields in the center of the survey.

So from the discussion above it sounds like 150 visits is too few but >200 is somewhat larger than we need. But, 200 may not be unreasonable considering that we are covering less pointings now that we agree we can only use some of the sensors. To compare these numbers now we need to compare not visits but visits*chips. So our original plan was for:

150 visits/field 8 fields 189 chips = 226800 visits*chips

@humnaawan Can you tell me what the new [X fields * Y chips] number is? It's not obvious to me by just reading the notebook. We need to decide how many visits to take. I'm assuming 200 is going to be OK since we are going from 8 to 4 fields (near 8 pointings?) and reducing the number of chips but I want to make sure we have a similar number to our original estimate.

jchiang87 commented 7 years ago

@humnaawan I guess I still don't understand how I am supposed to use your code. In your notebook at Out[16], the chips listed there are all of the chips in the focal plane (including the ones from the corner raft, which we will not simulate). As I indicated in my previous comment, I need the science raft chips that are relevant for a single visit, the numbers of which I assume will range from 0 to 189. (BTW, I have used chipFromRaDec as well as a lot of the various functions in sims_coordUtils).

jchiang87 commented 7 years ago

So I'm thinking there is a much simpler way to deal with this chip selection business. If I could get the four hexagonal regions corresponding to what I've been calling our target region (e.g., simply as the four green acceptance cones on the sky), then it would be a simple matter to filter out the objects in a given instance catalog that are not in those regions. imsim will downselect the input object list on a per chip basis anyways, and so any chip with zero sources just gets omitted from the simulations for that visit.

cwwalter commented 7 years ago

@jchiang87 Are you ready to go tomorrow with no dithering if we can't get this worked out in the next hours?

jchiang87 commented 7 years ago

For no dithering, I still need to know the four fields we want to simulate. That can be as simple as 9 numbers: ra, dec of the 4 centers + the radius of the fields.

jchiang87 commented 7 years ago

actually I just need the 4 fieldIDs

cwwalter commented 7 years ago

As a follow up on visit #s. I had a side conversation with @connolly (who is at a conference). He adds this context:

There is about a 30% buffer/contingency in observing time for LSST. This means we get to 10 year depths early. OpSim was going nuts when we got close to being complete (ie 90% of the fields are done so it tries just to get the final 10%) so extending the survey meant a more uniform survey sampling at the end of the survey (edited) The depths in the overview paper are somewhat artificial as they assume a dark sky (which we wont have). I would stick with the 185 images as for DC1 that is good enough. For DC2 we could take a "real" opsim run for 10 years

So following the overview paper and Kem's advice we would like to take the 1st 185 visits for each pointing.

OK @jchiang87?

jchiang87 commented 7 years ago

So for the non-dithered case, we would have only four fields, i.e., ~800 visits if we do the full 10year depth....so no need to just take the first 185 in this case? (We can always run the final ~15 visits per field later, of course).

egawiser commented 7 years ago

It strikes me as easier to be able to tell people that we used all of minion1016 than the first 185 visits (=90%) of minion1016, and it's not a major difference in run-time or final depth. But the choice won't affect LSS science, so it's yours to make.

cwwalter commented 7 years ago

I think less a matter of time but of getting to the correct depth (as defined in the SRD) so we don't have to do extra work to not use those visits. It makes it easy to explain. Anyway, the full run isn't really 10 years because of the behavior of OpSim that was discovered when the run happened.

So, unless it is a lot more work for Jim and I would like to just use the 1st 185.

cwwalter commented 7 years ago

@humnaawan

actually I just need the 4 fieldIDs

Can you get these to Jim so he is ready to start with them if necessary? Thanks!

egawiser commented 7 years ago

Astronomical depth goes as sqrt(t) or sqrt(N) exposures, given uniform observing conditions and visit depths. The difference between 185 and ~200 is therefore 0.04 magnitudes (4%) which is completely within the uncertainties of our ability to predict depth. We would never throw out the extra 15 visits, and we have no way of knowing whether 185 or ~200 is a better simulation of the actual depth that LSST will achieve. Hence it's much simpler to just use an OpSim run directly than to modify it, but if the 10% of CPU time is important to save, you can just tell people that we simulated the first 9 out of 10 years of the survey (roughly) in order to save 10% CPU time.

egawiser commented 7 years ago

Humna and I just talked through both of @jchiang87 's recent suggestions for simplifying how to choose which CCDs to simulate. Let's call them "chip distance" and "ImSim downselect" respectively. ImSim downselect sounds like it can't fail, but unless PhoSim behaves similarly (and we suspect it doesn't), it's a stop-gap. We prefer Jim's "chip distance" suggestion from yesterday evening where each CCD is modelled as an enclosing circle and the distance from its center to each of the 4 hexagonal field centers is checked; a CCD would be included if the minimum of those 4 distances is less than (LSST FOV radius+CCD "radius").

cwwalter commented 7 years ago

I was just saying that 185 is good enough and it was tied to a number in the overview paper so I prefer to use it (as was Andy's suggestion). As a bonus it also makes things go a bit faster but that's not the main reason.

So, unless people feel strongly let's do that. I don't care that much. But I want to decide in the next 10 minutes :)

egawiser commented 7 years ago

Decisiveness is good! :) Either way will work, but my gut feeling is still to simulate all of minion1016 rather than trying to explain why we stopped it after ~9 years. btw, @humnaawan will send the 4 fieldIDs shortly... and we can get you an updated estimate for number of chip-visits by tomorrow if @jchiang87 doesn't get there first.

cwwalter commented 7 years ago

OK. Let's decisively decide to use the entire run as Eric suggests. No more discussion. (Note: @jchiang87)

I think the CPU time difference is in the noise so we will go with simplicity of explanation.

Thanks on getting the fieldIDs to Jim when you can @humnaawan !

Also, following up from before in case it go lost: did the afterburner get run again?

egawiser commented 7 years ago

For the afterburner, Humna has a query in to @rhiannonlynne about a bug fix; it was changing rotational position every visit rather than every filter change.

cwwalter commented 7 years ago

For the afterburner, Humna has a query in to @rhiannonlynne about a bug fix; it was changing rotational position every visit rather than every filter change.

Ah OK. And the original purpose of the the rerun was to change databases correct? BTW, FYI: It takes 4 to 5 hours for the people in Tucson to do that.

humnaawan commented 7 years ago

@jchiang87 here are the four fieldIDs: 1447, 1323, 1431, 1333

This is how I am seeing them: screen shot 2017-01-19 at 2 54 01 pm zoomed in: screen shot 2017-01-19 at 2 52 54 pm

I will work on getting you the chips per visit info in a bit. Sorry I didn't quite understand what you were asking before.

@cwwalter so for debugging purposes, @rhiannonlynne ran the afterburner update on a database for which she had the mysql database and gave me the output (see earlier in this thread) instead of requesting the full minion_1016 run. You can see the related discussion here. Once the bugs are fixed and we know that the dithers are working, we could request the run on minion_1016.

jchiang87 commented 7 years ago

Thanks @humnaawan ! @cwwalter I think we're good to go for tomorrow in the no-dither case.