LSSTDESC / Twinkles

10 years. 6 filters. 1 tiny patch of sky. Thousands of time-variable cosmological distance probes.

Run 3 phoSim Workflow #315

Closed: TomGlanzman closed this issue 7 years ago

TomGlanzman commented 7 years ago

This issue is intended to start a discussion on the elapsed time that will be needed for the next Twinkles cycle based on experience with the first Twinkles cycle.

Twinkles 1 was ~1250 single-sensor visits and took 1-2 weeks to finally conclude, with ~10% of jobs running out the clock.

Task: Twinkles-phoSimII

1227 total jobs (visits):

- 1124 (91.6%) completed successfully
- 103 (8.4%) ran out of time

Mean wallclock time for successful jobs = 1400 minutes (23.3 hours) with tails going out to 8,000 minutes (133.3 hours = 5.55 days), not including the jobs that ran out of time.

- Assume 2500 cores max (Fermi not busy)
- Assume 80% of visits complete within 3000 minutes (or 50 hours = ~2 days); the remaining 20% outliers require 7 days
- Assume visits are time-ordered in terms of processing dispatch (not optimized for any other reason)

To process 25,000 visits:

80% = 20,000 jobs. At 2,500 jobs per cycle, that is 8 full cycles of 2 days = 16 days.

The remaining 5,000 jobs will be started, on average, in 625-job chunks every two days. Therefore, the last chunk will finish about 7 days after the next-to-last cycle, or (16 - 2) + 7 = 21 days.

Temporary work storage needed for 2500 running jobs is ~7 TB (~2.3 GB/running job). Permanent output storage is 4.8 TB (~200 MB/sensor/visit).
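
As a cross-check, the schedule and output-storage arithmetic above can be reproduced with a few lines of Python; the inputs are the stated assumptions, not measurements.

```python
# Back-of-the-envelope Run 3 schedule, using the assumptions listed above.
n_visits = 25000       # total visits to simulate
n_cores = 2500         # assumed available cores (Fermi not busy)
frac_fast = 0.80       # fraction of visits finishing within ~2 days
t_fast_days = 2.0      # wallclock for the "fast" 80%
t_slow_days = 7.0      # wallclock for the 20% outliers

fast_jobs = frac_fast * n_visits          # 20,000 jobs
fast_cycles = fast_jobs / n_cores         # 8 full cycles
fast_days = fast_cycles * t_fast_days     # 16 days

# Outliers start in ~625-job chunks as cores free up every 2 days; the last
# chunk starts with the final 2-day cycle and then runs ~7 days.
total_days = (fast_days - t_fast_days) + t_slow_days   # (16 - 2) + 7 = 21

output_tb = n_visits * 0.2 / 1024.0       # ~200 MB per sensor-visit
print(f"fast phase: {fast_days:.0f} d, total elapsed: {total_days:.0f} d, "
      f"permanent output: ~{output_tb:.1f} TB")
```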

This is probably the most optimistic estimate imaginable: no start-up hiccups, smooth running, no extra-long outliers, no significant failure rate, no I/O overload requiring job throttling, etc.

More realistic:

1) trickling jobs to avoid a shock front of I/O load
2) start-up delays (validation, problem investigation, etc.) and pauses during the run
3) limiting the total simultaneous number of jobs
4) significant stragglers at the end due to rollbacks
5) competition for CPU resources from Fermi
6) lack of proper Pipeline support for checkpoints will cause confusion

Optimizations:

1) Move all phosim input from wain032/25 to Lustre (Fermi scratch)
2) Tune jobs to ask for the resources they will need (mem/swap), which will likely reduce the number of simultaneously running jobs
3) Use Seth's algorithm to better estimate the expected number of checkpoints (to reduce startup I/O load)

Update

Here are all the issues (based on a list made by @TomGlanzman) that need to be resolved to get the Run 3 phoSim workflow operational, and hence form part of this Epic:

To check on these issues, click on the "Epic: Run 3 phoSim Workflow" label.

We also have the following separate but related Epics and sub-issues, which all have the same milestone (#10):

drphilmarshall commented 7 years ago

Thanks Tom!

Are we covered for storage? These are small but perhaps not insignificant numbers.

I suggest naming tasks to match closely the phase of the project we are in. We are doing the DC1 key project "Twinkles 1" whose data generation era is coming to a close. During R&D we carried out several phosim simulation and/or DM processing "runs" (Run 1, Run 1.1, Run 2). We are now talking about "Run 3", as the full production run. Good pipeline task names could therefore be:

Twinkles1-phoSim-run3
Twinkles1-DMLevel2-run3


TomGlanzman commented 7 years ago

I've corrected some arithmetic and added some detail in the original estimate pertaining to storage.

Phil, yes, we should be covered for both permanent and temporary (higher performance) storage.

Task names are pretty arbitrary and can take on a life of their own if there are mid-course corrections, etc. Yours (and mine) are rather long, in retrospect, given the number of times a name must be manually typed in to monitor a task's progress.

drphilmarshall commented 7 years ago

Fair enough, I was just trying to stamp out any remaining Run 1 / Twinkles 1 confusion :-)

Are there to-do items arising from your discussion above that now need issuing? If so, I suggest we turn this issue into an Epic, and label all the spin-off issues as "Epic: Run 3 PhoSim Workflow" and attach them all to the "Twinkles 1 (Run 3) PhoSim Data generation Start" milestone. Do you want to have a go at that? It's really your to-do list, but you'll be helping us keep up with you as you go.

sethdigel commented 7 years ago

Not sure whether this belongs here, but I've started a Confluence page looking at the influence of bright stars on the CPU times for the Twinkles runs, based on Tom's current PhoSim-deep-pre3 runs and the phosim trimcat (i.e., input source list) files that Jim generated for the Opsim visit Tom is running (1668469). The giant instance catalog for the FOV gets divided up into 189 trimcat files, one per CCD. Evaluating total CPU times is tricky for these jobs with checkpointing (Tom can explain). So far I've looked only at the stellar content of the phosim jobs that have been running for weeks. Formerly such long times were not possible on the batch farm, which has a 120 CPU-hour limit, but checkpointing has effectively enabled them.

The bottom line so far is that the brightest few stars (two have magnitudes <7 in the instance catalog) have a huge influence on the CPU time requirements, not just at their locations but over a fairly large area.

Once we've worked out the CPU time tallies, we can consider what magnitude limits might need to be imposed in the instance catalogs for the next Twinkles run.
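
As a starting point for those tallies, the brightest stellar magnitude in each per-CCD trimcat can be pulled out with a short script like the sketch below. The directory and file-name pattern are placeholders, and the assumption that the fifth whitespace-separated field of an 'object' line is the normalizing magnitude should be checked against a real trimcat before trusting the numbers.

```python
import glob
from pathlib import Path

def brightest_object_mag(trimcat_path):
    """Return the smallest (brightest) magnitude among the 'object' lines of
    one trimcat file.  Assumes the fifth whitespace-separated field of an
    object line is the normalizing magnitude."""
    brightest = None
    with open(trimcat_path) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == "object":
                mag = float(fields[4])
                if brightest is None or mag < brightest:
                    brightest = mag
    return brightest

# Hypothetical location/naming of the 189 per-CCD trimcats for visit 1668469.
for path in sorted(glob.glob("trimcats_1668469/trimcatalog_*.pars")):
    print(Path(path).name, brightest_object_mag(path))
```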

sethdigel commented 7 years ago

This page in Confluence describes the relation between CPU time and the magnitudes of the brightest stars in Tom's PhoSim-deep-pre2 simulation of the Opsim 1668469 pointing. (For this run, no checkpointing was used and the interpretation of the pipeline logs is much more straightforward.)

The most important plot is this one, showing the relation between CPU time and the magnitude (as listed in the instance catalog) of the brightest star, for each of the CCDs in the focal plane. The two brightest stars influence many CCDs.

cpu_star_1st_1668469

Note that the individual phosim runs were made on a variety of host classes in the batch farm; I have not scaled them to a common host type.

For this particular pointing (which does not have a particularly bright sky background), the brightest star starts to significantly influence the CPU time required by about magnitude 10.

For the 189 phosim runs to simulate this single pointing, about 367 CPU days were required (and 38 of the runs died at the 120-hour CPU limit and did not produce any output). If stars brighter than magnitude 8 had been excluded, the total time would have dropped to about 170 days, with no job hitting the 120-hour limit. If the limit had been magnitude 10, the CPU time would have been about 80 days total, which is approximately the point of diminishing returns. (For other pointings with brighter sky backgrounds, a brighter magnitude limit could be used without influencing the CPU time requirements very much, but of course that would be because the bright sky background already makes the required time very long.)

rbiswas4 commented 7 years ago

In previous Twinkles runs, we had an explicit constraint on the brightest star allowed, in terms of g magnitude: all stars with g (AB) magnitude < 11 were dropped. Same for galaxies.

Is there a change to this plan?

sethdigel commented 7 years ago

Interesting. I had assumed that we just got lucky with the particular CCD-sized field simulated for Twinkles Run 1. Where is the limit documented? I guess it was applied at the CatSim level?

jchiang87 commented 7 years ago

@rbiswas4 Even though this is being posted on a Twinkles issue, the context in which we want to quantify the runtime performance is more general: for PhoSim Deep et al., we'll need to apply a magnitude cut to limit the CPU time, so we need to understand the behavior of runtime as a function of the brightest objects in the instance catalogs in order to assess any tradeoffs incurred by applying a magnitude limit.

rbiswas4 commented 7 years ago

@sethdigel It was in the generatePhosimInput.py script: https://github.com/DarkEnergyScienceCollaboration/Twinkles/blob/master/python/desc/twinkles/generatePhosimInput.py#L105 and if this is commented out https://github.com/DarkEnergyScienceCollaboration/Twinkles/blob/master/python/desc/twinkles/generatePhosimInput.py#L110, we would get the constraint on galaxies.

@jchiang87 Sure. I just wanted to be sure that we are not changing the plan for Twinkles (or we could opt to throw the switch there).
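
For readers who don't want to dig through the script, the constraint amounts to a bright-end filter like the sketch below; the column name and the pandas representation are illustrative, not the actual generatePhosimInput.py code.

```python
import pandas as pd

BRIGHT_LIMIT = 11.0   # drop anything brighter than g (AB) = 11, as described above

def apply_bright_cut(catalog: pd.DataFrame, mag_col: str = "g_ab",
                     limit: float = BRIGHT_LIMIT) -> pd.DataFrame:
    """Keep only objects at or fainter than the bright limit (illustrative)."""
    return catalog[catalog[mag_col] >= limit]

# Example: stars at g = 6.8 and 10.5 are dropped, g = 11.2 is kept.
stars = pd.DataFrame({"id": [1, 2, 3], "g_ab": [6.8, 10.5, 11.2]})
print(apply_bright_cut(stars))
```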

drphilmarshall commented 7 years ago

Sounds like a serious question for the DC1 PhoSim Deep science team is: can you still write the papers you want to write if the survey contains no stars brighter than 11th magnitude?

sethdigel commented 7 years ago

Thank you, @rbiswas4. I am losing track of the distinctions between the various Twinkles runs, PhoSim Deep, DC1, etc.

I've updated the Confluence page with an overlay of the positions of the brightest stars in the focal plane together with the phosim run times by CCD for the Opsim ID 1668469 pointing.

show_cpu_fp_1668469

The crosses are the positions of the centers of the CCDs; the areas of the red circles indicate the CPU times (with black diamonds marking runs that were killed at the 120 CPU-hour limit), and the green symbols show the stars with magnitudes <= 10. Interestingly, the runs that reach the limit are in what must be the tails of the PSFs of the brightest stars. Naively, though, I would have expected these runs to have a 'rounder' distribution around the locations of those stars. I'd guess that this has to do with how phosim decides which trimcat files a bright star ends up in.
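
For anyone who wants to remake this kind of focal-plane summary, a matplotlib sketch along the following lines reproduces the plotting conventions; the arrays are random stand-ins for the real per-CCD CPU times, killed-run flags, and bright-star positions, not the values behind the figure.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy stand-ins for per-CCD positions (mm), CPU hours, killed flags, and
# bright-star positions/magnitudes; none of these are the measured values.
rng = np.random.default_rng(0)
ccd_x, ccd_y = [a.ravel() for a in np.meshgrid(np.linspace(-300, 300, 15),
                                               np.linspace(-300, 300, 15))]
cpu_hours = rng.uniform(5, 120, ccd_x.size)
killed = cpu_hours > 115                      # stand-in for the 120 CPU-hour limit
star_x, star_y = [-250, 40], [-260, 150]

fig, ax = plt.subplots(figsize=(7, 7))
ax.scatter(ccd_x, ccd_y, marker="+", color="gray", label="CCD centers")
ax.scatter(ccd_x, ccd_y, s=3 * cpu_hours, facecolors="none",
           edgecolors="red", label="CPU time (circle area)")
ax.scatter(ccd_x[killed], ccd_y[killed], marker="D", color="black",
           label="hit 120 CPU-hr limit")
ax.scatter(star_x, star_y, marker="*", s=200, color="green",
           label="stars with mag <= 10")
ax.set_xlabel("focal plane x [mm]")
ax.set_ylabel("focal plane y [mm]")
ax.legend(loc="upper right")
plt.show()
```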

SimonKrughoff commented 7 years ago

@sethdigel This is an excellent plot. Thank you so much for making it.

I find it interesting that the very brightest stars are next to chips that have shortish runtimes and then one space away the simulation fails. I wonder what is happening with that.

rbiswas4 commented 7 years ago

@sethdigel That is a cool plot!

What are you using to plot the positions of the chips and stars and does it include the various warping/perturbation terms in positions?

sethdigel commented 7 years ago

Thank you, @rbiswas4. The positions of the chips are as phosim reports in the 'Instantaneous Instrument and Site Characteristics' section of its output, but they originate in the focalplanelayout.txt file. The positions of the stars are derived from their instance catalog coordinates and the RA, Dec, and rotation angle of the OpSim pointing, using the same calculation as phosim makes in trim.cpp for making the trimcat catalogs. The positions do not include any warping/perturbation terms.
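
For reference, a generic gnomonic (tangent-plane) projection followed by a rotation gives essentially this mapping; the sketch below is only an approximation of the trim.cpp calculation, and the sign conventions, plate scale, and example pointing are assumptions rather than values checked against phosim.

```python
import numpy as np

def radec_to_pixels(ra, dec, ra0, dec0, rot_deg, plate_scale=0.2):
    """Gnomonic projection of (ra, dec) about the pointing (ra0, dec0),
    rotated by the camera rotation angle.  Angles in degrees; plate_scale in
    arcsec/pixel (0.2 is the nominal LSST value).  Approximation only."""
    ra, dec, ra0, dec0 = map(np.radians, (ra, dec, ra0, dec0))
    cos_c = np.sin(dec0) * np.sin(dec) + np.cos(dec0) * np.cos(dec) * np.cos(ra - ra0)
    xi = np.cos(dec) * np.sin(ra - ra0) / cos_c
    eta = (np.cos(dec0) * np.sin(dec)
           - np.sin(dec0) * np.cos(dec) * np.cos(ra - ra0)) / cos_c
    rot = np.radians(rot_deg)
    x = xi * np.cos(rot) + eta * np.sin(rot)
    y = -xi * np.sin(rot) + eta * np.cos(rot)
    rad_to_arcsec = np.degrees(1.0) * 3600.0
    return x * rad_to_arcsec / plate_scale, y * rad_to_arcsec / plate_scale

# Illustrative pointing and star; substitute the real OpSim values.
print(radec_to_pixels(53.3, -27.4, ra0=53.0, dec0=-27.4, rot_deg=154.0))
```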

drphilmarshall commented 7 years ago

I am now intrigued to see the images that finished from the 2x2 set of rafts in the bottom left hand corner. There could be some interesting artifacts in those rafts' CCD images that give some clues as to why the simulation time is so long. A simple zscaled ds9 view is where I would start, with output images saved as JPGs and then arranged by hand in a PPT slide as a low budget mosaic. If only we had a full-field image viewer SUIT notebook widget! (Use case alert, @gpdf)

SimonKrughoff commented 7 years ago

@sethdigel are these images available on a machine I have access to? (I have a SLAC account.)

sethdigel commented 7 years ago

The output e-images are in subdirectories of /nfs/farm/g/desc/u1/Pipeline-tasks/PhoSim-deep-pre2/phosim_output/000000. The organization seems to be R##_S##/output/lsst_e_1668469_f2_R##_S##_E000.fits.gz. The output/ directories also contain files with images for the individual segments. I should add that for runs that reached the 120-hour CPU limit (black diamonds in the plot above) the output directories are empty.
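
A quick way to see which sensors produced nothing (i.e. which runs hit the CPU limit) is a small glob over that layout; the sketch below assumes only the path pattern quoted above.

```python
import glob
import os

base = "/nfs/farm/g/desc/u1/Pipeline-tasks/PhoSim-deep-pre2/phosim_output/000000"
for outdir in sorted(glob.glob(os.path.join(base, "R*_S*", "output"))):
    sensor = os.path.basename(os.path.dirname(outdir))
    eimages = glob.glob(os.path.join(outdir, "lsst_e_*_E000.fits.gz"))
    # Empty output/ directories correspond to runs killed at the CPU limit.
    status = "ok" if eimages else "EMPTY (likely hit the 120-hr limit)"
    print(f"{sensor}: {len(eimages)} e-image(s) {status}")
```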

sethdigel commented 7 years ago

Sorry, another thought, @SimonKrughoff : To reiterate, 38 of the 189 phosim runs hit the CPU time limit. You might want to look at the output e-images for PhoSim-deep-pre3, for which Tom enabled checkpointing in phosim. Each phosim run had 8 checkpoints. The CPU times tended not to be divided up very evenly among the checkpoints, and it was not uncommon that the last checkpoint reached the CPU time limit, but overall PhoSim-deep-pre3 did get further, with only 29 incomplete or timed out runs at present. The images are in subdirectories of /nfs/farm/g/desc/u1/Pipeline-tasks/PhoSim-deep-pre3/phosim_output/000000 with the same organization as described above.

SimonKrughoff commented 7 years ago

O.K. Looking a little more, the checkpointed version did not do significantly better (as you pointed out, @sethdigel). I put together a mosaic of the bottom left raft. 7/9 chips were simulated successfully. I think the bright source in the upper left right chip is the bright source in Seth's plot. You will also notice the vignetting in the two chips furthest from the boresight. I'm not sure I gain much insight from this, unfortunately.

@TomGlanzman is it worth cutting the integration time by an order of magnitude to see if we can get a few of these chips to finish?

image

SimonKrughoff commented 7 years ago

On further inspection, I am seeing some really strange dark blotches in these chips. Here is a section of one of the chips (R10_S11), but I see it in all seven of them. Has anyone seen this? It's very strange and I haven't seen it in phosim images before, as far as I can remember.

Maybe this is a new ticket.

image

sethdigel commented 7 years ago

That's great, @SimonKrughoff. I think that you mean the bright source in the upper right chip is the bright star in the plot.

Regarding the boxy distribution of the phosim runs that reached the CPU time limit, this is definitely due to how trim.cpp (run by phosim) decides which sources to include in the trimcat file. It selects a magnitude-dependent 'buffer' size (a number of pixels), and a source is included in the trimcat if its x and y offsets from the chip are both within that buffer size. The buffer size is probably selected to be conservatively large - so I would not necessarily expect that making the selection based strictly on angular distance would be better in terms of fidelity - but it does seem that being well out in the 'halo' of a bright star means a lot of computation time (maybe because photons can't be bunched?) even if not very many photons from the star necessarily end up in the chip.
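
In pseudocode, the selection described here amounts to the sketch below; the chip half-size and the magnitude-to-buffer mapping are illustrative numbers, not the actual trim.cpp values.

```python
def buffer_for_magnitude(mag):
    """Toy magnitude-dependent buffer (pixels): brighter sources get a larger
    buffer, reaching ~10,000 pixels for the brightest stars.  The thresholds
    and values here are illustrative only."""
    if mag < 8.0:
        return 10000
    if mag < 12.0:
        return 2000
    return 100

def in_trimcat(dx_pix, dy_pix, mag, chip_half=2036):
    """Include a source if both its x and y offsets from the chip centre are
    within the chip half-size plus the magnitude-dependent buffer."""
    reach = chip_half + buffer_for_magnitude(mag)
    return abs(dx_pix) <= reach and abs(dy_pix) <= reach

# A 7th-mag star ~6000 pixels off the chip centre is still included;
# a 15th-mag star at the same offset is not.
print(in_trimcat(6000, 0, mag=7.0), in_trimcat(6000, 0, mag=15.0))
```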

jchiang87 commented 7 years ago

Yep, I've seen these blotches in the Run1.1 data in the warped images used for the forced photometry. I had assumed they were a by-product of the warping and image repair.

Regarding the shorter execution times for sensors that have the bright stars right on them vs longer execution times for the ones in the Mie scattering tails, I'm wondering if the former are shorter because the approximations that group photons for saturated sources are in effect, whereas for the sensors in the tails, each photon is followed individually.

drphilmarshall commented 7 years ago

@sethdigel I wonder if an even larger buffer size is required?

@SimonKrughoff @jchiang87 What are the pixel values in the dark patches? They look like they could be pixels being zeroed (or masked) out... Do they coincide with any objects in our other instance catalogs? I could imagine some bug where the position of some past or future supernova was somehow leaking into the present eimage...

SimonKrughoff commented 7 years ago

First, @sethdigel and @jchiang87, I was thinking the same thing about whether this could be related to when and where the bright star approximations are valid. Maybe we could simply crank down the buffer.

Second, the dark blotches are due to crosstalk. This is going to be a big problem. I don't know if we can turn off crosstalk, but I think we should try. I can make ISR take out crosstalk, but from my limited looking, these crosstalk terms are far too large (~0.0035).

I should mention that the crosstalk matrix was diagonal in 3.4.2, so this is new. I think that means the other blotches you were seeing, @jchiang87, may have been due to something else.
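
For concreteness, the correction ISR would apply is a set of scaled subtractions between amplifier segments; the sketch below assumes a known 16x16 coefficient matrix and is not the stack's actual ISR code.

```python
import numpy as np

def correct_crosstalk(amp_images, coeffs):
    """First-order crosstalk removal for one sensor.

    amp_images : array (n_amp, ny, nx) of raw segment images.
    coeffs     : (n_amp, n_amp) matrix; coeffs[i, j] is the fraction of amp j's
                 signal appearing in amp i (~0.0035 off-diagonal in the phosim
                 segmentation file discussed here).
    """
    amp_images = np.asarray(amp_images, dtype=float)
    corrected = amp_images.copy()
    for i in range(amp_images.shape[0]):
        for j in range(amp_images.shape[0]):
            if i != j:
                corrected[i] -= coeffs[i, j] * amp_images[j]
    return corrected

# Toy example: 16 amps with uniform 0.0035 off-diagonal crosstalk.
rng = np.random.default_rng(1)
amps = rng.poisson(1000, size=(16, 50, 50))
xtalk = np.full((16, 16), 0.0035)
np.fill_diagonal(xtalk, 0.0)
cleaned = correct_crosstalk(amps, xtalk)
```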

jchiang87 commented 7 years ago

Based on the default segmentation.txt file (https://bitbucket.org/phosim/phosim_release/src/39f267c3f9733a490de85231cd7ac6c5e7154ebc/data/lsst/segmentation.txt?at=master&fileviewer=file-view-default), the crosstalk off-diagonal entries are indeed that large. I think we should turn off crosstalk, either by zeroing the off-diagonal entries in the matrix or via some override option.

drphilmarshall commented 7 years ago

Oops, our comments crossed, Simon. Yes, crosstalk - I wasn't sure if PhoSim knew about this! Cool that it does (in a way). Are the terms large because the covariance matrix is somehow unrealistic, or just because there are some *very* bright stars in the other images? I'm happy for us to turn off crosstalk in DC1 if we can, on the grounds that DM ISR is not ready yet.

SimonKrughoff commented 7 years ago

@jchiang87 you're right. I was thinking they were an order of magnitude smaller than they are. We could check with our camera friends, but I think that's much larger than DECam sees and what we expect.

drphilmarshall commented 7 years ago

@SimonKrughoff I would expect this to be validated against the camera data already, no? Worth asking though - who should we @mention to get them into this thread?

jchiang87 commented 7 years ago

I'm pretty sure that we haven't had a chance to measure intrasensor crosstalk for production sensors yet using LSST electronics. We have measurements of prototype devices using third party electronics (controllers, etc.), but it was clear that the cabling was introducing spurious crosstalk signal.

SimonKrughoff commented 7 years ago

Maybe Andy R. would know, but I don't see him as an option.

For what it's worth, I wouldn't mind adding crosstalk to the LsstSim ISR if we need it. We are already doing it for DECam and HSC. On the other hand, I'm always a little worried about making work for ourselves.

jchiang87 commented 7 years ago

btw, I'm fairly certain now that the blotches I saw were for eimages generated with phosim 3.5.2 (or later), so they were not in Run1.1 images, afaik.

SimonKrughoff commented 7 years ago

O.K. so not too large by a factor of 10.

From OSS-REQ-0327: image

sethdigel commented 7 years ago

@sethdigel I wonder if an even larger buffer size is required?

@drphilmarshall, I don't know but I suspect that the buffer size is actually conservatively large. The 'buffer' for stars as bright as the brightest two in this pointing (~7th mag) extends about 10k pixels, or about 2.5 CCDs (which means that a 7th magnitude star can end up in the trimcat files for about 25 CCDs). Here is an image for the R10_S22 sensor (upper right in the lower-left raft). I've smoothed it with a Gaussian kernel of 5 pixel radius and squashed the scaling to try to see the tails of the PSF around the bright star. I'm probably revealing too much ignorance, but I am not getting the impression that the star has much influence even one CCD over.

r10_s22_scaled
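
For anyone wanting to reproduce this kind of display, a Gaussian smooth plus an asinh stretch does the job; the kernel width and stretch scale below are illustrative choices rather than a record of exactly what was used here.

```python
import numpy as np
import matplotlib.pyplot as plt
from astropy.io import fits
from scipy.ndimage import gaussian_filter

path = ("/nfs/farm/g/desc/u1/Pipeline-tasks/PhoSim-deep-pre2/phosim_output/"
        "000000/R10_S22/output/lsst_e_1668469_f2_R10_S22_E000.fits.gz")
image = fits.getdata(path).astype(float)

smoothed = gaussian_filter(image, sigma=5)                       # ~5-pixel Gaussian
stretched = np.arcsinh((smoothed - np.median(smoothed)) / 50.0)  # squash the scaling

plt.imshow(stretched, origin="lower", cmap="gray")
plt.colorbar(label="arcsinh-stretched counts")
plt.show()
```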

sethdigel commented 7 years ago

For PhoSim-deep-pre2, Tom also made runs for a second pointing, OpSim ID 0921297. I've made the same CPU time and bright star plot for this pointing. I was more interested in it before I realized that the RA, Dec of the pointings were exactly the same, but for the record, here's the plot; the e-image files are in /nfs/farm/g/desc/u1/Pipeline-tasks/PhoSim-deep-pre2/phosim_output/000001. The observations differ only a little in rotation angle (154 deg vs. 147 deg), zenith angle (27 deg vs. 21 deg) and moon altitude (-0.6 deg vs. -6.9 deg) for 1668469 vs. 0921297, so the plot ended up looking fairly similar. I've included more stars around the edges of the focal plane that may not have made it into any of the trimcats. Also, for four of the CCDs that reached the CPU time limit, the CPU time reported in the pipeline log file was extremely small, indicating a glitch somewhere.

show_cpu_fp_0921297

SimonKrughoff commented 7 years ago

@sethdigel I don't know if it was a misspeak, but I think the upper right in the lower left would be R10_S22 not R11_S22. Edit: actually, I think I do recognize that as R10_S22. Is this an eimage or is it a processed image?

There is a gradient in the noise in the image you posted. It would be nice to know where that is coming from if it's not from a bright star.

johnrpeterson commented 7 years ago

@drphilmarshall, I don't know but I suspect that the buffer size is actually conservatively large. [...] I am not getting the impression that the star has much influence even one CCD over.

nope, it definitely can put photons on other CCDs. here is why:

SimonKrughoff commented 7 years ago

@johnrpeterson I don't think anyone is arguing that photons from the bright star aren't landing on another CCD. I think it's only a question of whether we care enough about that contribution to spend the simulation time.

We have a guess that stars more than ~1 chip away do not get the same bright star optimization as do stars close to or on the chip. Is that guess correct?

sethdigel commented 7 years ago

@sethdigel I don't know if it was a misspeak, but I think the upper right in the lower left would be R10_S22 not R11_S22.

Yes, it was a typo. I caught it and fixed it in my posting last night.

drphilmarshall commented 7 years ago

@johnrpeterson I think your comment got mangled somehow - the explanation of (or link to?) why photons can end up on faraway chips got lost. Would you mind re-posting please? Thank you!

johnrpeterson commented 7 years ago

Yes, the optimization works differently for different chips. It should go faster for the non-core chips, but it's rather complicated. How many chips it will use depends on the magnitude of the object.


johnrpeterson commented 7 years ago

I just meant to look at the pretty picture -- there is both large-angle scattering, due primarily to scratches on the mirrors, and ghosts (double reflections off optical surfaces) that cause particularly large angles.


SimonKrughoff commented 7 years ago

Yes, the optimization works differently for different chips. It should go faster for the non-core chips, but it's rather complicated. How many chips it will use depends on the magnitude of the object.

It would be nice to understand exactly what the complications are, but it sounds like we should expect very long run times for chips that have stars nearby but not on the chip. If we want to forego these second-order contributions to the background, we can clamp down the buffer region to get significantly better runtimes.

I just meant to look at the pretty picture -- there is both large-angle scattering, due primarily to scratches on the mirrors, and ghosts (double reflections off optical surfaces) that cause particularly large angles.

I don't see an image. I'm not sure github handles email attachments correctly.

drphilmarshall commented 7 years ago

"Scratches on the mirror" - I love PhoSim :-)

I guess we are still learning what things like this look like in the images. John, which of the PIN docs at https://confluence.lsstcorp.org/display/PHOSIM/Detailed+Documentation do you recommend for learning more about the perils / treatment of bright stars?

If we did "clamp down the buffer region" would there be any other ill effects other than artificially removing the ghosts?

johnrpeterson commented 7 years ago

Phil and Simon-

Glad you like it. The picture is here if you didn’t get it:

https://www.dropbox.com/s/q94l6mc5eg8eiuq/PastedGraphic-1.pdf?dl=0

There isn’t a specific document on bright stars.

It's not a viable option to "clamp down on the buffer region", though. I am misrepresenting what is going on a little bit with the above image, because that is an extreme case (0 magnitude, I think). The buffer is necessary for even simple things like a star in the gap or at the edge, which would be far more common than the details of the ghosts. You'll cause all sorts of problems if you remove or shrink the buffer, and you risk stars being removed for random reasons. This part is pretty carefully done, and is not overly conservative.

So you'll have to decide whether to remove bright stars at some magnitude or not. The number I remember is that there is at least a 12th magnitude star on every chip, so you at least want to keep stars above that magnitude. Note that the image above shows that you can do a 0th magnitude star, so it is possible to do all stars. It's just that if they are causing someone headaches then you could remove some, but I really don't like the idea of removing any, personally...

john


sethdigel commented 7 years ago

The image below shows the part of the focal plane around the bright star at the lower left (992878567429 in the instance catalog). The position of the star is indicated by the green symbol. The numbers in the boxes are the numbers of photons that phosim reports having detected from this star. The information comes from the centroid files. The peak position, in R10_S22, has 3,611,211,732 photons. Again, the crosses mark the positions of the centers of the CCDs, and black diamonds mark CCDs for which the phosim job reached the CPU time limit before finishing.

show_cpu_fp_phot_1668469

For the non-black diamond positions, if no number is shown, then no photons were reported as having been detected from this star. (This also likely means that the star was not actually in the trimcat.)

The dynamic range is fairly large, more than a factor of 1000. The numbers of photons from the bright star in the CCDs for which the phosim runs reached the CPU time limit probably would have been in the same range as for other outlying CCDs.

I can't say whether the 3-30 million photon counts in the outlying CCDs are few enough that not including them would be ok. They are presumably spread fairly broadly. Not counting photons from the bright star, the centroid files indicate that each CCD is detecting about 12.5e9 photons.

Roughly speaking, including this single star in the trimcat files for sensors more than 1 CCD away from the position of the star cost about 25% of the overall CPU time for the entire run, and it would have been more if the jobs had not timed out at 5 CPU days.
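
For completeness, a per-sensor tally like the one above can be pulled from the centroid files with a short script; the file naming and the assumption that the first two columns of each row are the source ID and the detected photon count should be verified against an actual centroid file first.

```python
import glob
import os

STAR_ID = "992878567429"   # the bright star discussed above
base = "/nfs/farm/g/desc/u1/Pipeline-tasks/PhoSim-deep-pre2/phosim_output/000000"

per_sensor = {}
for path in glob.glob(os.path.join(base, "R*_S*", "output", "centroid_*")):
    sensor = path.split(os.sep)[-3]
    with open(path) as f:
        next(f, None)                       # assumed single header line
        for line in f:
            cols = line.split()
            if len(cols) >= 2 and cols[0] == STAR_ID:
                per_sensor[sensor] = per_sensor.get(sensor, 0.0) + float(cols[1])

total = sum(per_sensor.values())
for sensor, nphot in sorted(per_sensor.items(), key=lambda kv: -kv[1]):
    print(f"{sensor}: {nphot:.3e} photons ({100 * nphot / total:.2f}% of total)")
```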

SimonKrughoff commented 7 years ago

Now I'm baffled. Right is supposed to be without crosstalk and left is with, and they both show the divots. I'll have to look more tomorrow.

image

jchiang87 commented 7 years ago

I'm wondering now if these patches really are crosstalk. The crosstalk matrices are indeed read in by the e2adc code:

https://bitbucket.org/phosim/phosim_release/src/39f267c3f9733a490de85231cd7ac6c5e7154ebc/source/e2adc/e2adc.cpp?at=master&fileviewer=file-view-default#e2adc.cpp-127

Since that happens after the eimages are written, these blotches on the eimages must be something else.

sethdigel commented 7 years ago

Also, if they were due to crosstalk, I think we would have expected some large, obvious holes due to the brightest stars.

Over the summer, Tom duplicated the original Twinkles Run 1 (phosim v3.4.2) with phosim v3.5.2. The inputs were exactly the same, so we have a complete set of images that can be compared. At SLAC the original images are in subdirectories of /nfs/farm/g/desc/u1/Pipeline-tasks/Twinkles-phoSimII/phosim_output/ and the v3.5.2 images are under /nfs/farm/g/desc/u1/Pipeline-tasks/Twinkles-phoSim-352/phosim_output/. Here's a comparison for a visit chosen at random, v3.5.2 on the left.

phosim_compare
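
A quick way to localize the divots is to difference the v3.4.2 and v3.5.2 e-images of the same visit; the file names below are placeholders under the two directories quoted above.

```python
import matplotlib.pyplot as plt
from astropy.io import fits

# Substitute a real visit/sensor for the <...> placeholders.
old = fits.getdata("/nfs/farm/g/desc/u1/Pipeline-tasks/Twinkles-phoSimII/"
                   "phosim_output/<visit>/<eimage>.fits.gz").astype(float)
new = fits.getdata("/nfs/farm/g/desc/u1/Pipeline-tasks/Twinkles-phoSim-352/"
                   "phosim_output/<visit>/<eimage>.fits.gz").astype(float)

diff = new - old
plt.imshow(diff, origin="lower", cmap="RdBu_r", vmin=-200, vmax=200)
plt.colorbar(label="v3.5.2 minus v3.4.2 (counts)")
plt.title("Divots should appear as strongly negative patches")
plt.show()
```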

drphilmarshall commented 7 years ago

So v3.5.2 makes negative divots in the images! That is not cool. I can't see any closed or open issues at https://bitbucket.org/phosim/phosim_release/issues - perhaps we should make one? That comparison image is fantastic, Seth. Have you seen this before, @johnrpeterson?

SimonKrughoff commented 7 years ago

Unfortunately, I don't really have much more time to dig into this. At this point, I am going to suggest that we cut our losses with v3.5.2 and go with v3.4.x for DC1. Is there a killer feature we get with v3.5.2 that we can't do without?