I did not find it
Looking at the tracts_mapping.sqlite3 file, there are 33 visits with > 4 CCDs contributing in i-band. Here they are:
919561 5
945676 5
420877 5
211473 5
919601 5
665665 6
1157699 5
437323 5
1185891 5
665703 6
678509 5
685691 5
994964 5
994965 5
906938 6
995008 5
211148 5
211179 5
458497 5
204555 5
965400 5
269088 5
1000776 5
1000781 6
269134 5
1209684 5
1212791 5
434049 5
1165737 5
518059 5
1209263 5
420826 5
200191 5
with the number of CCDs contributing in the second column.
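For the record, something like the following should reproduce that count. This is only a sketch: the table name (tracts_mapping) and the patch string format are guesses, and the column names are taken from the header printed further down in this thread.

import sqlite3

# Count CCDs per visit contributing to tract 4852, patch (1, 5) in i-band.
# Table name and patch format are assumptions, not the actual schema.
con = sqlite3.connect("tracts_mapping.sqlite3")
query = """
    SELECT visit, COUNT(detector) AS n_ccds
    FROM tracts_mapping
    WHERE filter = 'i' AND tract = 4852 AND patch = '(1, 5)'
    GROUP BY visit
    HAVING n_ccds > 4
"""
for visit, n_ccds in con.execute(query):
    print(visit, n_ccds)
con.close()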
The offender is 665703 according to the supreme input map.
Where do the warps live at nersc?
They would be here:
/global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/desc_dm_drp/v19.0.0-v1/rerun/run2.2i-coadd-wfd-dr6-v1-grizy/deepCoadd/i/4852/1,5
but that folder is empty. The originals, if they are still around, would be at CC-IN2P3.
I thought that @heather999 put them somewhere else? Or maybe I'm mis-remembering. But if we don't have the warps at nersc, I can't do anything else at the moment, and hopefully @johannct will notice something strange with 665703.
Seems kind of empty, but maybe the scale ... does it look strange compared to another visit warp?
not really.....
fwiw, here's the tracts_mapping info on that warp:
id tract patch visit detector filter layer
194 6720042 4852 (1, 5) 665703 133 i
195 6720045 4852 (1, 5) 665703 134 i
196 6720081 4852 (1, 5) 665703 141 i
197 6720201 4852 (1, 5) 665703 163 i
198 6720205 4852 (1, 5) 665703 164 i
199 6720245 4852 (1, 5) 665703 171 i
Having looked at a few of these calexps, things seem fine, but I only looked at the image data.
/global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/desc_dm_drp/v19.0.0-v1/rerun/run2.2i-calexp-v1/calexp/00665703-i
@johannct Which SRS log file corresponds to the creation of that warp? There may be clues there.
/sps/lsst/users/descprod/Pipeline2/Logs/DC2DM_DRP/2.9/task_coadd/task_coadd_tract_patch/task_coaddDriver/run_coaddDriver/024/043/007/001/logFile.txt
But if we don't have the warps at nersc,
We explicitly decided that we didn't need to keep these intermediate products at NERSC.
Right! We have the calexps somewhere but not the warps (or the wasps for that matter). @johannct can you put the warps for 665703 and, say, 174550 some place on nersc scratch so that I can poke at them?
Both are in /global/cscratch1/sd/erykoff/johannct for anyone other than Eli who wants to look.
Will need the mode changed to 666 and we're good to go. Thanks!
The offender is 665703 according to the supreme input map.
Was this statement based just on a visual recognition of the pattern?
I looked at the pattern of all of the input visits and that was the only one that matched.
Hmmm... I ask because, as you're likely similarly looking at right now, there doesn't seem to be anything obviously wrong with that warp.
The weight map pattern looks very different between the two. I don't know which would be "correct"; I've never looked at a warp weight map, nor do I know how this could cause a problem...
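For anyone who wants to reproduce the comparison, something like this prints a summary of the inverse-variance weight of each warp (sketch only; the filenames are placeholders for the two warps copied to scratch):

import numpy as np
import lsst.afw.image as afwImage

for fname in ("warp_665703.fits", "warp_174550.fits"):  # placeholder names
    warp = afwImage.ExposureF(fname)
    var = warp.getVariance().getArray()
    good = np.isfinite(var) & (var > 0)
    print(fname, "median weight:", np.median(1.0 / var[good]))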
The weight map pattern looks very different between the two.
Do you mean at the pixel level, or the moire pattern when zoomed out and resampled?
Oh I have an idea. There was a bug in the coadd code, fixed (I'm pretty sure) after this processing, where the psf-matched warps would screw up the scaling if the coadd had a bad psf. And how did it have a bad psf? It was choosing the center of the warp or something like that. And there's no psf model at the center of the warp for 665703.
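A quick way to test that hypothesis on the copies in scratch (just a sketch; the filename is a placeholder, and I'm assuming the relevant check is simply whether the warp's PSF can be evaluated at the center of its bounding box, which fails for a CoaddPsf inside chip gaps):

import lsst.geom as geom
import lsst.afw.image as afwImage

warp = afwImage.ExposureF("warp_665703.fits")  # placeholder filename
center = geom.Box2D(warp.getBBox()).getCenter()
try:
    warp.getPsf().computeImage(center)
    print("PSF model is defined at the warp center")
except Exception as exc:
    print("no PSF model at the warp center:", exc)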
Hmm... interesting.
The psfMatchedWarp images look to have the same scaling here.
@erykoff Very interesting! Though I'd be surprised if this was the only warp in all of DC2 where this occurred.
But you're saying that there's some step that asks about the PSF model of the warp and that selects the center to be typical. So the bug is after the psfMatchedWarp generation and part of the combining into the coadd?
Okay, completely spit-balling here. But maybe the bug is as @erykoff identified above, combined with a bug in the mask setting where the fractional threshold to propagate a mask pixel to the coadd mask behaves differently if there are 5 images or more.
Ah, I was wrong, it's the other way around: https://lsstc.slack.com/archives/C2JPXB4HG/p1590785611140900 . The problem is that when making the coadd psf for detection, things can go into bad regions. This is something different, but I fear it might be related. Then again, you'd think there would be other warps where this happened! So I think it's a coincidence.
Because I agree with @jchiang87 's point that the number of times one gets the PSF in the center to not be a valid PSF should go as something like the fractional area of the chip gaps.
The 5 images or more is in terms of the stacked depth in the coadd and there are >> 5 images everywhere here.
Oh, sorry, I'm an idiot. I totally misunderstood @jchiang87's table above. He was counting the number of CCDs contributing to the warp for a given visit. That makes sense.
@erykoff What does the PSF model look like for those warps?
I have great confidence in your ability to assess whether or not a given PSF model is good. But if it's otherwise useful, I grabbed a few more warps and put them on NERSC in /global/cscratch1/sd/wmwv/DC2_Run2.2i/debug_bad_warp
if you want to compare.
(These were just some that I had grabbed earlier and was looking through.)
I have no confidence in my ability to make images of stack psf models...but I can say that the size of the psf for the bad image is smaller? But now I'm totally fishing and nothing is obviously fishy.
Here are images of the psfs from the calexps for all six sensor-visits contributing to that warp:
These are evaluated at the center of the ccd. And the output of
psf.computeShape()
for each:
133 (ixx=1.99657594373448, iyy=2.050492953217776, ixy=-0.011098628584589033)
134 (ixx=2.019505530391281, iyy=2.101699763461495, ixy=-0.006890655147915128)
141 (ixx=2.029050756326062, iyy=2.083832430512826, ixy=-0.0032894101846721816)
163 (ixx=1.9745718082384986, iyy=2.053081621210872, ixy=-0.005349000670642242)
164 (ixx=2.0043181345859926, iyy=2.057558593012063, ixy=-0.002385396921082846)
171 (ixx=2.0112337378367817, iyy=2.1142107505264067, ixy=0.0011369217832696897)
The first column is the detector number. The values all look similar to other CCDs in this visit.
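For reference, this is roughly how the numbers above can be reproduced. It's a sketch, not the exact code: it uses the Gen2 butler on the run2.2i-calexp-v1 rerun quoted earlier, and assumes the dataId keys visit/detector/filter work for Run2.2i.

from lsst.daf.persistence import Butler

repo = "/global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/desc_dm_drp/v19.0.0-v1/rerun/run2.2i-calexp-v1"
butler = Butler(repo)
for det in (133, 134, 141, 163, 164, 171):
    calexp = butler.get("calexp", visit=665703, detector=det, filter="i")
    psf = calexp.getPsf()
    # computeShape() with no argument evaluates at the PSF's average position (~CCD center)
    print(det, psf.computeShape())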
coaddDriver finished:

[tanugi@cca001 v19.0.0-v1]$ ls -ltr rerun/run2.2i-coadd-wfd-dr6-v1-grizy/deepCoadd/i/4852/1,5.fits
-rw-rw-r-- 1 descprod lsst 210792960 May  7  2020 rerun/run2.2i-coadd-wfd-dr6-v1-grizy/deepCoadd/i/4852/1,5.fitsSAVE
-rw-rw-r-- 1 descprod lsst   3476160 May  7  2020 rerun/run2.2i-coadd-wfd-dr6-v1-grizy/deepCoadd/i/4852/1,5_nImage.fitsSAVE
-rw-rw-r-- 1 descprod lsst 210816000 Dec 10 17:37 rerun/run2.2i-coadd-wfd-dr6-v1-grizy/deepCoadd/i/4852/1,5.fits
-rw-rw-r-- 1 descprod lsst   2799360 Dec 10 17:37 rerun/run2.2i-coadd-wfd-dr6-v1-grizy/deepCoadd/i/4852/1,5_nImage.fits

[tanugi@cca001 v19.0.0-v1]$ ls -ltr rerun/run2.2i-coadd-wfd-dr6-v1-grizy/deepCoadd-results/i/4852/1,5
total 519708
-rw-rw-r-- 1 descprod lsst     20160 May  7  2020 bkgd-i-4852-1,5.fitsSAVE
-rw-rw-r-- 1 descprod lsst   8994240 May  7  2020 det-i-4852-1,5.fitsSAVE
-rw-rw-r-- 1 descprod lsst 211633920 May  7  2020 calexp-i-4852-1,5.fitsSAVE
-rw-rw-r-- 1 descprod lsst     20160 Dec 10 17:52 bkgd-i-4852-1,5.fits
-rw-rw-r-- 1 descprod lsst   9048960 Dec 10 17:52 det-i-4852-1,5.fits
-rw-rw-r-- 1 descprod lsst 211645440 Dec 10 17:52 calexp-i-4852-1,5.fits
So the new files are different....
Can you shoot the deepCoadd file over to nersc?
done
So ... the new 1,5.fits that @johannct just processed looks perfectly fine in the mask plane.
Here's what it looks like. So was this some sort of transient processing failure? How can we determine that?
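If it helps quantify "fine in the mask plane", something like this tabulates the fraction of pixels with each mask bit set (sketch only; the filename is a placeholder for a local copy of the patch):

import numpy as np
import lsst.afw.image as afwImage

coadd = afwImage.ExposureF("calexp-i-4852-1,5.fits")  # placeholder filename
mask = coadd.getMask()
arr = mask.getArray()
for plane, bit in mask.getMaskPlaneDict().items():
    frac = np.mean((arr & (1 << bit)) != 0)
    print(f"{plane:12s} {frac:.4f}")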
ok... the only thing that is clear is that the dates are inconsistent: the warp timestamps are after the coadd timestamps. The log does not seem to help at all. This is bad.
I'll move to multiband so it is running during the night CET. I will rename the current outputs as I did for coaddDriver
I missed the conclusion (if any). was it a system glitch?
Looks like. Of an unknown kind. @heather999 1,5 is ready at CC for pickup.
I checked all the patches, it seems only this one was affected. can we close the issue?
not before the reprocessed patch is analyzed :)
To start the year well, here is the famous before/after plot. Nice job.
Are we willing to use the above as sufficient validation so we can move ahead and make a DR6-v2 release?
I think so
I'd like to run a few additional checks that I usually report in https://lsst.lal.in2p3.fr/lalwiki/LSS/Run22 but that I did not perform for v1, since I got stuck on this patch problem.
I (mildly) tortured the new data and can't find any obvious flaw, so on my side it's OK for release.