Closed plaszczy closed 3 years ago
That was confirmed by @fjaviersanchez: it was eliminated by the "good" selection, because of objects that had 'base_PixelFlags_flag_clipped' set. The hole is not present in DR2.
See https://lsstc.slack.com/archives/CM6MF33UG/p1607440268069500 for more on this. I tried to track down a pathological calexp but failed. In the meantime it appears that the coaddDriver outputs have inconsistent timestamps, so I am going to rerun it, keeping the warps but removing the final products so that they get regenerated.
Thanks, @johannct . I am very interested in the results of a re-run. The special behavior of this patch makes me think that there was a temporary configuration problem, or that this patch was run with a different configuration than all the others.
Before setting the coaddDriver output I just checked the supreme exposure map for 4852, and it looks OK, so whatever happened to this patch is of a different nature than the previously spotted failures.
@fjaviersanchez found: """ the hole there in DR6 corresponds to objects with 'base_PixelFlags_flag_clipped' that have been eliminated when you apply the good cut (btw, I also checked DR2 and there's no hole there when you apply the good cut). Below is the plot of DR6 galaxies in the tract that @stefplaz mentioned with base_PixelFlags_flag_clipped==True : """
@fjaviersanchez continues: """ (my map is flipped with respect to Stéphane's)
and a higher resolution zoom-in: """
@johannct notes that: """ update : looking at the occurrence of the value True for this flag in 'deepCoadd_meas' I get u 32, g 0, r 26, i 29134, z 32, y 27. So of course it is the reference band which is pathological..... """
""" Looking at deepCoadd images it seems indeed that the mask for the coadd i image includes an abnormal number of pixels where CLIPPED (10) and APPROXIMATE PSF (12) are set.... """
@johannct What numbers do you get for the occurrence of base_PixelFlags_flag_clipped in the next patch over? E.g., tract 4852, patch 1,4.
Recall this issue: https://github.com/LSSTDESC/DC2-production/issues/400, where a number of warp files were found to be corrupt when we tried to reuse the existing warps from CC for the processing at NERSC. Given the datestamps on the files, this seemed connected with the disk filling up at CC back in May. While I haven't completely finished, I do have a list of bad warps. ~A 130 of them in the i band for tract 4852, include patch 1,5 and 139 in patch 1,4 There are many other bad warps in 4852 for some other patches. I do wonder if these warps should be regenerated. I started to collect a list here at CC: /sps/lsst/users/hkelly/dr6-warps-checks
There's a specific list for 4852 1,5 here: /sps/lsst/users/hkelly/dr6-warps-checks/bad-i-tract-4852-1,5.out
~
Sorry those 130 were the good ones :) 4852 in i band actually looked fine
Hmm as far as I can tell there are 130 warp files in your list, and 130 warps in the rerun directory.... so that would mean that they are all bad somehow? It would be good to understand exactly what is wrong with them then, because fitsinfo does not complain for any of them.
For many of the corrupted files, one needs to explicitly read in and access the data section of one of the image extensions to see an error. I'd be surprised if the files were corrupted at the time of the coadd generation since things would have crashed, but it's worth following up on. Regardless, corrupted files should be identified and moved out of the way.
yes I was hoping that supreme was doing that as well...... I do not understand. Ok so if you confirm that they should all be deleted I will remove them and relaunch coaddDriver a third time.
See updated comment :) There are some bad warps, but not in 4852 i band. I'm going through the logs again though to see if the other bands show anything for that particular tract.
In [149]: d=[]
     ...: for f in ['u','g','r','i','z','y']:
     ...:     id={'tract':4852,'patch':'1,4','filter':f}
     ...:     dd=butler.get('deepCoadd_meas',dataId=id)
     ...:     print('{} {}'.format(f,len(np.where(dd['base_PixelFlags_flag_clipped']==True)[0])))
     ...:     d.append(dd)
     ...:
u 28
g 13
r 18
i 29
z 26
y 19
In [150]: d=[]
     ...: for f in ['u','g','r','i','z','y']:
     ...:     id={'tract':4852,'patch':'1,5','filter':f}
     ...:     dd=butler.get('deepCoadd_meas',dataId=id)
     ...:     print('{} {}'.format(f,len(np.where(dd['base_PixelFlags_flag_clipped']==True)[0])))
     ...:     d.append(dd)
     ...:
u 32
g 0
r 26
i 29134
z 32
y 27
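For anyone who wants to repeat this comparison on other patches, here is a minimal sketch of the same per-band count with an automatic "which band is off" check. The helper names `count_flagged` and `outlier_bands` and the factor-of-10 threshold are illustrative assumptions, not part of the stack; the counts are the ones printed above.

```python
from statistics import median

import numpy as np


def count_flagged(flags_by_band):
    """Count the True entries of a boolean flag column for each band."""
    return {band: int(np.count_nonzero(flags)) for band, flags in flags_by_band.items()}


def outlier_bands(counts, factor=10.0):
    """Return the bands whose count exceeds `factor` times the median count.

    The factor is an arbitrary illustrative threshold, not a tuned value.
    """
    typical = median(counts.values())
    # Guard against a zero median so that a single non-zero band is not flagged by accident.
    return [band for band, n in counts.items() if n > factor * max(typical, 1)]


# Counts copied from the two sessions above (tract 4852).
counts_1_4 = {"u": 28, "g": 13, "r": 18, "i": 29, "z": 26, "y": 19}
counts_1_5 = {"u": 32, "g": 0, "r": 26, "i": 29134, "z": 32, "y": 27}

print("1,4:", outlier_bands(counts_1_4))
print("1,5:", outlier_bands(counts_1_5))
```

With a butler at hand, `count_flagged` could be fed `{f: dd['base_PixelFlags_flag_clipped'] for ...}` built from the same `butler.get` calls as in the sessions above.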
Has anybody opened all the warps? If an extension was all zeros or something that would be an issue and still be a valid fits file. Meanwhile, suprême does not open the individual warps in the mode that I ran on DC2, because this was orders of magnitude too slow due to where the warps live on the file system.
The test code snippet I proposed opens each image extension and accesses the shape of the data arrays. That's sufficient to trigger the same error that we saw in the coaddition task. It didn't look at any array values.
I think we need to look at the values themselves, because this is setting the CLIPPED flag so presumably there are numbers here but they are crazy outliers.
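A rough sketch of what such a value-level check could look like, assuming astropy is available. `summarize_array` and `summarize_fits` are hypothetical helper names; the astropy import is done inside the file reader only so that the array helper can be used on its own.

```python
import numpy as np


def summarize_array(data):
    """Return basic sanity statistics for one image-like array."""
    data = np.asarray(data)
    finite = data[np.isfinite(data)]
    return {
        "shape": data.shape,
        "n_nan": int(np.isnan(data).sum()),
        "n_inf": int(np.isinf(data).sum()),
        # min/max over finite pixels only, so NaN and inf are reported separately
        "finite_min": float(finite.min()) if finite.size else None,
        "finite_max": float(finite.max()) if finite.size else None,
    }


def summarize_fits(path):
    """Read every numeric image extension of a FITS file and summarize it.

    Computing on hdu.data forces the pixels to be read, which is what
    exposes a truncated or corrupted data section.
    """
    from astropy.io import fits  # imported lazily; only needed for file access

    summaries = {}
    with fits.open(path) as hdul:
        for index, hdu in enumerate(hdul):
            data = hdu.data
            # Skip empty HDUs and table HDUs (record arrays have dtype kind 'V').
            if isinstance(data, np.ndarray) and data.dtype.kind in "fiu":
                summaries[index] = summarize_array(data)
    return summaries
```

Looping `summarize_fits` over the warp files and wrapping each call in `try/except` would give both the "does it read at all" test and the per-extension value ranges in one pass.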
I'll have a look.
Hmmm I took the last one randomly, to check how the code would go :
In [1]: from astropy.io import fits
In [2]: hdu=fits.open('rerun/run2.2i-coadd-wfd-dr6-v1-grizy/deepCoadd/i/4852/1,5/warp-i-4852-1,5-995008.fits')
In [11]: hdu[1].shape,np.min(hdu[1].data),np.max(hdu[1].data)
Out[11]: ((4200, 4200), nan, nan)
There can be nans if they're masked, I think you need np.nanmin
In [15]: hdu[1].shape,np.nanmin(hdu[1].data),np.nanmax(hdu[1].data)
Out[15]: ((4200, 4200), -459.5638, 131863.9)
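For reference, a tiny standalone illustration of why the first call returned nan: `np.min`/`np.max` propagate NaN, while `np.nanmin`/`np.nanmax` ignore it. The array is made up for the example.

```python
import numpy as np

# Toy 2x2 "image" with one NaN pixel (e.g. a pixel with no input data).
pixels = np.array([[1.5, np.nan], [-3.0, 7.25]], dtype=np.float32)

plain_max = np.max(pixels)              # NaN propagates through the reduction
masked_max = np.nanmax(pixels)          # NaN pixels are ignored
nan_fraction = np.isnan(pixels).mean()  # fraction of pixels that are NaN

print(plain_max, masked_max, nan_fraction)
```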
There are two questions:
ok looping now.....
@wmwv I think we are in paranoid check mode here, we turn all the stones
In [18]: for file in files :
    ...:     hdu=fits.open(file)
    ...:     print(hdu[1].shape,np.nanmin(hdu[1].data),np.nanmax(hdu[1].data))
    ...:
(4200, 4200) -221.08604 133905.1
(4200, 4200) -214.76836 139060.38
(4200, 4200) -366.0075 132893.77
(4200, 4200) -228.26071 127752.06
(4200, 4200) -359.60248 123011.17
(4200, 4200) -410.39493 123648.72
(4200, 4200) -287.60092 134028.53
(4200, 4200) -297.4834 142707.14
(4200, 4200) -343.23798 134039.9
(4200, 4200) -333.2711 94684.02
(4200, 4200) -429.60797 134553.14
(4200, 4200) -240.8877 139433.05
(4200, 4200) -253.50287 132108.03
(4200, 4200) -313.1606 86863.28
(4200, 4200) -227.18558 137615.8
(4200, 4200) -313.3414 138924.62
(4200, 4200) -228.28221 138908.22
(4200, 4200) -233.07065 136456.36
(4200, 4200) -238.40208 140100.1
(4200, 4200) -359.60675 131295.97
(4200, 4200) -297.1659 130965.8
(4200, 4200) -296.82626 122973.234
(4200, 4200) -389.96426 126481.37
(4200, 4200) -209.39848 134179.42
(4200, 4200) -220.55402 135111.73
(4200, 4200) -225.01671 129270.61
(4200, 4200) -241.86803 136417.08
(4200, 4200) -263.54977 122629.26
(4200, 4200) -232.34193 123612.34
(4200, 4200) -242.83937 138155.8
(4200, 4200) -359.29358 139312.73
(4200, 4200) -384.5748 138924.84
(4200, 4200) -198.96458 997.3174
(4200, 4200) -327.1076 141224.86
(4200, 4200) -459.5638 131863.9
(4200, 4200) -218.9993 143118.56
(4200, 4200) -355.44714 131582.31
(4200, 4200) -335.10596 121496.414
(4200, 4200) -220.7141 141601.34
(4200, 4200) -224.05415 145167.67
(4200, 4200) -235.33553 139298.52
(4200, 4200) -348.3461 130177.41
(4200, 4200) -244.94116 113448.625
(4200, 4200) -407.3234 141324.77
(4200, 4200) -344.74213 99353.83
(4200, 4200) -279.4023 126072.96
(4200, 4200) -375.81897 136397.64
(4200, 4200) -384.42545 130244.15
(4200, 4200) -330.38043 141549.83
(4200, 4200) -285.0421 133313.84
(4200, 4200) -224.46022 147216.33
(4200, 4200) -342.903 132111.72
(4200, 4200) -221.27449 135182.61
(4200, 4200) -223.57861 129118.28
(4200, 4200) -245.40088 136870.05
(4200, 4200) -226.03519 124667.016
(4200, 4200) -365.07425 134581.2
(4200, 4200) -358.88992 131584.75
(4200, 4200) -361.09113 126800.15
(4200, 4200) -408.5616 137013.6
(4200, 4200) -276.30737 131298.9
(4200, 4200) -234.86754 121050.64
(4200, 4200) -230.76772 143862.1
(4200, 4200) -405.2909 135016.33
(4200, 4200) -383.11908 136644.48
(4200, 4200) -362.63193 130828.805
(4200, 4200) -361.85605 104724.445
(4200, 4200) -298.3697 139430.66
(4200, 4200) -349.76886 135952.77
(4200, 4200) -234.52693 131596.16
(4200, 4200) -450.40366 127424.664
(4200, 4200) -266.98944 148013.55
(4200, 4200) -239.36635 134346.94
(4200, 4200) -405.66418 139318.78
(4200, 4200) -407.34253 128650.664
(4200, 4200) -270.9116 132728.92
(4200, 4200) -232.91992 137628.5
(4200, 4200) -380.22607 139010.3
(4200, 4200) -390.6479 135483.84
(4200, 4200) -430.38373 103387.22
(4200, 4200) -369.58444 136016.9
(4200, 4200) -236.92389 133908.39
(4200, 4200) -343.90466 114249.07
(4200, 4200) -234.69402 135889.78
(4200, 4200) -275.37216 133662.61
(4200, 4200) -231.78294 139231.02
(4200, 4200) -306.9835 143125.11
(4200, 4200) -430.68283 135528.64
(4200, 4200) -256.94614 132847.14
(4200, 4200) -283.16025 142584.52
(4200, 4200) -253.36618 138652.23
(4200, 4200) -241.82642 122275.39
(4200, 4200) -241.08351 127364.86
(4200, 4200) -335.7297 128853.19
(4200, 4200) -269.1661 135810.47
(4200, 4200) -361.54834 126647.58
(4200, 4200) -243.64369 132435.92
(4200, 4200) -310.02405 145569.95
(4200, 4200) -303.22217 133875.17
(4200, 4200) -325.29352 134571.8
(4200, 4200) -358.8536 131591.53
(4200, 4200) -361.1473 133469.12
(4200, 4200) -298.18802 7121.0894
(4200, 4200) -413.1947 129937.836
(4200, 4200) -219.54723 131380.53
(4200, 4200) -329.9853 124610.19
(4200, 4200) -215.26097 127490.734
(4200, 4200) -215.66096 128880.81
(4200, 4200) -344.53 131975.08
(4200, 4200) -190.59291 49608.266
(4200, 4200) -263.8326 133168.27
(4200, 4200) -221.83551 136502.84
(4200, 4200) -333.5346 131394.84
(4200, 4200) -212.17023 127306.88
(4200, 4200) -230.33888 131760.94
(4200, 4200) -242.73006 135753.66
(4200, 4200) -368.48984 130084.7
(4200, 4200) -210.56142 136671.72
(4200, 4200) -219.70772 134464.67
(4200, 4200) -234.63768 136564.45
(4200, 4200) -207.26266 99546.4
(4200, 4200) -225.53851 143131.39
(4200, 4200) -308.30362 143474.28
(4200, 4200) -323.02667 124720.05
(4200, 4200) -275.3604 129618.72
(4200, 4200) -217.21942 139432.89
(4200, 4200) -301.8557 133558.06
(4200, 4200) -258.93634 146225.88
(4200, 4200) -223.41513 134532.22
(4200, 4200) -351.75363 136545.86
Agreed. The discussion so far had been focused on finding the bad image (item 1). My point was to also encourage consideration of Item 2, what went on in the coadd config.
so a priori there is no blatant issue with one of the warps
@johannct Can you also check extensions 2 and 3? the variance and mask extensions?
extension 3 has nanmax systematically set to 'inf'
I generally agree that there isn't a blatant issue, and I think that @wmwv makes a very good point that whatever is going wrong is propagating to all the pixels in the coadd, which is hard for one image to do! I wonder if it's also possible to rerun the coadd with the existing warps and see if the problem is still there? Presumably we wouldn't have to even run source detection or multiband, just look at the coadd mask plane.
Extension 3 is ...?
mask? according to @jchiang87
or variance, I can't remember which offhand, but the variance values should be obvious.
@erykoff , this is already running
typically (4200, 4200) 4162.3 inf for ext 3
typically (4200, 4200) 0 3104 for ext 2
Then 2 must be mask, 3 is inverse variance?
I do not know how a config could change for a single patch out of the blue..... @wmwv which datasetType would you look at? There is no deepCoadd_meas_config
but there is a deepCoadd_forced_config
There is one image that has a very different range of pixel values:
(4200, 4200) -198.96458 997.3174
Most max pixel values are ~100k. Maybe this is an outlier frame worth looking at?
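A quick way to pull such frames out of the list without eyeballing it, as a sketch only: compare each maximum with the median maximum. The function name `low_max_frames` and the 0.5 cut are arbitrary assumptions; the sample values are a handful taken from the output above. A low maximum is only a hint that a frame deserves a look, not proof that it is bad.

```python
from statistics import median


def low_max_frames(max_values, fraction=0.5):
    """Return the indices of frames whose max is below `fraction` of the median max."""
    typical = median(max_values)
    return [i for i, value in enumerate(max_values) if value < fraction * typical]


# A few of the nanmax values printed in the loop above.
sample_max = [133905.1, 139060.38, 997.3174, 141224.86, 7121.0894, 49608.266, 132893.77]

print(low_max_frames(sample_max))
```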
This is what the mask plane looks like on 1,5 (the bad one) and 1,6 (neighboring okay) from the repo at nersc. The bit 2**14=16384 is the "clipped" bit. And it is set almost everywhere on 1,5 and not at all on 1,6, and not following any of the input images. So something went 🤪 here.
This is from, e.g., /global/cfs/cdirs/lsst/production/DC2_ImSim/Run2.2i/desc_dm_drp/v19.0.0-v1/rerun/run2.2i-coadd-wfd-dr6-v1-grizy/deepCoadd/i/4852/1,5.fits. Don't need to look at the sources/run multiband to see the problem.
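To put a number on "almost everywhere", the fraction of pixels with a given mask bit set can be computed directly from the mask array. This is a minimal sketch: `bit_fraction` is a hypothetical helper, and the bit index 14 is simply the value quoted in this comment; it should be checked against the mask-plane definitions stored with the file rather than hard-coded.

```python
import numpy as np


def bit_fraction(mask, bit):
    """Return the fraction of pixels in `mask` that have bit number `bit` set."""
    mask = np.asarray(mask).astype(np.int64)  # widen so the shift cannot overflow
    return float(np.mean((mask & (1 << bit)) != 0))


# Toy 2x2 mask: two pixels carry bit 14 (value 16384), one carries bit 10 (value 1024).
toy_mask = np.array([[16384, 0], [16384 | 32, 1024]], dtype=np.int32)

print(bit_fraction(toy_mask, 14), bit_fraction(toy_mask, 10))
```

Running this on the mask extension of the 1,5 and 1,6 coadds would turn the visual comparison into two fractions that are easy to tabulate for every patch in the tract.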
The pattern on the left indicates that it is a single image that is causing that bit to be set.
The bit is set on both sides of a chip gap ... maybe a single visit, but not a single PVI/calexp/warp, no?
Good catch @jchiang87, here is the case you spotted: rerun/run2.2i-coadd-wfd-dr6-v1-grizy/deepCoadd/i/4852/1,5/warp-i-4852-1,5-893769.fits
God switched on a light bulb....
Yes, a single visit, but that would still correspond to a single warp image that combines the different PVI that overlap with it.
@jchiang87 Ah yes, duh. @johannct seems problematic.
@johannct I think that image is actually not the culprit. That looks like a visit where only a small corner of the warp was covered by a CCD. The one we want would look like the pattern on the coadd with that clipped bit set.
this is a bit tougher.... I have no better way than to open them all
I think that the suprême input map can help here. Give me a moment...
I think the pixels in chip gaps should all be nan-valued in the image itself, so the number of nans would match or be close to the number of non-clipped pixels in the coadd...could try comparing those numbers to see....
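One possible way to automate that comparison, sketched under two assumptions: the warp and the coadd share the same pixel grid, and pixels without data in the warp are NaN. The names `footprint_match` and `best_matching_warp` are hypothetical, and the bit index has to be supplied by the caller.

```python
import numpy as np


def footprint_match(warp_image, coadd_mask, bit):
    """Fraction of pixels where 'warp has data' agrees with 'coadd bit is set'."""
    has_data = np.isfinite(warp_image)
    flagged = (np.asarray(coadd_mask).astype(np.int64) & (1 << bit)) != 0
    return float(np.mean(has_data == flagged))


def best_matching_warp(warps, coadd_mask, bit):
    """Return (name of the best-matching warp, all scores) for a dict of warp images."""
    scores = {name: footprint_match(image, coadd_mask, bit) for name, image in warps.items()}
    return max(scores, key=scores.get), scores


# Toy 4x4 example: the flagged region is the left half of the patch.
toy_coadd_mask = np.zeros((4, 4), dtype=np.int32)
toy_coadd_mask[:, :2] = 1 << 14

full_overlap = np.full((4, 4), np.nan)
full_overlap[:, :2] = 1.0  # data exactly where the bit is set
corner_only = np.full((4, 4), np.nan)
corner_only[0, 0] = 1.0    # a warp that only touches one corner

best, scores = best_matching_warp(
    {"full_overlap": full_overlap, "corner_only": corner_only}, toy_coadd_mask, bit=14
)
print(best, scores)
```

A warp whose data footprint reproduces the flagged pattern scores close to 1, while a warp that only clips a corner of the patch scores much lower.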
Another test is that there are at least 5 CCDs contributing to that warp. 4 or fewer is more typical. I think the number of contributing CCDs is in the warp headers somewhere.