Closed jchiang87 closed 4 years ago
The lists of unneeded y4 and y5 files are rather long, so I won't post them here. They are available on the CC-IN2P3 machines:
(lsst-scipipe-1172c30) [in2p3] pwd -P
/pbs/home/j/jchiang/dev/desc_sim_utils/work
(lsst-scipipe-1172c30) [in2p3] wc unneeded_y*.txt
9203 9203 819067 unneeded_y4_0000_0643.txt
8602 8602 765578 unneeded_y4_0643_1286.txt
9208 9208 819512 unneeded_y4_1286_1929.txt
10224 10224 909936 unneeded_y4_1929_2572.txt
9907 9907 881723 unneeded_y4_2572_3215.txt
10413 10413 926757 unneeded_y4_3215_3858.txt
13198 13198 1174622 unneeded_y4_3858_4501.txt
10284 10284 915276 unneeded_y4_4501_5144.txt
11534 11534 1026526 unneeded_y4_5144_5787.txt
14575 14575 1297175 unneeded_y4_5787_6430.txt
128 128 11392 unneeded_y4_6430_6435.txt
59485 59485 5300179 unneeded_y5_0000_0744.txt
37939 37939 3414510 unneeded_y5_0744_1488.txt
38028 38028 3422520 unneeded_y5_1488_2232.txt
45472 45472 4092480 unneeded_y5_2232_2976.txt
51118 51118 4600620 unneeded_y5_2976_3720.txt
44772 44772 4029480 unneeded_y5_3720_4464.txt
46109 46109 4149810 unneeded_y5_4464_5208.txt
55582 55582 5002380 unneeded_y5_5208_5952.txt
50983 50983 4588470 unneeded_y5_5952_6696.txt
56298 56298 5066820 unneeded_y5_6696_7439.txt
593062 593062 53214833 total
Each of the unneeded_y*.txt files contains a list of file paths to the unneeded raw files.
To support the ingest at CC-IN2P3, I've made lists of the files that need to be processed:
(lsst-scipipe-1172c30) [in2p3] wc needed*.txt
75854 75854 6751006 needed_y4_0000_0643.txt
79161 79161 7045329 needed_y4_0643_1286.txt
79755 79755 7098195 needed_y4_1286_1929.txt
76231 76231 6784559 needed_y4_1929_2572.txt
77046 77046 6857094 needed_y4_2572_3215.txt
74543 74543 6634327 needed_y4_3215_3858.txt
68779 68779 6121331 needed_y4_3858_4501.txt
75708 75708 6738012 needed_y4_4501_5144.txt
72415 72415 6444935 needed_y4_5144_5787.txt
63628 63628 5662892 needed_y4_5787_6430.txt
483 483 42987 needed_y4_6430_6435.txt
77781 77781 6936499 needed_y5_0000_0744.txt
91822 91822 8263980 needed_y5_0744_1488.txt
92474 92474 8322660 needed_y5_1488_2232.txt
89891 89891 8090190 needed_y5_2232_2976.txt
84128 84128 7571520 needed_y5_2976_3720.txt
89733 89733 8075970 needed_y5_3720_4464.txt
89879 89879 8089110 needed_y5_4464_5208.txt
80536 80536 7248240 needed_y5_5208_5952.txt
84091 84091 7568190 needed_y5_5952_6696.txt
79836 79836 7185240 needed_y5_6696_7439.txt
1603774 1603774 143532266 total
@johannct
I concatenated your files separately for each year. I have 743603 entries for y4 and 860171 for y5, which matches the per-year sums of the wc line counts above.
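For anyone who wants to repeat the per-year concatenation, here is a minimal Python sketch. It assumes the needed_y4_*.txt and needed_y5_*.txt lists sit in one directory with one path per line; the function name and the output file names are hypothetical, not something used in this thread.

```python
from pathlib import Path


def concatenate_lists(list_dir, pattern, output_path):
    """Concatenate every list file matching `pattern` into `output_path`.

    Returns the number of non-empty entries written.
    """
    output_path = Path(output_path)
    # Collect the inputs before the output file is created, and skip the
    # output itself in case its name also matches the glob pattern.
    inputs = sorted(
        p for p in Path(list_dir).glob(pattern)
        if p.resolve() != output_path.resolve()
    )
    count = 0
    with open(output_path, "w") as out:
        for list_file in inputs:
            with open(list_file) as src:
                for line in src:
                    entry = line.strip()
                    if entry:
                        out.write(entry + "\n")
                        count += 1
    return count
```

Calling concatenate_lists(".", "needed_y4_*.txt", "needed_y4_all.txt") and the same for y5 should return counts that can be compared against the wc totals.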
Do we have a plan for how to deal with the unneeded files? I don't usually want to completely delete files, but in this case I'm certainly open to it. For now, my intent at NERSC is to move them aside into a separate y*-outsideDC2region directory, store them on NERSC HPSS, and delete them from NERSC CFS.
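The move-aside step could be sketched roughly as follows. This is only an illustration: it assumes each raw file lives in a per-visit parent directory (so that keeping the parent directory name avoids file name collisions), and the function name and the dry_run flag are hypothetical. Archiving to HPSS is not covered here.

```python
import shutil
from pathlib import Path


def move_aside(list_file, dest_root, dry_run=True):
    """Move the files named in `list_file` under `dest_root`.

    Returns (moved, missing): `moved` is a list of (source, destination)
    pairs and `missing` lists the paths that were not found on disk.
    With dry_run=True nothing is moved; the planned moves are only reported.
    """
    moved, missing = [], []
    with open(list_file) as src_list:
        for line in src_list:
            entry = line.strip()
            if not entry:
                continue
            src = Path(entry)
            if not src.is_file():
                missing.append(src)
                continue
            # Assumption: the parent directory identifies the visit.
            dest = Path(dest_root) / src.parent.name / src.name
            if not dry_run:
                dest.parent.mkdir(parents=True, exist_ok=True)
                shutil.move(str(src), str(dest))
            moved.append((src, dest))
    return moved, missing
```

Running it first with dry_run=True and checking the missing list against the unneeded_y*.txt counts seems like a cheap safeguard before anything is actually moved.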
For the SQLITE tracking DB, I'm assuming these unneeded files should be ignored. Agreed?
If there is space at CC-IN2P3, I would like to keep them around for a little while (maybe a couple of weeks) since there are some things I'd like to investigate with them. For files at NERSC, I agree that we should archive them on HPSS in a separate area and delete them from CFS. I'm not sure how we use the tracking db once files have been ingested, so I'm fine with omitting them or not. If we could add a column indicating they are "extra", that might be the most conservative way to proceed.
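To make the "extra" column idea concrete, here is a rough sketch using Python's sqlite3 module. I don't know the actual schema of the tracking db, so the table name raw_files and the column name path are placeholders, and the function name is hypothetical.

```python
import sqlite3


def flag_extra_files(conn, extra_paths, table="raw_files", path_column="path"):
    """Add an `extra` column if it is absent and set it to 1 for `extra_paths`.

    The table and column names are placeholders for the real schema.
    Returns the number of rows that were flagged.
    """
    # PRAGMA table_info returns one row per column; index 1 is the name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if "extra" not in columns:
        conn.execute(
            f"ALTER TABLE {table} ADD COLUMN extra INTEGER NOT NULL DEFAULT 0"
        )
    flagged = 0
    for path in extra_paths:
        cur = conn.execute(
            f"UPDATE {table} SET extra = 1 WHERE {path_column} = ?", (path,)
        )
        flagged += cur.rowcount
    conn.commit()
    return flagged
```

Existing rows default to extra = 0, so tools that already read the db would only need a WHERE extra = 0 clause to skip the files outside the DC2 region.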
The tracking DB keeps track of simulated files: their location, year, region, run (2.2i, 2.1..), and whether we've checked that the FITS files are all proper (they satisfy the FITS standard and were properly closed). The DB has no function as far as DRP, ingest, etc. are concerned. Fabio has been making use of it as part of the data transfers of sims files. This was a larger concern when we had instances of improper FITS files, so that the data transfer tool could easily skip sensor-visits that were "bad". @villarrealas what do you think of including those extra files and marking them "extra"? My naive thought is to keep them out of the tracking DB to avoid future confusion, as these files should never have existed in the first place.
As a sanity check, I made depth maps using the y4 and y5 lists of sensor-visits inside and outside of the DC2 region.
[Figures: y4 depth maps; y5 depth maps]
Here's a comparison of the CCDs that were expected to be simulated for each Run2.2i visit versus the sensor-visits that were actually simulated, as identified by the raw files at CC-IN2P3 in /sps/lssttest/datasets/desc/DC2/Run2.2i/sim/y[1-5]-wfd. For each visit, the list of expected CCDs was computed using the trim_sensors.py code in https://github.com/LSSTDESC/desc_sim_utils/tree/u/jchiang/refactor_obs_md_handling , which allows one to use the opsim db file to get the pointing information. Each red point is the number of CCDs that are missing from the expected list for a given visit, and each blue point is the expected number of CCDs minus the number that were simulated. For y4 and y5, many more CCD raw files were generated than were expected, hence all the negative blue points. Those extra sensor-visits must lie outside of the Run2.2i simulation region, so we should move those aside before doing the DRP processing. I'll post a list of raw files to move aside at this issue.
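The two per-visit quantities described above (red and blue points) reduce to simple set arithmetic. The sketch below is not the trim_sensors.py code; it only illustrates the bookkeeping, with a hypothetical function name and illustrative CCD names.

```python
def compare_visit(expected_ccds, simulated_ccds):
    """Compare the expected and simulated CCD sets for a single visit.

    Returns a dict with:
      num_missing               expected CCDs with no raw file (red point)
      expected_minus_simulated  len(expected) - len(simulated) (blue point)
      extra                     simulated CCDs that were not expected
    """
    expected = set(expected_ccds)
    simulated = set(simulated_ccds)
    missing = expected - simulated
    extra = simulated - expected
    return {
        "num_missing": len(missing),
        "expected_minus_simulated": len(expected) - len(simulated),
        "extra": sorted(extra),
    }


# Illustrative values only, not taken from the Run2.2i data.
result = compare_visit(
    expected_ccds=["R22_S11", "R22_S12"],
    simulated_ccds=["R22_S11", "R01_S00", "R01_S01", "R01_S02"],
)
print(result)
```

A visit with more extra CCDs than missing ones gives a negative blue point, which is the pattern seen for y4 and y5. The "extra" entries per visit are the sensor-visits that would end up in the unneeded lists.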
For the record, here are the statistics of missing vs expected sensor-visits for each year: