LSSTDESC / SSim_DC1

Configuration, production, validation specifications and tools for the DC1 Data Set.

DC1 phoSim production #25

Closed TomGlanzman closed 7 years ago

TomGlanzman commented 7 years ago

This issue is intended to be a continuous log of the DC1 phoSim production at NERSC.

To start things off, a summary update of this project was given on Monday (12 Dec 2016) in the DESC-CI meeting (https://confluence.slac.stanford.edu/x/SryMCw). The initial workflow is being developed to include the following features:

Would like to get the first test runs going in the next week or so...but possibly not until after the holidays.

Stay tuned!

TomGlanzman commented 7 years ago

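A brief status report (https://docs.google.com/presentation/d/1JM2X100AMalC4qq41Unm320n4HpG2nWBaIeWcCSYICE/edit?usp=sharing) was presented yesterday at the DESC-CI meeting.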

     ----o----

Three preliminary full-focal plane visits have been completed at NERSC. The data are located here:

/global/projecta/projectdirs/lsst/production/DC1/DC1-phoSim-1/20170124-test/

000000 - uses an early instance catalog from Scott Daniels (nsnap 2, filter 4)

000001 - uses an instance catalog from Jim, used in his IMSIM development (nsnap 1, filter 2)

000002 - uses a new instance catalog from Scott (nsnap 1, filter 2)

Output consists of two data products per sensor, the 'electron' file (FITS) and the centroid file (txt), both located in the 'output' subdirectories.

For these three test runs, I have retained both the full instance catalogs and the complete phosim /work directories in case there are questions raised about the provenance of these data.

johnrpeterson commented 7 years ago

Tom, we should review the physics override file. I don’t really see why you should be using those settings now that we are on to phosim v3.6.

also, the CPU time you have as 14 hours includes the 8 cores, so it’s really ~14/8 hours of wall time?

john

———————————

John R. Peterson Assoc. Professor of Physics and Astronomy Department of Physics and Astronomy Purdue University 525 Northwestern Ave. West Lafayette, IN 47906 (765) 494-5193

TomGlanzman commented 7 years ago

John,

Agreed that a final review of the command/override file is needed.

Those 14 hours are 'billable CPU hours' from NERSC's perspective. Wall-clock elapsed time for the entire sequence was about 2.4 hours. Note that the initial run of phosim (the setupVisit step), which has a memory high-water mark of nearly 7 GB, requires 3 cores just for the memory, not for the CPU power. So while we were billed for 25 min, the elapsed time for that step was only ~8 minutes, and only one of the three CPUs was used.
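The billing arithmetic in this comment can be made explicit. This is a minimal sketch, assuming that every reserved core is charged for the full wall-clock time of the step; the function name is illustrative and is not part of any NERSC tool.

```python
def billed_core_minutes(reserved_cores, wall_minutes):
    """Core-minutes charged when every reserved core is billed for the full wall time."""
    return reserved_cores * wall_minutes


# setupVisit: 3 cores reserved for their memory, ~8 minutes elapsed.
setup_billed = billed_core_minutes(reserved_cores=3, wall_minutes=8)
print(setup_billed)  # 24 core-minutes, consistent with the ~25 min billed

# Whole visit: 14 billable CPU hours over about 2.4 hours of wall time.
average_cores = 14 / 2.4
print(round(average_cores, 1))  # 5.8 cores reserved on average
```

Under that assumption, the gap between billed and elapsed time for setupVisit comes entirely from the two cores reserved for memory that did no work.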

johnrpeterson commented 7 years ago

tom, what is setupVisit though? there shouldn’t be anything that requires 7 Gbytes, if you use the “includeobj” way of structuring catalogs. remember?

john

TomGlanzman commented 7 years ago

John,

SetupVisit is the (arbitrary) name of a workflow step (and is referenced in my update report to the DESC-CI group). This workflow step sets up the necessary phoSim /work and /output directories, discovers the correct instance catalog for this visit, and then decompresses the .txt.gz file (due to https://bitbucket.org/phosim/phosim_release/issues/6/problem-interpreting-gzipped-instance). Finally, phosim.py is called with the '-g condor' option to create the necessary batch files. The memory high-water mark is simply what Slurm reports at NERSC.

For the moment, I am using instance catalogs provided by Scott which do not yet use the includeobj directive. It will take some thought to understand the best way to slice up the sky in such a way that the DC1 instance catalogs can make sensible use of multiple "pieces". As we have found no documentation on the use of this directive, perhaps you have some experience to share that would help?

  - Tom
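The SetupVisit step described in this comment could be sketched roughly as follows. This is not the DC1 workflow code: the function name and the directory layout are hypothetical, and only the '-g condor' option comes from the description above. Passing the catalog as the first positional argument to phosim.py is also an assumption, and the command is built but not executed.

```python
import gzip
import shutil
from pathlib import Path


def setup_visit(visit_dir, catalog_gz):
    """Create work/output directories and decompress a gzipped instance catalog.

    Returns the decompressed catalog path and an unexecuted phosim.py command.
    """
    visit_dir = Path(visit_dir)
    # Hypothetical layout: one 'work' and one 'output' directory per visit.
    (visit_dir / "work").mkdir(parents=True, exist_ok=True)
    (visit_dir / "output").mkdir(parents=True, exist_ok=True)

    # 'catalog.txt.gz' becomes 'catalog.txt' inside the visit directory.
    catalog_txt = visit_dir / Path(catalog_gz).with_suffix("").name
    # Stream the decompression so the catalog is never held in memory here.
    with gzip.open(catalog_gz, "rb") as src, open(catalog_txt, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Only '-g condor' is taken from the thread; the options that point
    # phosim.py at the work and output directories are deliberately omitted.
    command = ["python", "phosim.py", str(catalog_txt), "-g", "condor"]
    return catalog_txt, command
```

Decompressing by streaming keeps this wrapper's own memory use small, so any large high-water mark would come from phosim.py reading the catalog rather than from the staging step.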

johnrpeterson commented 7 years ago

so, the includeobj would just be to have a separate file(s) with all the astrophysics objects. it doesn’t matter performance-wise whether there are 1 or N of these files. but it’s important that there is at least one and that you put all the objects in it.

if you do that, then phosim.py will never read that file and therefore will never load the full catalog into memory. then also the trim jobs later will only work on one line at a time. the raytrace jobs will only see a small subset of the full catalog. so that’s why when you say "7 Gbytes of memory", i know that that should never happen to any part of the phosim pipeline.

so you definitely don’t need to slice up the sky. in the past, people have put stars in one file and galaxies in another, and other objects in others, but that’s just what is convenient to the catalog creator.

make sense?

john

TomGlanzman commented 7 years ago

John,

Yes, sounds good. A question about the path: can phosim automatically look for the 'includeobj' file in the same directory as the instanceCatalog itself? Or must we embed the full path of the included file? If the latter, that means if we move files around, we will be obliged to edit the instanceCatalogs.

Scott, can we adjust the instance catalog generation to create (at least) two files: the main instance catalog with no objects (but with the necessary 'includeobj' directives) + one or more files of objects? In this case, the object file could contain a union of all astrophysical objects needed by DC1, and not just those required by a specific visit. That in itself seems like a very desirable simplification.

Thanks John!

danielsf commented 7 years ago

I'm still confused about how to use includeobj.

It sounds like I create one text file with all of my astrophysical sources (let's call it astrophys.txt)

and then I create an Instance Catalog that looks like

rightascension 53.0091385
declination -27.4389488
mjd 59580.1397286
altitude 66.3416951
azimuth 270.2736557
filter 2
vistime 30.0000000
includeobj astrophys.txt

Is this correct (obviously I have truncated the header of this hypothetical InstanceCatalog for the sake of email)?

-- Scott

johnrpeterson commented 7 years ago

yes, that’s right. but to be clear you wouldn’t want to put all DC1 objects into astrophys.txt. you could (it would still work), but that would put a lot of extra I/O on the system. you’d ideally just include the objects that are relevant for that visit.

john
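The layout confirmed here can be produced from an existing flat catalog with a short script. This is a sketch under stated assumptions, not the DC1 catalog generator: it assumes that object lines start with the keyword `object`, and the function and file names are hypothetical. It writes a bare file name after `includeobj`, as in Scott's example, because how phoSim resolves relative paths is still an open question in this thread.

```python
from pathlib import Path


def split_instance_catalog(flat_catalog, header_out, objects_out):
    """Split a flat instance catalog into a header file and an object file.

    Lines whose first token is 'object' go to objects_out. All other lines go
    to header_out, followed by a single includeobj directive.
    Returns the number of object lines moved.
    """
    objects_out = Path(objects_out)
    n_objects = 0
    # Work line by line so a multi-GB catalog is never loaded into memory.
    with open(flat_catalog) as src, \
            open(header_out, "w") as header, \
            open(objects_out, "w") as objects:
        for line in src:
            if not line.endswith("\n"):
                line += "\n"  # guard against a missing final newline
            tokens = line.split(None, 1)
            if tokens and tokens[0] == "object":
                objects.write(line)
                n_objects += 1
            else:
                header.write(line)
        header.write(f"includeobj {objects_out.name}\n")
    return n_objects
```

Run once per visit, this keeps each object file limited to the objects already selected for that visit, which matches the advice above about avoiding extra I/O.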

TomGlanzman commented 7 years ago

Must one specify the fully-qualified path to the included file? Or can it be a relative path and, if so, relative to what: $PWD or the directory containing the including file?

Can included files be compressed or must they be uncompressed text?

Having one include file per visit is straightforward, but given that DC1 will have >1000 visits and each uncompressed file is several GB, this means several TB if we wish to preserve a reference copy. There will be much redundancy in these files' content. The idea of "slicing up the sky" was intended to ask whether one can arrange to have some number of non-overlapping object files, some combination of which would be appropriate for a given visit.
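The storage estimate is simple arithmetic. In the sketch below, 1000 visits is the lower bound quoted in this comment, while 3 GB per file is only a placeholder for "several GB" and is not a measured DC1 file size.

```python
def reference_copy_terabytes(n_visits, gigabytes_per_file):
    """Total size in TB of one uncompressed object file per visit (1 TB = 1000 GB)."""
    return n_visits * gigabytes_per_file / 1000.0


# Placeholder inputs: 1000 visits, 3 GB per uncompressed object file.
estimate_tb = reference_copy_terabytes(n_visits=1000, gigabytes_per_file=3.0)
print(estimate_tb)  # 3.0
```

Any scheme that shares object files between visits would reduce this total by whatever fraction of the content is redundant.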

On 01/26/2017 06:53 AM, johnrpeterson wrote:

yes, that’s right. but to be clear you wouldn’t want to put all DC1 objects into astrophys.txt. you could (it would still work), but that would put a lot of extra I/O on the system. you’d ideally just include the objects that are relevant for that visit.

john

On Jan 25, 2017, at 2:17 PM, danielsf notifications@github.com<mailto:notifications@github.com> wrote:

I'm still confused about how to use includeobj.

It sounds like I create one text file with all of my astrophysical sources (let's call it astrophys.txt)

and then I create an Instance Catlaog that looks like

rightascension 53.0091385 declination -27.4389488 mjd 59580.1397286 altitude 66.3416951 azimuth 270.2736557 filter 2 vistime 30.0000000 includeobj astrophys.txt

Is this correct (obviously I have truncated the header of this hypothetical InstanceCatalog for the sake of email)?

-- Scott

On Wed, Jan 25, 2017 at 10:48 AM, Tom Glanzman notifications@github.com<mailto:notifications@github.com> wrote:

John,

Yes, sounds good. A question about the path: can phosim automatically look for the 'includeobj' file in the same directory as the instanceCatalog itself? Or must we embed the full path of the included file? If the latter, that means if we move files around, we will be obliged to edit the instanceCatalogs.

Scott, can we adjust the instance catalog generation to create (at least) two files: the main instance catalog with no objects (but with the necessary 'includobj' directives) + one or more files of objects? In this case, the object file could contain a union of all astrophysical objects needed by DC1, and not just those required by a specific visit. That in itself seems like a very desirable simplification.

Thanks John!

  • Tom

On 01/25/2017 10:37 AM, johnrpeterson wrote:

so, the includeobj would just be to have a separate file(s) with all the astrophysics objects. it doesn’t matter performance-wise whether there are 1 or N of these files. but its important that there is one and you put all the objects in that file.

if you do that, then phosim.py will never read that file and therefore will never load the full catalog into memory. then also the trim jobs later will only work on one line at a time. the raytrace jobs will only see a small subset of the full catalog. so that’s why when you say "7 Gbytes of memory", i know that that should never happen to any part of the phosim pipeline.

so you definitely don’t need to slice up the sky. in the past, people have put stars in one file and galaxies in another, and other objects in others, but that’s just what is convenient to the catalog creator.

make sense?

john

On Jan 25, 2017, at 12:29 PM, Tom Glanzman notifications@github.com<mailto:notifications@github.com< mailto:notifications@github.com>> wrote:

John,

SetupVisit is the (arbitrary) name of a workflow step (and is referenced in my update report to the DESC-CI group). This workflow step sets up the necessary phoSim /work and /output directories, discovers the correct instance catalog for this visit, and then decompresses the .txt.gz file (due to https://bitbucket.org/phosim/phosim_release/issues/6/ problem-interpreting-gzipped-instance). Finally phosim.py is called with the '-g condor' option to create the necessary batch files. The high water mark is just what is reported by slurm at NERSC.

For the moment, I am using instance catalogs provided by Scott which do not yet use the includeobj directive. It will take some thought to understand the best way to slice up the sky in such a way that the DC1 instance catalogs can make sensible use of multiple "pieces". As we have found no documentation on the use of this directive, perhaps you have some experience to share that would help?

  • Tom

On 01/25/2017 08:52 AM, johnrpeterson wrote:

tom, what is setupVisit though? there shouldn’t be anything that requires 7 Gbytes, if you use the “includeobj” way of structuring catalogs. remember?

john

On Jan 25, 2017, at 11:19 AM, Tom Glanzman wrote:

John,

Agreed that a final review of the command/override file is needed.

Those 14 hours are 'billable cpu hours' from NERSC's perspective. Wall clock elapsed time for the entire sequence was about 2.4 hours. Note that the initial run of phosim (setupVisit step), which has a memory high-water mark of nearly 7 GB, requires 3 cores just for the memory, not for the CPU power. So while we were billed for 25 min, the elapsed time for that step was only ~8 minutes and only one of the three CPUs was used.

  • Tom
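
The billing arithmetic for the setupVisit step can be checked with a short sketch. It assumes the charge is simply the number of reserved cores multiplied by the elapsed wall-clock time; the function names are illustrative and not part of any NERSC tool.

```python
def billed_core_minutes(cores_reserved, elapsed_minutes):
    """Core-minutes charged when every reserved core is billed for the whole run."""
    return cores_reserved * elapsed_minutes


def core_utilization(cores_used, cores_reserved):
    """Fraction of the reserved cores that actually did work."""
    return cores_used / cores_reserved


# setupVisit: 3 cores reserved to obtain enough memory, ~8 minutes elapsed,
# only one core busy.
print(billed_core_minutes(3, 8), "core-minutes billed")
print(f"{core_utilization(1, 3):.0%} of the reserved cores in use")
```

With an elapsed time of roughly 8 minutes this gives about 24 core-minutes, in line with the ~25 minutes billed.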

———————————

John R. Peterson Assoc. Professor of Physics and Astronomy Department of Physics and Astronomy Purdue University 525 Northwestern Ave. West Lafayette, IN 47906 (765) 494-5193

johnrpeterson commented 7 years ago

On Jan 26, 2017, at 11:36 AM, Tom Glanzman wrote:

Must one specify the fully-qualified path to the included file? Or can it be a relative path and, if so, relative to what: $PWD or the directory containing the including file?

its a relative path to where the catalog file is.

Can included files be compressed or must they be uncompressed text?

they can be compressed

Having one include file per visit is straightforward, but given that DC1 will have >1000 visits and each uncompressed file is several GB, this means several TB if we wish to preserve a copy for future reference. There will be much redundancy in these files' content. The idea of "slicing up the sky" was intended to ask the question as to whether one can arrange to have some number of non-overlapping object files, some combination of which would be appropriate for a given visit?

so phosim itself wouldn’t mind if you include lots of sources that don’t even make it on the image. i was just saying you shouldn’t really send it too many extraneous sources per visit to keep the I/O load down.

but your question about slicing the sky is interesting. i have argued in the past that you could do this for the galaxies. since they don’t move and they don’t change their brightness. since these are ~90%+ of the sources, you could imagine having just a small number of catalog files containing just galaxies. so in the case of DC1 i think there are 7? field points but 1000 visits, so you might be able to get away with just 7 of these catalogs for the whole run. even if there is dithering it is ok, because you can just include a larger subset. and then for the stars (which vary and move slightly), the asteroids, agn, etc. you’d have a custom includeobj file for each of the 1000 visits, but these would be rather small files. so then you’d have at least 2 includeobj files per visit: one that is more generic containing galaxies for that general place on the sky and one that is customized for that visit for the non-galaxy objects.

but this is not really a phosim question. it’s up to the catalog creator. i’d be happy to help with this though.

john

danielsf commented 7 years ago

@johnrpeterson @TomGlanzman

I have just discovered/remembered that using includeobj is not going to help us. Even though galaxies do not move, their positions still need to be corrected for the precession and nutation of the Earth in order to get them into the above the atmosphere geocentric apparent RA, Dec coordinates expected by PhoSim. This is a time-dependent correction and will thus be different for every individual InstanceCatalog.

That being said: one of the major drivers for including that correction properly is to make sure that the (RA, Dec) and (Alt, Az) coordinates in the InstanceCatalog are self-consistent. Since PhoSim actually doesn't check the self-consistency of the (RA, Dec) and (Alt, Az) coordinates, we could reconfigure CatSim to produce InstanceCatalogs in which the astrophysical objects are in ICRS (RA, Dec) coordinates with parallax and proper motions applied to the stars, but the (Alt, Az) header values are still corrected for precession and nutation.

John had said at some point in the recent past that PhoSim was thinking about including a step to validate the (RA, Dec) and (Alt, Az) coordinates in an InstanceCatalog. The scheme I proposed above would fail that test. My question for John is: is that test going to be something that we can optionally turn off?

I'm not saying that I am sure that I want to reconfigure CatSim as I have described. This post is just an attempt to feel out what the consequences of doing so would be.

cwwalter commented 7 years ago

Since PhoSim actually doesn't check the self-consistency of the (RA, Dec) and (Alt, Az) coordinates, we could reconfigure CatSim to produce InstanceCatalogs in which the astrophysical objects are in ICRS (RA, Dec) coordinates with parallax and proper motions applied to the stars, but the (Alt, Az) header values are still corrected for precession and nutation.

I don't think I would be for that as a solution. I think it would be something that people would be constantly trying to remember and understand and would lead to future mistakes and confusion.

johnrpeterson commented 7 years ago

On Jan 26, 2017, at 3:00 PM, danielsf wrote:

@johnrpeterson @TomGlanzman

I have just discovered/remembered that using includeobj is not going to help us. Even though galaxies do not move, their positions still need to be corrected for the precession and nutation of the Earth in order to get them into the above the atmosphere geocentric apparent RA, Dec coordinates expected by PhoSim. This is a time-dependent correction and will thus be different for every individual InstanceCatalog.

[yes, but we could either ignore precession/nutation or do it in phosim, if it meant saving petabytes of catalogs.]

but this is the second issue. i really want to stress the original one. even if you do a full list of objects that is unique to each catalog, you should still include that as a separate file and reference that with includeobj. that way phosim will not parse it until the appropriate time, and therefore you won’t have the 7 Gbyte memory problem.

That being said: one of the major drivers for including that correction properly is to make sure that the (RA, Dec) and (Alt, Az) coordinates in the InstanceCatalog are self-consistent. Since PhoSim actually doesn't check the self-consistency of the (RA, Dec) and (Alt, Az) coordinates, we could reconfigure CatSim to produce InstanceCatalogs in which the astrophysical objects are in ICRS (RA, Dec) coordinates with parallax and proper motions applied to the stars, but the (Alt, Az) header values are still corrected for precession and nutation.

John had said at some point in the recent past that PhoSim was thinking about including a step to validate the (RA, Dec) and (Alt, Az) coordinates in an InstanceCatalog. The scheme I proposed above would fail that test. My question for John is: is that test going to be something that we can optionally turn off?

that’s not there now, but for sure we can turn it off if it’s there. there are incredibly common use cases where they are deliberately inconsistent (e.g. some stars at (0,0) but at an arbitrary altitude)

I'm not saying that I am sure that I want to reconfigure CatSim as I have described. This post is just an attempt to feel out what the consequences of doing so would be.

understood. but daniel definitely deal with the first issue for now. that is really easy.

john

cwwalter commented 7 years ago

Tom, we should review the physics override file. I don’t really see why you should be using those settings now that we are on to phosim v3.6.

We should discuss this... But I'm afraid of mixing up this issue with the others in this already complicated thread. As John said, some things (like clouds) he has now checked and could be included. For other DM reasons, we have also decided for DC1 we want to turn off BF and treerings etc. @TomGlanzman could you make another issue for this?

BTW @johnrpeterson, there is something odd with your setup now and we can't distinguish between what you are quoting and your responses. It makes it almost impossible to follow.... Could you maybe use the GH interface or check your email settings? I am using GH for what it is worth.

danielsf commented 7 years ago

I will provide Tom with a script that stores all of the sources in a separate file that is linked to the InstanceCatalog with includeobj

(the more I think about my extreme solution, the more I doubt it will work; as I recall, the effects of precession and nutation are not necessarily uniform across the focal plane)
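
To get a feel for how non-uniform the correction is, here is a minimal sketch using the standard first-order annual precession formulas (rate in RA = m + n sin(RA) tan(Dec), rate in Dec = n cos(RA)). The constants are approximate J2000-era values, and the field centre and 3.5 degree width are only example numbers. Nutation is not included, and this is not how CatSim or PhoSim compute the correction.

```python
import math

# Approximate J2000-era precession constants, in arcsec per year.
M_ARCSEC_PER_YR = 46.1244
N_ARCSEC_PER_YR = 20.0431


def annual_precession(ra_deg, dec_deg):
    """First-order annual precession rates (d_ra, d_dec) in arcsec/yr."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    d_ra = M_ARCSEC_PER_YR + N_ARCSEC_PER_YR * math.sin(ra) * math.tan(dec)
    d_dec = N_ARCSEC_PER_YR * math.cos(ra)
    return d_ra, d_dec


# Example field (hypothetical centre), sampled at opposite edges in RA.
centre_ra, centre_dec, half_width = 53.0, -27.4, 1.75
west = annual_precession(centre_ra - half_width, centre_dec)
east = annual_precession(centre_ra + half_width, centre_dec)
print("Dec rate difference across the field (arcsec/yr):", west[1] - east[1])
```

Because the rates depend on position, two objects on opposite edges of the field drift by different amounts each year, so a single rigid shift of a shared galaxy catalog would not reproduce the per-object correction.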

cwwalter commented 7 years ago

I will provide Tom with a script that stores all of the sources in a separate file that is linked to the InstanceCatalog with includeobj

You mean per instance catalog here (so that nutation/precession are included) ?

danielsf commented 7 years ago

Yes. Each InstanceCatalog will have its own includeobj file
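
A minimal sketch of that layout, for readers following along: it moves every object line of a monolithic instance catalog into a separate list and leaves a single includeobj directive in the master file. It assumes object lines begin with the keyword "object" and that the directive takes the form "includeobj <file name>", as in the example later in this thread. The function name and the sample lines are made up, and this is not the CatSim script itself.

```python
def split_instance_catalog(lines, object_file_name):
    """Separate header commands from object lines.

    Returns (master_lines, object_lines). The master keeps every
    non-object line and gains one includeobj directive at the end.
    """
    master = []
    objects = []
    for line in lines:
        # First whitespace-separated token decides where the line goes.
        if line.split(None, 1)[:1] == ["object"]:
            objects.append(line)
        else:
            master.append(line)
    master.append(f"includeobj {object_file_name}")
    return master, objects


# Made-up, truncated lines purely for illustration.
catalog = [
    "rightascension 53.0",
    "declination -27.4",
    "object 1 53.01 -27.41 22.5",
    "object 2 52.98 -27.39 24.1",
]
master, objects = split_instance_catalog(catalog, "star_cat_40336.txt")
print("\n".join(master))
```

The master file stays a few lines long whatever the size of the object list, which is the point John makes above about phosim.py never loading the full catalog into memory.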

SimonKrughoff commented 7 years ago

(the more I think about my extreme solution, the more I doubt it will work; as I recall, the effects of precession and nutation are not necessarily uniform across the focal plane)

I worry about this too. It's why we've always just generated each instance catalog independently of all the others.

What is the overall size of all the instance catalogs with and without stars (sorry if this is in the thread already. I looked but didn't see it)?

SimonKrughoff commented 7 years ago

Here's an intermediate proposal. Could we keep one galaxy catalog around and generate the shifted positions in the workflow system at run time with an afterburner script?

jchiang87 commented 7 years ago

Stars are a relatively insignificant fraction of these instance catalogs. The ones I generated for DC1 are about 3GB each.
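
For scale, the storage trade-off discussed earlier in the thread can be sketched as follows. The inputs are taken loosely from the numbers quoted above (about 1000 visits, about 3 GB per catalog, 7 field points, roughly 90% galaxies). The assumption that one shared galaxy file is the size of the galaxy part of a single visit catalog is mine, and dithering would make the shared files larger. The sketch also ignores the precession/nutation concern raised above.

```python
def catalog_storage_gb(n_visits, gb_per_visit, n_fields, galaxy_fraction):
    """Return (independent, shared) total catalog storage in GB.

    independent: every visit keeps its own complete catalog.
    shared: one galaxy file per field plus a small per-visit file
            holding everything that is not a galaxy.
    """
    independent = n_visits * gb_per_visit
    shared_galaxies = n_fields * gb_per_visit * galaxy_fraction
    per_visit_rest = n_visits * gb_per_visit * (1.0 - galaxy_fraction)
    return independent, shared_galaxies + per_visit_rest


independent, shared = catalog_storage_gb(1000, 3.0, 7, 0.9)
print(f"independent catalogs: {independent:.0f} GB, shared galaxies: {shared:.0f} GB")
```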

TomGlanzman commented 7 years ago

Please see phosim configuration discussion in issue #27

TomGlanzman commented 7 years ago

phoSim config, issue #27, resolved and closed. Task DC1-phoSim-2 is being used to create a number of trial full focal plane images. Ran into an issue this afternoon of overloading the fatboy server at UW generating instance catalogs with only five concurrent jobs. With suggestions from Scott, retrying with a smaller "CHUNK_SIZE" (100,000 -> 10,000). Don't really have a good way to serialize these jobs in the NERSC system, so this problem may come back again.

With some luck, this could be the start of production.

TomGlanzman commented 7 years ago

Seven visits are complete, with another 23 in queue. Learning how to navigate the NERSC system is part of this challenge. Mustafa has been a big help to pilot the pilots. One can view progress at this url.

sethdigel commented 7 years ago

Looking at the first couple of log files for the RunRaytrace step, I'm seeing only background sources (Airglow, Zodiacal Light, Cosmic Rays). See the end of the log file here. The corresponding FITS file (/global/projecta/projectdirs/lsst/production/DC1/DC1-phoSim-2/output/000000/lsst_e_40336_f2_R01_S01_E000.fits.gz) has only background, and the matching centroid file in the same directory lists only background sources (id = 0.000000).

Here is an image from ds9 for lsst_e_40336_f2_R01_S01_E000.fits.gz.

[Image: lsst_e_40336_f2_r01_s01_e000]

I looked at the output image for a second CCD from this visit (lsst_e_40336_f2_R03_S11_E000.fits.gz) and it is also only background.

So I'm afraid that something is wrong. Maybe the instance catalog does not have the star and galaxy definitions, or maybe there's a mismatch in region of the sky covered by the instance catalog vs. that covered by the CCD.

I looked at randomly chosen centroid files for the next four visits (40337, 40338, 40345, 40366), and each of them had only background sources.

For what it is worth, here is a log file from a Twinkles Run 1 visit (phoSim 3.4 era). It lists a 'Dark Sky' background source that is not present in the DC1 log file linked above. On the other hand, the old log file does not have Airglow or Zodiacal Light background sources.

dkirkby commented 7 years ago

The gradient in this image is a bit surprising, so I checked a different chip (lsst_e_40336_f2_R01_S01_E000) and it was spatially uniform with a mean ~225 adu. Any idea what is causing the gradient? It could be vignetting, but is that being simulated and is this chip at the edge of the focal plane?

Has anyone done a sanity check on the cosmics rate in these images? The two chips I looked at had three each. Does that make sense for this sensor thickness and exposure time? (It seems low to me, but I'm not well calibrated for such short exposure times).

danielsf commented 7 years ago

@TomGlanzman Do the InstanceCatalogs and includeobj files for any of these images still exist anywhere? I would like to look at them to make sure the objects align with the field of view as per Seth's concern.

TomGlanzman commented 7 years ago

Hi @danielsf, yes, indeed, all of the intermediate files are still in place. The kit-and-kaboodle can be found (at NERSC) here: /global/cscratch1/sd/desc/Pipeline-tasks/DC1-phoSim-2. In this directory, you will find one subdirectory for each visit indexed by task stream, e.g., 000000 through 000029. The order of visits is the same as in Humna's pickle file (which is also where I extract the list of sensors to simulate for the visit).

For each visit directory, you will find four items:

commands.txt - the override/command file used by phoSim

instCat/ - the four (4) instanceCatalog files produced by your generator script

work/ - the phoSim work directory, complete with all intermediate files

output/ - empty (by design, real output is transferred to the project directory)

Please have a look.
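
For anyone scripting against this layout, a small sketch of how the per-visit paths could be built. The root path and the four item names come from the description above; the six-digit zero padding is inferred from the 000000 through 000029 examples, and the function name is hypothetical.

```python
from pathlib import PurePosixPath

TASK_ROOT = PurePosixPath("/global/cscratch1/sd/desc/Pipeline-tasks/DC1-phoSim-2")
VISIT_ITEMS = ("commands.txt", "instCat", "work", "output")


def visit_paths(stream, root=TASK_ROOT):
    """Map each expected item name to its path for one task stream number."""
    visit_dir = root / f"{stream:06d}"
    return {name: visit_dir / name for name in VISIT_ITEMS}


for name, path in visit_paths(0).items():
    print(f"{name:13s} {path}")
```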

danielsf commented 7 years ago

Comparing phosim_40336.txt (the master InstanceCatalog) and star_cat_40336.txt (the catalog of stars): the stars are centered on the nominal PhoSim field of view.

One question for @johnrpeterson: we have written the includeobj commands with the path relative to the master catalog. Should it be relative to the executable?

I.e. if phosim_master.txt and stars.txt are both in catalogs/, and I run

phosim.py catalogs/phosim_master.txt

should the includeobj command be

includeobj stars.txt

or

includeobj catalogs/stars.txt

?

We are currently doing the former.

cwwalter commented 7 years ago

John will correct me if I am wrong but in my experience everything you reference must be relative to the directory the executable is executing in or absolute. So I would have guessed you needed to do it the 2nd way.

danielsf commented 7 years ago

In principle, I agree, but earlier in this thread

https://github.com/LSSTDESC/SSim_DC1/issues/25#issuecomment-275447184

he said the path was "relative to where the catalog file is"

cwwalter commented 7 years ago

Ah. I see... OK, so it depends on whether John assumes the catalog is in the same directory the executable is running in.

danielsf commented 7 years ago

I just ran a test on my laptop with a small catalog. I used includeobj the way we have been (the first in my post above). It worked. PhoSim found the sources and put them on the chip. I, however, just ran phosim.py. I know that our NERSC workflow does not directly call phosim.py, but breaks up the different tasks therein by hand. I wonder if that is where the bug entered.
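
The two interpretations debated above differ only in the base directory used to resolve the includeobj file name. The sketch below spells that out. It is an illustration of the question, not PhoSim's actual lookup code, and both function names are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_relative_to_catalog(master_catalog, include_name):
    """Interpretation 1: the name is relative to the master catalog's directory."""
    return PurePosixPath(master_catalog).parent / include_name


def resolve_relative_to_cwd(cwd, include_name):
    """Interpretation 2: the name is relative to the directory phosim.py runs in."""
    return PurePosixPath(cwd) / include_name


# phosim.py catalogs/phosim_master.txt, with "includeobj stars.txt" in the master
print(resolve_relative_to_catalog("catalogs/phosim_master.txt", "stars.txt"))
print(resolve_relative_to_cwd(".", "stars.txt"))
```

The two agree whenever the master catalog sits in the working directory. A workflow that changes directory between steps could therefore behave differently from a single phosim.py call, which may be worth checking in the NERSC scripts.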

TomGlanzman commented 7 years ago

@danielsf could we come up with a specific combination of phoSim version, command file and catalog with which to demonstrate this problem? Could one of the existing NERSC jobs be used as-is? One problem with running phosim.py solo is that it cleans up all intermediate files, making it a bit difficult to compare step-by-step. I also think we should ask @johnrpeterson for some advice on how to isolate the problem.

sethdigel commented 7 years ago

Here's a crudely assembled focal plane image for visit 40337, which has the most CCDs (146) of any of the visits completed so far in Tom's DC1 run. I think that all of the individual CCD images that will be calculated for this visit have been completed; at least I don't see any more running.

[Image: assemble_40337]

I'd guess that the structure in the image is simulated airglow. It is not zodiacal light. There could be some vignetting as well.

The hole in the image is where the R31_S11 image would be. I've double checked that the output directory does not contain an e image for that location. I have not yet checked whether it was part of the simulation run.

To assemble the image, I copied scaled-down versions of the individual e images into their correct relative locations in the focal plane based on the R##_S## parts of the file names. The spacing between the CCDs and rafts in this assembled image is not perfectly accurate - the actual gaps are somewhat larger - but it does not matter for this purpose. The overall image size is 5 x 5 rafts. The corner rafts are of course missing from the simulation.

dkirkby commented 7 years ago

The left-most column shows the pattern I would expect from vignetting, but it's on the wrong side of the chips! Might these need flipping horizontally? However, I don't see similar vignetting patterns on the other edges.

The spatial structure here isn't what I would expect for airglow. What filter is this?

sethdigel commented 7 years ago

Yes the chips could need flipping. I also should have reversed the placement of the individual CCDs left-right. I put them in 'as seen through L3' order, looking down on the focal plane.

It is r-band (f2 in the file names)

danielsf commented 7 years ago

@TomGlanzman Is there a script I could run on my laptop that reproduces the way our NERSC workflow runs PhoSim (something akin to Glenn's cluster_submit.py)? I would like to try to recreate the bug locally so we can get some idea what we are reporting.

johnrpeterson commented 7 years ago

yes, the outer stuff is definitely vignetting. i also think if you flip the chips the airglow variation will have a more coherent looking pattern.

john

sethdigel commented 7 years ago

I updated the assembled image above. What it needed was reversing the order of the CCDs horizontally but not reversing the individual CCD images. Again, the individual images are not positioned exactly as they would be in the actual focal plane. I scaled the e images to 500 x 509 and placed them into 510 x 510 slots according to the R## and S## parts of the file names. When cori is available again I'd be interested to check on the missing CCD image.
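
The slot placement described above can be sketched as follows. The 510-pixel slot size and the 5 x 5 raft layout come from the description; three sensors per raft side is the usual LSST layout. Reading the first digit of R## and S## as the horizontal index and the second as the vertical index is an assumption, as is the function name. The scaling of the images themselves is left out.

```python
import re

# Matches the raft and sensor indices in names like
# lsst_e_40337_f2_R01_S11_E000.fits.gz
_RAFT_SENSOR = re.compile(r"_R(\d)(\d)_S(\d)(\d)_")


def slot_origin(file_name, slot=510, rafts=5, per_raft=3, flip_x=True):
    """Pixel offset (x0, y0) of a sensor's slot in the assembled image."""
    match = _RAFT_SENSOR.search(file_name)
    if match is None:
        raise ValueError(f"no R##_S## pattern in {file_name!r}")
    rx, ry, sx, sy = (int(g) for g in match.groups())
    col = rx * per_raft + sx
    row = ry * per_raft + sy
    if flip_x:
        # Reverse the order of the CCDs horizontally without
        # mirroring the individual images.
        col = rafts * per_raft - 1 - col
    return col * slot, row * slot


print(slot_origin("lsst_e_40336_f2_R01_S01_E000.fits.gz"))
```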

cwwalter commented 7 years ago

Where do we stand on all of the sources being missing?

danielsf commented 7 years ago

@cwwalter I think that the problem is in how we have decided to "manually" run the tasks that are normally run by phosim.py. If it at all makes sense to run the workflow scripts on a local machine, I would recommend that as the way to debug things. Not knowing anything about what goes on under the hood of the workflow, I cannot say more.

johnrpeterson commented 7 years ago

daniel, please use the cluster_submit.py script. you shouldn’t have to worry about these details. there is no way it will be effective on HPC without using this script.

john

dkirkby commented 7 years ago

Where do we stand on all of the sources being missing?

Who needs sources when you can learn so much from the background? 😄

dkirkby commented 7 years ago

@sethdigel This updated image looks a lot better and the vignetting makes sense now, but I don't understand why the lower chips have relatively little variance. The amplitude of the airglow variation seems a bit high for r-band. Can you easily simulate a few chips of the same exposure in a redder band? (where the airglow amplitude should be larger, with similar spatial structure).

TomGlanzman commented 7 years ago

The past weekend was fruitful in several ways. An attempt was made to run phoSim on the first 30 visits in Humna's pickle file list. This represented 1717 sensor-sims. Of this total, 1505 completed while the remaining 212 were hit with the 2-day maintenance outage. At one point, there were ~25 pilot jobs and nearly 500 sims running concurrently. At this level, there were no noticeable infrastructure issues that I am aware of. A few cori/slurm-related operational lessons were learned.

  1. avoid setting the job's time limit to the max for the queue (partition) as the dispatch priority will be very disadvantageous

  2. shorter jobs, even with multiple nodes requested, enjoy a much faster dispatch (or at least this was true over the past weekend -- load characteristics may vary)

Therefore, we must find the best compromise between efficiency of job management and queue limits.
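
One way to explore that compromise is a back-of-the-envelope count of pilot jobs for a given wall-time limit. In the sketch below the 20 concurrent sims per pilot roughly matches the ~25 pilots and ~500 concurrent sims quoted above. The 2 hours per sensor-sim and the wall limits are placeholders, not measured values, and the function name is hypothetical.

```python
import math


def pilots_needed(n_sims, hours_per_sim, slots_per_pilot, pilot_wall_hours):
    """Number of pilot jobs needed if each slot runs sims back to back."""
    sims_in_sequence = int(pilot_wall_hours // hours_per_sim)
    if sims_in_sequence < 1:
        raise ValueError("wall limit is shorter than a single simulation")
    sims_per_pilot = sims_in_sequence * slots_per_pilot
    return math.ceil(n_sims / sims_per_pilot)


# The 212 sensor-sims interrupted by the outage, with placeholder timings.
for wall_hours in (6, 12):
    print(wall_hours, "h limit ->", pilots_needed(212, 2, 20, wall_hours), "pilots")
```

Shorter limits mean more pilots to manage but, per the lessons above, faster dispatch for each one.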

Given that a pilot job takes all currently running phoSim instances with it when it dies, it is also clear that checkpointing regains its importance and priority on the to-do list. This will require some effort from the Tony/Brian team.

[Apologies for the delay in getting this out -- I will be out of town for a week starting tomorrow and have many meetings today. There will be follow up messages addressing the "missing sources" issue.]

sethdigel commented 7 years ago

@sethdigel This updated image looks a lot better and the vignetting makes sense now, but I don't understand why the lower chips have relatively little variance. The amplitude of the airglow variation seems a bit high for r-band. Can you easily simulate a few chips of the same exposure in a redder band? (where the airglow amplitude should be larger, with similar spatial structure).

The short answer is no, not right now. I was just reading results from one of the DC1 visits that Tom ran (and all of the DC1 visits are r band). In principle I can set up separate individual phoSim runs with just the Airglow and changing the filter, but I will not be able to get to that right away.

Regarding backgrounds, I see in the minion_1016 metadata for the DC1 visits that the first 7 visits have moonAlt < 0, but the next 23 have the Moon 23-30 deg up and at about half phase, so it will be interesting to see the effect on sky brightness and the background. (The metadata file linked here does not have the dithered rotSkyPos.)

TomGlanzman commented 7 years ago

@sethdigel noticed the lack of non-background sky sources in the phoSim output. My hope is that this is due to some (simple) config problem in the scripts that are running the various pieces of phoSim. Part of the investigation is to examine the various intermediate files produced by phosim, such as the trimmed catalog files. The 2-day NERSC outage is hampering my efforts to track this down so forensics must wait. In the meantime, let me address two issues raised in previous posts: 1) is cluster_submit being used; and, 2) looking at the actual workflow scripts.

To the first, no, cluster_submit is not being used. It does not lend itself to integration with the workflow engine, and batch parameters/resources are hard-wired. The significant value of the cluster_submit script is that it provided an example and template for deconstructing the 'phosim.py' into separate pieces (setup, trim, raytrace, e2adc), and translating condor submit files into slurm submit files. All of that knowledge has been incorporated into the workflow scripts. I know that Glenn is working to put checkpointing into cluster_submit, but it is not there in the version we have. Checkpoint integration needs care but is not difficult and I am prepared to do that once the workflow engine provides support for the concept. I would be happy to discuss the needs of the DC1 workflow with the phoSim folks to see if there might be a common solution.

To the second issue, all of the scripts used to run the DC1 task are in a github repo -- look into the NERSC directory. For the record (and interested readers) a bit more detail follows as to what these scripts are doing.

STEP 1 (setupVisit.py):

Determine visit parameters (obsHistID and list of sensors to simulate)

Generate phoSim instance catalog

$ \/phosim.py -g condor -o \ -w \ -c \ --checkpoint=0 -e 0 -t \

This creates many files in the 'work' directory, then exits.

STEP 2 (runTrim.py):

trim\\.submit/.pars

$ cd \ $ \ \< \

STEP 3 (runRaytrace.py):

raytrace\\_E\_\.submit/.pars

$ cd \ $ \ \< \

(There is no E2ADC step for DC1.)

=================================================
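
A rough sketch of how the steps above might look when driven from Python. The flags (-g condor, -o, -w, -c, --checkpoint=0, -e 0, -t) are the ones shown in STEP 1, and the catalog is passed as the first argument as in the phosim.py example earlier in the thread. The argument values did not survive in the templates above, so every parameter name here is a hypothetical stand-in, and feeding the .pars file on standard input is my reading of the STEP 2 and STEP 3 templates. This is not the actual workflow code in the repo.

```python
import subprocess


def build_setup_command(phosim_dir, instance_catalog, output_dir,
                        work_dir, command_file, t_value):
    """Argument list for the STEP 1 phosim.py call (all values hypothetical)."""
    return [
        f"{phosim_dir}/phosim.py", instance_catalog,
        "-g", "condor",        # write batch files instead of running the steps
        "-o", output_dir,
        "-w", work_dir,
        "-c", command_file,
        "--checkpoint=0",
        "-e", "0",
        "-t", str(t_value),
    ]


def run_with_pars(argv, pars_file, work_dir):
    """Run one step from inside work_dir with the .pars file on stdin."""
    with open(pars_file, "rb") as pars:
        return subprocess.run(argv, stdin=pars, cwd=work_dir, check=True)


print(" ".join(build_setup_command(
    "/path/to/phosim", "instCat/phosim_40336.txt",
    "output", "work", "commands.txt", 1)))
```

Building the argument list separately from running it makes it easy to log exactly what each step was given, which helps when comparing against a plain phosim.py run.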

johnrpeterson commented 7 years ago

Tom, please use cluster_submit.py. It’s unlikely that you’ll get all the details correct this way. You can basically get the slurm commands out of it, and then bundle them still (in other words, have the cluster_submit not actually submit the jobs, but tell the workflow engine what to submit). This is not an abstract argument, as En-Hsin already ran the flats this way, which is as large as a data challenge.

john

On Feb 6, 2017, at 6:17 PM, Tom Glanzman (notifications@github.com) wrote:

@sethdigel noticed the lack of non-background sky sources in the phoSim output. My hope is that this is due to some (simple) config problem in the scripts that are running the various pieces of phoSim. Part of the investigation is to examine the various intermediate files produced by phosim, such as the trimmed catalog files. The 2-day NERSC outage is hampering my efforts to track this down so forensics must wait. In the meantime, let me address two issues raised in previous posts: 1) is cluster_submit being used; and, 2) looking at the actual workflow scripts.


———————————

John R. Peterson Assoc. Professor of Physics and Astronomy Department of Physics and Astronomy Purdue University 525 Northwestern Ave. West Lafayette, IN 47906 (765) 494-5193

SimonKrughoff commented 7 years ago

This is not an abstract argument, as En-Hsin already ran the flats this way, which is as large as a data challenge.

@johnrpeterson when are these going to show up someplace? I don't see them either on globus or at NCSA.