LSSTDESC / Twinkles

10 years. 6 filters. 1 tiny patch of sky. Thousands of time-variable cosmological distance probes.

Keeping the NERSC DMstack install up to date #222

Open heather999 opened 8 years ago

heather999 commented 8 years ago

Now that we have a DMstack installation at NERSC using the conda install from the dev channel, I'm wondering how to keep it up to date.

My intent is to put fresh installations of the stack in parallel with whatever versions are already available, so nothing gets updated or removed out from under anyone (though we will likely want to clean up old, unused installations from time to time). Any suggestions, @jchiang87 @drphilmarshall @danielsf?

tonyj321 commented 8 years ago

This is probably not 100% applicable to the question, but I thought it might be useful to document how we deal with code installs for EXO, since I think it has worked well, and at least some of this functionality might be applicable to DMstack installs for DESC.

For EXO we use Jenkins to make automatic builds of all of our code. One nice feature of Jenkins is that you can have one central install of the service, with distributed agents that build for different OS/Site combinations. For EXO we run agents at SLAC, SRCF, NERSC and WIPP, so our code builds are always available at each of these sites.

More specifically, we run the code build whenever new code is committed (we use Subversion, but this would work equally well with GitHub). Whenever a build succeeds we make it available to users at each site. We organize the builds as follows:

builds/
    build-id/nnn/stuff
    svn-id/nnn (links to stuff)
    tag-id/tag (links to stuff)
    trunk (links to stuff)
    current-release (links to tag)

This allows a user to select a specific build, a specific Subversion revision (this could just as well be a git hash), a specific tag, the most recent release, or the most recent build. There are many duplicate files between the different builds, so we use a Perl tool called trim-trees to replace duplicate files with hard links, which yields a huge saving in disk space (and is safe because a build, once made, is never modified). Thanks to that saving we have been able to keep every build since we started data taking (roughly 5 years of builds right now).
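For illustration, here is a minimal shell sketch of that hard-link deduplication idea (this is not the actual trim-trees Perl script, and the two build paths are placeholders): any file in a new build that is byte-identical to the corresponding file in an older build gets replaced by a hard link to the older copy.

```sh
# Sketch of trim-trees-style deduplication; OLD and NEW are placeholder
# build directories. Safe only because a finished build is never modified.
OLD=/path/to/builds/build-id/122
NEW=/path/to/builds/build-id/123

cd "$NEW"
find . -type f | while read -r f; do
    # If the same relative path exists in the old build and the contents
    # match, replace the new copy with a hard link to the old one.
    if [ -f "$OLD/$f" ] && cmp -s "$f" "$OLD/$f"; then
        ln -f "$OLD/$f" "$f"
    fi
done
```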

This system runs completely automatically and rarely requires any user intervention. Most users use one of these pre-built code releases rather than building the code themselves. I think it would be useful to see which of these features could usefully be adapted for DESC.

rhiannonlynne commented 8 years ago

FYI, the conda dev channel currently provides only the most recent version of the sims release (and it will stay that way until we are satisfied that version resolution happens fast enough), so if you keep installing into the same conda environment I'm not sure how you could keep multiple versions 'alive'. If you use multiple conda environments, you should be able to install a new version of the stack into a new environment while keeping the old version available in the old environment.
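As a rough sketch of that pattern (the environment names, channel URL, and conda package names below are placeholders, not the actual dev-channel values): give each stack release its own environment, so installing a new one never disturbs an existing one.

```sh
# Placeholder names throughout; substitute the real dev channel and packages.
conda create -n dmstack-2016-04 python          # new, parallel environment
source activate dmstack-2016-04                 # conda's activation syntax at the time
conda install -c <dev-channel-url> lsst-apps lsst-sims

# An older install stays usable in its own environment:
source activate dmstack-2016-03
```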

If you are using an eups build, you could do something like Tony suggests (in fact, an "eups distrib" build would do that automatically). There are also other tools available, such as lsstsw (https://github.com/lsst/lsstsw), and the SQuaRE team runs an hourly Jenkins build, though I don't think those results are distributed. In any of these cases you do have to build from source; the conda installation is from binaries.
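For completeness, a hedged sketch of what the eups route looks like ($STACK_DIR is a placeholder for a shared install tree, and the tag is just an example): eups installs each tagged version into its own versioned directories, so tags accumulate side by side and users pick one with setup.

```sh
source "$STACK_DIR/loadLSST.bash"              # set up eups for this install tree
eups distrib install -t w_2016_15 lsst_apps    # fetch the tagged release; previously
                                               # installed tags are left untouched
setup lsst_apps -t w_2016_15                   # select that tag in the current shell
```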

Lynne


heather999 commented 8 years ago

@tonyj321 short answer - yes :) At the moment I haven't thought much beyond just getting DMstack installed; right now we are strictly using the conda-based installation, due to the rapid development occurring and the difficulties encountered doing an "eups distrib" at NERSC. I do want to follow up with DM about that, since I had assumed "eups distrib" would work just as well at NERSC as it has at SLAC, but there were some problems concerning mariadb and its dependencies (these might be easy to work around). As for Jenkins builds in general, I'm wondering how we can settle on a set of package versions, call that a "release", and treat it as something worth installing at NERSC for general use. Is it possible to see the EXO installation structure at NERSC?

Thank you, @rhiannonlynne. For now we are installing both lsst_apps and lsst_sims via the conda dev channel; @danielsf made a more recent release of the lsst_apps binaries available that is compatible with lsst_sims, and we have installed both easily at NERSC in one conda environment. I suspect some packages, such as obs_lsstSims, already need updates, which I hope will appear in an upcoming conda dev release. That is part of the thrust of this issue: how do we know when there is a need to grab a more recent installation of the stack for general shared use at a place like NERSC? For now, I'm assuming these installs will live in separate conda environments - I really don't want to be updating things out from under anyone.

rhiannonlynne commented 8 years ago

It's a bit tricky with the sims updates, as we do depend on various parts of lsst_apps (afw, obs_lsstSims, etc.). We try to stick to a particular weekly build of lsst_apps until we have a particular reason to update, though, since we don't want to force users who install from source to do full rebuilds of afw unless they need to. For this reason we generally don't build from master, but we can build from git tags. Our conda releases match the eups releases, so they will have the same set of lsst_apps packages as the eups build. For example, we picked up the recent weekly DM build, w_2016_15, because it incorporated fixes to afw that let users on El Capitan + Xcode 7.3 build from source. We probably won't pick up every weekly unless there is another important feature.
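As an aside, a quick way to see what is actually installed in the currently active conda environment (assuming the package names contain "lsst", which is a guess on my part):

```sh
conda list | grep -i lsst    # list the installed stack packages and their versions
```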

Because it can be hard to track all of the available changes, if there is an update that is important to you it would be helpful to get some feedback that this is a feature you'd like included in a future release. For now, an email to lsst-imsim@lsstcorp.org is probably a good way to request this.

We have set up a sims-announcements subcategory on community.lsst.org and we will announce new sims releases (including new conda releases) there, which could be a good way to get notifications of updates. ( https://community.lsst.org/t/announcing-the-sims-announcements-subcategory/693 )

(PS - the mariadb dependencies, in particular ssl, have been a bit tricky, but I think there have been some updates that attempt to address these. It might be worth giving the eups build another try, if that would be useful.)
