LSSTDESC / td_env

Sets up td_env at NERSC and deploys docker images to dockerhub
BSD 3-Clause "New" or "Revised" License

Packages Remy requested #16

Closed: heather999 closed this issue 2 years ago

heather999 commented 2 years ago

Some notes: galsim and pandas are already included in the LSST Sci Pipelines, which is the base of our environment, so nothing to do there. GCRCatalogs is available on conda-forge and will be conda-installed.

lenstronomy's MPI support requires a version of schwimmbad that is not installable via pip or conda (see this and this). The LSST Sci Pipelines install the conda-forge release of schwimmbad, so this may be a problem.
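For context, lenstronomy's MPI support goes through a schwimmbad pool. A minimal sketch of the usual schwimmbad pattern (the likelihood function is a hypothetical stand-in; this is not lenstronomy's actual code):

```python
import sys

from schwimmbad import choose_pool

def log_likelihood(params):
    # stand-in for an expensive per-sample computation
    return -0.5 * sum(p ** 2 for p in params)

def main():
    # returns a SerialPool, MultiPool, or MPIPool depending on the flags
    pool = choose_pool(mpi=False, processes=4)

    # only relevant for MPI pools: worker ranks block here, the master proceeds
    if hasattr(pool, "is_master") and not pool.is_master():
        pool.wait()
        sys.exit(0)

    samples = [(0.1 * i, 0.2 * i) for i in range(100)]
    results = list(pool.map(log_likelihood, samples))
    pool.close()
    print(len(results), "likelihood evaluations")

if __name__ == "__main__":
    main()
```

Switching to mpi=True (and launching under mpiexec) is where the schwimmbad version matters.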

herjy commented 2 years ago

Also tagging @sibirrer to see what he thinks of that. We might not need MPI for the sims, provided the time-cost overhead is manageable, but it will surely be needed further down the line in the analysis. Though this would be an analysis choice, which we shouldn't have to worry about.

heather999 commented 2 years ago

It could be that we need to set up a separate environment without the LSST Sci Pipelines (if that satisfies your needs). I should also reach out to Rubin DM; my understanding is that they recently added schwimmbad to their environment, and perhaps it is OK to modify the version of this particular package in our installation.

sibirrer commented 2 years ago

Thanks for the pointer @herjy! I already have a lenstronomy version on the main branch that moved to the latest pip version of schwimmbad. I will make a new release on PyPI tomorrow and hope that will solve (part of) it.

herjy commented 2 years ago

@heather999 I actually need the science pipelines for source injection.

heather999 commented 2 years ago

OK, understood @herjy! Having a released version of schwimmbad on PyPI would help. Thank you @sibirrer! I can then see if there's any issue with updating the version in the science pipelines env; I'm hopeful it will be fine.

sibirrer commented 2 years ago

@heather999 @herjy I just released a new lenstronomy version on PyPI, v1.10.0. Can you check whether this is still an issue with lenstronomy? Thanks!

heather999 commented 2 years ago

@sibirrer This looks good, thank you!
I do have a request, and I'm wondering about the best way to ask it: should I open an issue, or potentially a PR, on your repo? I noticed that a plain pip install lenstronomy doesn't pull in all of those dependencies, and that seems to be by design. I used the provided requirements.txt to complete the install, ran py.test, and it worked! Would it be possible to add extras_require to lenstronomy's setup.py, so that pip install lenstronomy[extrastuff] installs those additional optional dependencies (see here)? It's just a one-liner in setup.py, and then the installation on our end won't break if you change the requirements in the future.
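For concreteness, here is roughly the shape of the change (a sketch only; the extra's name and the package lists are placeholders, not lenstronomy's actual metadata):

```python
# setup.py (sketch) -- metadata trimmed to the relevant part
from setuptools import setup, find_packages

setup(
    name="lenstronomy",
    version="1.10.0",
    packages=find_packages(),
    # core dependencies: always installed
    install_requires=["numpy", "scipy"],
    # optional dependencies: installed only on request, e.g.
    #   pip install "lenstronomy[extrastuff]"
    # the extra's name and its contents here are placeholders
    extras_require={
        "extrastuff": ["schwimmbad", "emcee", "corner"],
    },
)
```

A plain pip install lenstronomy would still skip the extras, so the default install stays lightweight.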

sibirrer commented 2 years ago

Hi @heather999, what is the specific line and text to be added? If you can give me the specific line, that would be great. PRs are always welcome as well, of course :) The main purpose was to keep lenstronomy more lightweight, but since all the dependencies are pip-installable, it's no longer an issue.

sibirrer commented 2 years ago

@heather999, I hope I've made the right changes to the lenstronomy setup.py; there is a new release, 1.10.1. The PR with the changes in lenstronomy is here. Let me know if there are remaining issues, and feel free to open a PR if something could be fixed easily.

heather999 commented 2 years ago

Thanks @sibirrer, looks great. One small thing I'll PR in a moment is to move from sklearn to scikit-learn, since that is the name conda-forge recognizes, and I think it's now the more accepted name for the package.
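For reference, the rename is a one-word change wherever the dependency is declared (a sketch; the actual spot in lenstronomy's setup.py or requirements.txt may differ):

```python
# before: pip happened to resolve "sklearn", but conda-forge has no package by that name
install_requires = ["sklearn"]

# after: the canonical distribution name on both PyPI and conda-forge
install_requires = ["scikit-learn"]
```

The import name in Python code stays sklearn; only the distribution name changes.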

sibirrer commented 2 years ago

@heather999 A new lenstronomy release with the scikit-learn dependency change is available. Thanks a lot!

heather999 commented 2 years ago

@herjy I have a dev environment set up at NERSC that includes the packages you requested, on top of the LSST Sci Pipelines, as well as the requested SN packages. If you have a chance, can you give it a try? To set up the environment, log onto Cori and then do: source /global/cfs/cdirs/lsst/groups/TD/setup_td_dev.sh

The default is to completely purge the modules in the default Cori environment, as requested by Rick K and Rob K. You can turn off that behavior by adding -k for "keep", which avoids purging your loaded modules but still modifies your module list to load what is needed, such as cray-mpich for the conda environment. I'll be sending this around to the TD Slack channels to get others to try it out as well.

I haven't set up a Jupyter kernel for this yet, but that's also on my list! If it would be helpful to get that going sooner, just let me know. And of course, if you see problems or have questions, please send them along.

herjy commented 2 years ago

I just tried out all the dependencies, and everything seems to work. The scripts are in notebooks, so I'll try a few more in-depth things later, but things seem to be running fine. Thanks a lot Heather!! 😃

heather999 commented 2 years ago

Great! And now the Jupyter kernel, desc-td-env, is set up too, so you can try out some notebooks. To enable it, redo the DESC kernel setup you've likely already done: source /global/common/software/lsst/common/miniconda/kernels/setup.sh That will reinstall all the available kernels (like desc-python, desc-stack, etc.) and add the new desc-td-env: https://confluence.slac.stanford.edu/display/LSSTDESC/Using+Jupyter+at+NERSC#UsingJupyteratNERSC-setup

The next time you start up a fresh jupyter.nersc.gov session, desc-td-env should be available. As always, just let me know if it causes any trouble.

herjy commented 2 years ago

The environment seems to work fine, thanks. I just realised one of the dependencies wasn't there: desc_dc2_dm_data, which I use to call the butler. But since I'm going to move to gen3, I likely won't need it anymore, so all is well.

Do you have pointers on how to use that environment for CI with GitHub Actions?

heather999 commented 2 years ago

Thanks for giving it a try @herjy! If you find you want desc_dc2_dm_data, it's easy to add, and we can decide whether to remove it later.

To use this environment in GitHub Actions:

There is also a way to run the NERSC-installed td-env conda environment via NERSC's GitLab instance, which might be useful if you want to do larger-scale tests using the data available at NERSC. I've just started using it to build and install these environments, but it could also be useful for testing, and I should write up how to do that. If it's of interest, I can try to provide better details. The hitch is that GitLab jobs use our NERSC compute allocation, so we have to be a little careful that we're using that compute time reasonably.

herjy commented 2 years ago

Thanks, that is extremely useful. I have actually never used a docker image, so I'm rather unfamiliar with these. I've been tinkering with the docker image without much success so far. Do I need to put this script in the repo and have it run via GitHub Actions? Downloading the image seems to take time and space.

heather999 commented 2 years ago

Hi @herjy, sorry for the delay. I made an example GitHub Actions workflow here. Right now it just checks out the GitHub repo it is running in, pulls the docker image, and runs a test script against it. The test script would live in your GitHub repository, like this example.
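For anyone following along, the test script can be as simple as importing the requested packages inside the container and failing loudly if anything is missing (a hypothetical sketch; the actual example linked above may differ, and the package list here is assumed from this thread):

```python
#!/usr/bin/env python
"""Sanity check that the requested packages import inside the td-env image."""
import importlib
import sys

# package names assumed from this thread; adjust to the actual request list
PACKAGES = ["lenstronomy", "schwimmbad", "GCRCatalogs", "galsim", "pandas"]

failed = []
for name in PACKAGES:
    try:
        importlib.import_module(name)
    except ImportError as err:
        failed.append(f"{name}: {err}")

if failed:
    sys.exit("missing packages:\n" + "\n".join(failed))
print("all requested packages import cleanly")
```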

There are some ways to cache a docker image, but the official mechanism is meant for sharing between jobs in a single workflow. There does seem to be a way to share a cache between workflows, but it's not an "official" GitHub action, and I want to test it a bit more before suggesting it. This little test ran in about 2.5 minutes, docker pull included. Space would be another issue; I'll do a little more research to see if I can improve that. At the very least, I can look into reducing the size of the docker image, which might help a bit.

heather999 commented 2 years ago

This is done; we can open additional issues for future package requests.