Closed sybenzvi closed 4 years ago
Is there a specific version of tensorflow you want to include? It's enough trouble to migrate tensorflow apps from one version to the next that people often don't bother, resulting in lots of existing code and useful examples spanning many incompatible versions and no obvious default choice. Since pytorch is growing in popularity, especially in research settings, you probably also want that in any ML conda env.
At the moment I'm using tensorflow 1.14 (installed from anaconda), but I'd be fine migrating to v2.0, assuming that choice maximizes the time before our next forced update. I haven't tried running AstroDASH (a third-party transient classifier by Daniel Muthukrishna) with tensorflow 2.0, but Daniel is actively working on the toolkit and has responded quickly to recent issues and feature requests.
You make a good point about supporting PyTorch. It's gotten popular enough that we'll probably want to include it as well.
I just want to point out that you can load the DESI software stack and create new conda environments with custom sets of packages. I'll check the docs for that today, but it is just a couple of tweaks to the usual `conda create` command.
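As a rough sketch of what those tweaks might look like (the module path, environment name, and version pins below are illustrative assumptions, not the official DESI recipe; the NERSC wiki page linked later in this thread has the actual instructions):

```shell
# Illustrative only: the path, env name, and pins here are assumptions.
# First load the DESI software stack (exact command depends on the NERSC setup).
source /global/common/software/desi/desi_environment.sh   # assumed path

# Then build a personal conda environment alongside it and add the ML packages.
conda create -n desi-ml python=3.6      # "desi-ml" is an arbitrary name
conda activate desi-ml
conda install tensorflow=1.14 keras
conda install pytorch -c pytorch        # if PyTorch is also wanted
```

The point is that nothing here touches the shared desiconda installation; each user carries their own ML add-ons.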
If the custom conda install is working for @sybenzvi, we should close this.
If there is a solution that allows us to use tensorflow + keras in DESI software, it would be great to hear about it, since this is the bottleneck to adding QuasarNet into the main pipeline.
Have you tried following this documentation?
https://desi.lbl.gov/trac/wiki/Pipeline/GettingStarted/NERSC#CustomizingYourDESIPythonEnvironment
Sorry, I got confused. I thought the solution allowed adding keras and tensorflow to the main DESI package, so that everyone could call them. In any case, thanks for the reference, and sorry for adding noise.
Well, yes, that is the question here. Is there an operational requirement that these packages be included in the main desiconda package, or is a custom environment sufficient?
And as @dkirkby said, if there is an operational requirement, we need to carefully specify the versions that are required.
desiconda started out as scripts to install the dependencies of the spectro pipeline (i.e. raw data to redshifts). Perhaps we should be more explicit about what downstream software we are trying to enable, and then we can work backwards and see what we need to install.
Note that for some "challenging" packages (like GalSim), it may not even be possible to install them if they require arbitrary compiled dependencies that don't build at NERSC. In that case, another option is Docker images like those currently used by the imaging survey.
TensorFlow and Keras are not packages that we can reasonably expect to maintain 5+ years into the future, so we should not include them as core operational dependencies in the base desiconda installations. It is OK to use TensorFlow et al. to train a model, but packages like QuasarNet need to use something like tfdeploy to apply the models. For science analyses and model training, the preferred method is to create a separate conda environment, add tensorflow et al. to it, and run there.
I might be sympathetic to someone building a standard desiconda+tensorflow+keras environment that multiple people could use, but I'm suspicious about including them in the base environment under the promise that people won't use them for operational work.
Documenting here what I've said elsewhere: Google gave a talk about TensorFlow at NERSC last year, and we asked them about maintenance plans/recommendations for running a current model years into the future. They basically said that they have no support for that; it's just not how they envision TensorFlow being used. Even with something like Docker containers, it would be risky to develop something that pins a version of TensorFlow years in the past while still upgrading numpy/scipy/astropy etc. as needed.
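For concreteness, a container-based pin might amount to nothing more than the minimal Dockerfile sketched below (the version pins are illustrative assumptions, not a tested recipe), which also illustrates the risk: the numpy pin is frozen into the image alongside TensorFlow, so upgrading numpy/scipy independently years later is exactly what becomes hard.

```dockerfile
# Sketch only: base image and pins are illustrative assumptions.
# Freezing TensorFlow this way also freezes its numpy constraint,
# which is the maintenance risk described above.
FROM python:3.6-slim
RUN pip install tensorflow==1.14.0 keras==2.2.4 numpy==1.16.4
```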
The main use case we need to watch out for is a year-5 analysis that needs to use mocks to study the operational decisions (LyA QSO reobservations, fiberassign) that were made in year 1.
What @sbailey mentioned is what I thought was the current situation, but I got confused by the comments above in this GitHub issue. Apologies for the noise.
The current plan is to not use QuasarNet in the decision making, to make sure we can reproduce the results.
OK, thank you everyone.
I'm chiming in after this was set to closed/wontfix just to mention that the custom conda solution ought to be fine. That is certainly the path of least resistance so no complaints from me. Thanks everybody.
Several groups are building pipeline afterburners on top of tensorflow and keras (Ly-a and Time Domain, for example). It would be useful to have these packages available in the next release of desiconda.