ContinuumIO / elm

Phase I & part of Phase II of NASA SBIR - Parallel Machine Learning on Satellite Data
http://ensemble-learning-models.readthedocs.io
43 stars 27 forks source link

Add (optional) dependency on earth-env conda package #147

Closed gbrener closed 7 years ago

gbrener commented 7 years ago

Once earth-env is consolidated (via https://github.com/ContinuumIO/earth-env/issues/6) and packaged, add it as an optional dependency for elm.

PeterDSteinberg commented 7 years ago

@gbrener @jbednar Where do we stand on earth-env? My thought was that it was an experimental idea that got us to the earthio package. There is nothing in earth-env's environment files that is not in the conda.recipe for earthio, so my sense is that it is extra tech debt for us at this point to keep earth-env around and perhaps cognitive burden to the new user. Can I close this issue?

PeterDSteinberg commented 7 years ago

Well looks like I just closed it - fingers moving too fast. Reopen if needed.

gbrener commented 7 years ago

Ah, yeah I agree with your previous comment. I think it's safe to keep it closed - I had a flawed understanding of the long-term plan when I opened it.

jbednar commented 7 years ago

If anything, earth-env could become EarthIO/examples/environment.yml. It's actually a copy of datashader/examples/environment.yml, and for now at least that file can form the suggested environment for our earth-related work, since it's required for the datashader examples to run.

PeterDSteinberg commented 7 years ago

I agree it's nice to have an environment.yml file equivalent of a conda.recipe, but my experience is that over time, people/I forget to edit requirements in both places, resulting in heterogeneous developer and user envs. If we have it also in datashader/examples then that's three places we would probably like to stay consistent. Is three places too much? One idea would be to keep it in earthio as an environment.yml, then on earthio CI builds have it anaconda upload the environment and tell datashader users to install the env from anaconda.org. On the other hand it could be simpler to just keep it triplicated for the time being, making a note in the developer release docs for each package that the env specs in those places need to stay consistent. Once earthio stabilizes at some point ca. 6 to 12 months from now, I think we could just tell people to install earthio if they want to do datashader/examples, but I don't want to indirectly break datashader/examples while we go through churn on earthio.

jbednar commented 7 years ago

I don't think the env specs in those places need to stay consistent, they just need to work. A good environment specification must be in datashader/examples/environment.yml, because that's how we tell the users to run the examples; I'm not comfortable depending on some external library for that (particularly not EarthIO, because datashader is in no way earth-specific).

Also, datashader's environment.yml has no relation to datashader's conda recipe, because the examples need tons of external libraries that are not required for datashader itself, and conversely the environment.yml does not (need not) include anything in datashader's conda recipe (since those will already get installed by installing datashader itself). I'd argue the same for EarthIO -- its conda recipe should only include the libraries actually used by EarthIO itself, not any other convenience or extra libraries used in notebooks, etc. And then if datashader's environment.yml already includes what's needed for EarthIO's examples, then great, but if not, then EarthIO can have a separate environment.yml (or you can sneak an example into Datashader to force datashader's environment.yml to be sufficient. :-)