dask / knit

Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead
http://knit.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Custom MINICONDA_URL #73

Closed jpoullet2000 closed 7 years ago

jpoullet2000 commented 7 years ago

For security reasons, our machines are not connected to the internet. So when the CondaCreator class (see env.py) is called, it cannot reach the internet to install miniconda. Would it be possible to check whether an environment variable (say MINICONDA_URL) exists and, if it does not, continue the way it is done now?
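A minimal sketch of the fallback being requested here, not knit's actual code: the helper name `resolve_miniconda_url` and the constant `DEFAULT_MINICONDA_URL` are hypothetical, and the default URL is only a placeholder for whatever address env.py downloads from today.

```python
import os

# Assumption: placeholder default; the URL knit really uses may differ.
DEFAULT_MINICONDA_URL = (
    "https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"
)


def resolve_miniconda_url(environ=None):
    """Return MINICONDA_URL if it is set and non-empty, else the default."""
    # Accept a mapping so the lookup can be exercised without touching
    # the real process environment.
    environ = os.environ if environ is None else environ
    url = environ.get("MINICONDA_URL", "").strip()
    return url or DEFAULT_MINICONDA_URL
```

On an isolated cluster, a user would export MINICONDA_URL pointing at an internal mirror before creating the environment; with the variable unset, behaviour stays as it is today.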

martindurant commented 7 years ago

I expect that you simply get an error message in the case that you try to create an environment without internet access. You may have a point about a URL for miniconda, but I don't think that typical enterprise/isolated conda server installations necessarily provide a miniconda download endpoint. In any case, you are always able to create an environment by whatever means at your disposal and provide that (env=) instead of a list of packages, and then this code path will be skipped. Am I wrong in this assumption?

martindurant commented 7 years ago

@jpoullet2000 , did you manage to get something working using a local environment, or would you still like this option implemented? It would not take too much effort.

jpoullet2000 commented 7 years ago

Thanks for reaching out. I haven't had time to retest it yet. At a company I work for, we have set up conda repositories that sync with the anaconda and conda-forge repos, plus an internal custom repo. Users' ~/.condarc files point to the internal repos because most of the internal servers have no internet access. I'll try to test it by the end of the week and let you know.

jpoullet2000 commented 7 years ago

@martindurant, to make this more flexible I had to tweak the code (I changed both env.py and dask_yarn.py). The idea is to be able to pass conda_envs as a variable as well.

from dask.distributed import Client
from knit.dask_yarn import DaskYARNCluster

# conda_root, channels and packages are assumed to be defined earlier
conda_envs = '/path/to/my/conda_envs'
C = DaskYARNCluster(conda_root=conda_root, channels=channels, conda_envs=conda_envs, packages=packages)
client = Client(C)
C.start(2, cpus=1, memory=500)

Changes are quite simple... In dask_yarn.py: l.49

    def __init__(self, autodetect=True, packages=None, ip=None, env=None,
                 channels=None, conda_root=None, conda_envs=None, **kwargs):

l.59:

self.conda_envs = conda_envs

l.96

c = CondaCreator(channels=self.channels or [],
                 conda_root=self.conda_root, conda_envs=self.conda_envs)

In env.py: l.30

    def __init__(self, conda_root=None, conda_envs=None, channels=[]):
        self.conda_dir = os.path.join(os.path.dirname(__file__), 'tmp_conda')

        self.minifile_fp = os.path.join(self.conda_dir, mini_file)
        self.conda_root = conda_root or os.path.join(self.conda_dir, 'miniconda')
        self.python_bin = os.path.join(self.conda_root, 'bin', 'python')
        self.conda_envs = conda_envs or os.path.join(self.conda_root, 'envs')
        self.conda_bin = os.path.join(self.conda_root, 'bin', 'conda')
        self.channels = channels

l.121 and l.178

        env_path = os.path.join(self.conda_envs, env_name)

Maybe something similar can be done with conda_pkgs... Note that without this change the code always uses '/envs' (under conda_root) and does not take into account environment variables such as 'CONDA_ENVS_DIRS' or 'CONDA_PKGS_DIRS'. I could work on a PR, but I am hitting a bug with Py4J (probably an issue in the knit-1.0-SNAPSHOT.jar I'm using?). I have not dug into it yet.

Py4JError:

An error occurred while calling t.start. Trace: py4j.Py4JException: Method start([class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.String]) does not exist

Let me know if you can implement these small modifications. Thanks.
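The lookup order hinted at in this comment (explicit argument first, then an environment variable such as CONDA_ENVS_DIRS or CONDA_PKGS_DIRS, then a directory under conda_root) could be sketched as below. The helper `resolve_conda_dir` is hypothetical and not part of knit, and the sketch assumes the variable holds an `os.pathsep`-separated list of which only the first entry is used.

```python
import os


def resolve_conda_dir(explicit, env_var, conda_root, default_name, environ=None):
    """Pick a conda directory: explicit value, then env var, then default."""
    environ = os.environ if environ is None else environ
    if explicit:
        return explicit
    # Assumption: the variable is a path-separator-delimited list;
    # take the first non-empty entry.
    for entry in environ.get(env_var, "").split(os.pathsep):
        entry = entry.strip()
        if entry:
            return os.path.expanduser(entry)
    # Same default as the patched constructor: a subdirectory of conda_root.
    return os.path.join(conda_root, default_name)
```

In the patched CondaCreator constructor this would stand in for `conda_envs or os.path.join(self.conda_root, 'envs')`, and a matching call with 'CONDA_PKGS_DIRS' and 'pkgs' would cover the package cache.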

martindurant commented 7 years ago

Looks very reasonable, I'll put these in.

martindurant commented 7 years ago

How does #79 look to you? It can use the system conda, but you still specify where envs and packages go, separately. miniconda is downloaded into a tempdir and not kept around, if it is needed at all.
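As a rough illustration of the behaviour described for #79, and not the code in that PR: prefer a conda that already exists, and only bootstrap miniconda into a temporary directory that is discarded afterwards. The names `find_system_conda`, `with_conda`, `action` and `installer` are all hypothetical; `installer` stands for whatever downloads and installs miniconda into a given directory and returns the path to its conda executable.

```python
import os
import shutil
import tempfile


def find_system_conda(conda_root=None):
    """Return the path to an existing conda executable, or None."""
    if conda_root:
        candidate = os.path.join(conda_root, "bin", "conda")
        return candidate if os.path.isfile(candidate) else None
    # No root given: look for conda on PATH.
    return shutil.which("conda")


def with_conda(action, installer, conda_root=None):
    """Run action(conda_path), bootstrapping miniconda only if needed."""
    conda = find_system_conda(conda_root)
    if conda is not None:
        return action(conda)
    # Nothing usable found: install into a tempdir that is removed
    # as soon as the action has finished.
    with tempfile.TemporaryDirectory(prefix="miniconda-") as tmp:
        return action(installer(tmp))
```

Where envs and packages end up would be decided separately by the action, so they survive after the temporary miniconda is removed.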

jpoullet2000 commented 7 years ago

Seems fine to me. Thanks. Any reason why you dropped the earlier versions of Python in .travis.yml?

JB

(In reply to Martin Durant's comment of 8 Sept. 2017, 11:34 PM, about #79, https://github.com/dask/knit/pull/79.)

martindurant commented 7 years ago

I was trying to speed up the test cycle, but it didn't appear to make any difference. Tests pass locally, but on Travis YARN seems to give out at some point...