dcos / dcos-e2e

Spin up and manage DC/OS clusters in test environments
Apache License 2.0

Having issues installing on RHEL 7 #1252

Closed (openshiftninja closed this issue 6 years ago)

openshiftninja commented 6 years ago

I'll preface this with the fact that I'm aware that RHEL isn't explicitly noted as being supported, but I figured that I should be able to make it work since it's still Linux and Python.

$ uname -a
Linux xxxxxxxx 3.10.0-862.6.3.el7.x86_64 #1 SMP Fri Jun 15 17:57:37 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux

I installed Linuxbrew as instructed on https://linuxbrew.sh:

$ sh -c "$(curl -fsSL https://raw.githubusercontent.com/Linuxbrew/install/master/install.sh)"
...
==> Installation successful!

==> Homebrew has enabled anonymous aggregate user behaviour analytics.
Read the analytics documentation (and how to opt-out) here:
  https://docs.brew.sh/Analytics.html

I installed the "Development Tools" package group and ran brew install gcc (Linuxbrew suggested both as next steps). I then ran brew doctor and fixed a few things it flagged (extra -config scripts, a git version that was too old, etc.).

Then kicked off the install of dcos-e2e:

brew install https://raw.githubusercontent.com/dcos/dcos-e2e/master/dcose2e.rb
######################################################################## 100.0%
==> Installing dependencies for dcose2e: gdbm, openssl, ncurses, readline, sqlite, xz, bzip2, libffi, python3, pkg-config
==> Installing dcose2e dependency: gdbm
==> Downloading https://linuxbrew.bintray.com/bottles/gdbm-1.16.x86_64_linux.bottle.tar.gz
...

Ultimately, however, the install breaks with segmentation faults, specifically in python3:

==> Installing dcose2e
==> Downloading https://github.com/dcos/dcos-e2e/archive/2018.07.27.0.tar.gz
==> Downloading from https://codeload.github.com/dcos/dcos-e2e/tar.gz/2018.07.27.0
######################################################################## 100.0%
Warning: Cannot verify integrity of dcose2e-2018.07.27.0.tar.gz
A checksum was not provided for this resource
For your reference the SHA256 is: 888e657ec885c253cff999e3c6b21e7bc3a15cd0e24deac07621960f8bf50a28
==> Downloading https://files.pythonhosted.org/packages/33/bc/fa0b5347139cd9564f0d44ebd2b147ac97c36b2403943dbee8a25fd74012/virtualenv-16.0.0.
Already downloaded: /home/xxxxxxxx/.cache/Homebrew/dcose2e--homebrew-virtualenv-16.0.0.tar.gz
sh: line 1: 60665 Segmentation fault      (core dumped) python3 --version 2>&1
==> python3 -c import setuptools... --no-user-cfg install --prefix=/tmp/dcose2e--homebrew-virtualenv-20180727-60564-1vltfyl/target --single-v
Last 15 lines from /home/xxxxxxxx/.cache/Homebrew/Logs/dcose2e/01.python3:
2018-07-27 12:35:59 -0400

OK, so I am bailing on Linuxbrew for now. Instead, I tried setting up a virtualenv for Python:

$ virtualenv dcos
Using base prefix '/opt/rh/rh-python36/root/usr'
New python executable in /home//dcos/bin/python3
Also creating executable in /home/xxxxxxxxl/dcos/bin/python
Installing setuptools, pip, wheel...done.
$ source dcos/bin/activate

Now I clone the git repo:

git clone https://github.com/dcos/dcos-e2e.git
Cloning into 'dcos-e2e'...
remote: Counting objects: 27681, done.
remote: Compressing objects: 100% (103/103), done.
remote: Total 27681 (delta 58), reused 112 (delta 46), pack-reused 27531
Receiving objects: 100% (27681/27681), 17.20 MiB | 4.74 MiB/s, done.
Resolving deltas: 100% (16306/16306), done.

Then I run python setup.py install inside the dcos-e2e folder:

$ python setup.py install                                                                                                                   
/home/xxxxxxxx/dcos/lib/python3.6/site-packages/setuptools/dist.py:388: UserWarning: Normalizing '2018.07.27.0+13.g410d4421' to '2018.7.27.0+13.g410d4421'
  normalized_version,
running install
running bdist_egg
running egg_info
creating src/DCOS_E2E.egg-info
writing src/DCOS_E2E.egg-info/PKG-INFO
writing dependency_links to src/DCOS_E2E.egg-info/dependency_links.txt
writing entry points to src/DCOS_E2E.egg-info/entry_points.txt
writing requirements to src/DCOS_E2E.egg-info/requires.txt
...
Using /home/xxxxxxxx/dcos/lib/python3.6/site-packages/PyJWT-1.6.4-py3.6.egg
Searching for oauthlib==2.1.0
Best match: oauthlib 2.1.0
Processing oauthlib-2.1.0-py3.6.egg
oauthlib 2.1.0 is already the active version in easy-install.pth

Using /home/xxxxxxxx/dcos/lib/python3.6/site-packages/oauthlib-2.1.0-py3.6.egg
Finished processing dependencies for DCOS-E2E==2018.7.27.0+13.g410d4421

Looks great, but when I run dcos-docker doctor, it fails while importing the CLI modules:

$ dcos-docker doctor                                                                                                                        
Traceback (most recent call last):
  File "/home/xxxxxxxx/dcos/bin/dcos-docker", line 11, in <module>
    load_entry_point('DCOS-E2E==2018.7.27.0+13.g410d4421', 'console_scripts', 'dcos-docker')()
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/pkg_resources/__init__.py", line 479, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2703, in load_entry_point
    return ep.load()
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2321, in load
    return self.resolve()
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2327, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/DCOS_E2E-2018.7.27.0+13.g410d4421-py3.6.egg/cli/__init__.py", line 5, in <module>
    from .dcos_aws import dcos_aws
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/DCOS_E2E-2018.7.27.0+13.g410d4421-py3.6.egg/cli/dcos_aws/__init__.py", line 7, in <module>
    from .commands.create import create
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/DCOS_E2E-2018.7.27.0+13.g410d4421-py3.6.egg/cli/dcos_aws/commands/create.py", line 36, in <module>
    from dcos_e2e.backends import AWS
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/DCOS_E2E-2018.7.27.0+13.g410d4421-py3.6.egg/dcos_e2e/backends/__init__.py", line 5, in <module>
    from ._aws import AWS
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/DCOS_E2E-2018.7.27.0+13.g410d4421-py3.6.egg/dcos_e2e/backends/_aws/__init__.py", line 15, in <module>
    from dcos_e2e._vendor.dcos_launch import config, get_launcher
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/DCOS_E2E-2018.7.27.0+13.g410d4421-py3.6.egg/dcos_e2e/_vendor/dcos_launch/__init__.py", line 1, in <module>
    from ..dcos_launch import acs_engine, arm, aws, gcp, terraform, util
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/DCOS_E2E-2018.7.27.0+13.g410d4421-py3.6.egg/dcos_e2e/_vendor/dcos_launch/acs_engine.py", line 16, in <module>
    from ..dcos_launch.platforms import arm as ___vendorize__1
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/DCOS_E2E-2018.7.27.0+13.g410d4421-py3.6.egg/dcos_e2e/_vendor/dcos_launch/platforms/arm.py", line 15, in <module>
    from azure.common.credentials import ServicePrincipalCredentials
ModuleNotFoundError: No module named 'azure.common'

I did a pip install of azure.common in the virtualenv, but it is still giving me the same error.
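One way to narrow down an error like this is to ask the interpreter that runs dcos-docker whether it can resolve the module at all. The sketch below is only an illustration and is not taken from the dcos-e2e documentation: module_available is a hypothetical helper name, and it assumes the script is run with the virtualenv's own python so that the answer reflects that environment.

```python
import importlib.util
import sys


def module_available(dotted_name: str) -> bool:
    """Return True if the running interpreter can locate ``dotted_name``."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "azure") cannot be imported.
        return False


if __name__ == "__main__":
    # sys.prefix shows which environment this interpreter belongs to.
    print("interpreter prefix:", sys.prefix)
    for name in ("azure.common", "azure.common.credentials"):
        print(name, "->", "found" if module_available(name) else "missing")
```

If the prefix printed is not the virtualenv directory, the package was probably installed into a different environment than the one dcos-docker runs in.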

timaa2k commented 6 years ago

You've installed dcos-e2e from the master branch, which may currently be broken. Please download and install the latest stable release instead. Doing this in a virtualenv is a good idea. I usually install via pip and don't bother with Linuxbrew: pip install git+https://github.com/dcos/dcos-e2e.git@2018.07.27.0

openshiftninja commented 6 years ago

$ pip install git+https://github.com/dcos/dcos-e2e.git@2018.07.27.0
Collecting git+https://github.com/dcos/dcos-e2e.git@2018.07.27.0
  Cloning https://github.com/dcos/dcos-e2e.git (to revision 2018.07.27.0) to /tmp/pip-req-build-4q2yvh6d
Requirement not upgraded as not directly required: keyring in ./dcos/lib/python3.6/site-packages/keyring-13.2.1-py3.6.egg (from DCOS-E2E==2018.7.27.0) (13.2.1)
...
Building wheels for collected packages: DCOS-E2E
  Running setup.py bdist_wheel for DCOS-E2E ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-jq2xpzhe/wheels/93/51/44/0eb545808e412cfcca0eb9ba81ea60300a319aa40fb83bf7b8
Successfully built DCOS-E2E
Installing collected packages: azure-common, DCOS-E2E
  Found existing installation: azure-common 1.1.14
    Uninstalling azure-common-1.1.14:
      Successfully uninstalled azure-common-1.1.14
  Found existing installation: DCOS-E2E 2018.7.27.0
    Uninstalling DCOS-E2E-2018.7.27.0:
      Successfully uninstalled DCOS-E2E-2018.7.27.0
Successfully installed DCOS-E2E-2018.7.27.0 azure-common-1.1.13

Done. dcos-docker doctor then blew up looking for some Azure packages, so I had to get those installed (kind of annoying: the import error does not name a version, and only when I pip install the package does pip tell me the version that is actually required, so I have to do the install again). Now it complains when it tries to build the image:

$ dcos-docker doctor

Note: Docker has approximately 15.5 GB of memory available. The amount of memory required depends on the workload. For example, creating large clusters or multiple clusters requires a lot of memory.
A four node cluster seems to work well on a machine with 9 GB of memory available to Docker.
Traceback (most recent call last):
  File "/home/xxxxxxxx/dcos/bin/dcos-docker", line 11, in <module>
    sys.exit(dcos_docker())
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/cli/dcos_docker/commands/doctor.py", line 423, in doctor
    level = function()
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/cli/dcos_docker/commands/doctor.py", line 364, in _check_can_mount_in_docker
    with Cluster(cluster_backend=cluster_backend) as cluster:
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/dcos_e2e/cluster.py", line 62, in __init__
    cluster_backend=cluster_backend,
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/dcos_e2e/backends/_docker/__init__.py", line 326, in __init__
    docker_version=cluster_backend.docker_version,
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/dcos_e2e/backends/_docker/_docker_build.py", line 68, in build_docker_image
    tag=base_tag,
  File "/home/xxxxxxxx/dcos/lib/python3.6/site-packages/docker-3.4.1-py3.6.egg/docker/models/images.py", line 266, in build
    raise BuildError(chunk['error'], result_stream)
docker.errors.BuildError: The command '/bin/sh -c yum install -y                bash-completion                 bind-utils              btrfs-progs             ca-certificates           curl            git             iproute                 ipset           iptables                iputils                 libcgroup               libselinux-utils          net-tools               openssh-client          openssh-server          sudo            systemd                 tar             tree            unzip    which                 xfsprogs           xz && ( cd /lib/systemd/system/sysinit.target.wants/; for i in *; do if [ "$i" != "systemd-tmpfiles-setup.service" ]; then rm -f $i; fi done ) && rm -f /lib/systemd/system/multi-user.target.wants/* && rm -f /etc/systemd/system/*.wants/* && rm -f /lib/systemd/system/local-fs.target.wants/* && rm -f /lib/systemd/system/sockets.target.wants/*udev* && rm -f /lib/systemd/system/sockets.target.wants/*initctl* && rm -f /lib/systemd/system/anaconda.target.wants/* && rm -f /lib/systemd/system/basic.target.wants/* && rm -f /lib/systemd/system/graphical.target.wants/* && ln -vf /lib/systemd/system/multi-user.target /lib/systemd/system/default.target' returned a non-zero code: 1
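The BuildError message above only repeats the failing command; the yum output that explains the failure is in the build log. As far as I can tell, docker SDK 3.x attaches that log to the exception as a build_log attribute, an iterable of decoded JSON chunks, but treat that as an assumption to verify against your installed version. A minimal sketch of turning such chunks into readable text follows; render_build_log is a hypothetical helper name and the sample chunks are made up.

```python
from typing import Any, Dict, Iterable


def render_build_log(chunks: Iterable[Dict[str, Any]]) -> str:
    """Join the human-readable parts of a Docker build log."""
    lines = []
    for chunk in chunks:
        # Assumption: ordinary output arrives under "stream" and the
        # failure reason under "error"; other keys are ignored.
        for key in ("stream", "error"):
            text = chunk.get(key)
            if text:
                lines.append(str(text).rstrip("\n"))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up chunks standing in for a real build log.
    sample = [
        {"stream": "Step 3/10 : RUN yum install -y ...\n"},
        {"stream": "(yum output would appear here)\n"},
        {"error": "The command '/bin/sh -c yum install -y ...' returned a non-zero code: 1"},
    ]
    print(render_build_log(sample))
```

In practice this would be called from an except block around the failing build, passing the exception's build log, so that the underlying yum error becomes visible.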

pip freeze output:

adal==1.0.2
asn1crypto==0.24.0
atomicwrites==1.1.5
attrs==18.1.0
azure-common==1.1.13
azure-mgmt-advisor==1.0.1
azure-mgmt-applicationinsights==0.1.1
azure-mgmt-authorization==0.30.0
azure-mgmt-batch==5.0.1
azure-mgmt-batchai==0.2.0
azure-mgmt-billing==0.1.0
azure-mgmt-cdn==2.0.0
azure-mgmt-cognitiveservices==2.0.0
azure-mgmt-commerce==1.0.1
azure-mgmt-compute==3.0.1
azure-mgmt-consumption==2.0.0
azure-mgmt-containerinstance==0.3.1
azure-mgmt-containerregistry==1.0.1
azure-mgmt-containerservice==3.0.1
azure-mgmt-cosmosdb==0.3.1
azure-mgmt-datafactory==0.4.0
azure-mgmt-datalake-analytics==0.3.0
azure-mgmt-datalake-nspkg==2.0.0
azure-mgmt-datalake-store==0.3.0
azure-mgmt-devtestlabs==2.2.0
azure-mgmt-dns==1.2.0
azure-mgmt-eventgrid==0.4.0
azure-mgmt-eventhub==1.2.0
azure-mgmt-hanaonazure==0.1.1
azure-mgmt-iothub==0.4.0
azure-mgmt-iothubprovisioningservices==0.1.0
azure-mgmt-keyvault==0.40.0
azure-mgmt-loganalytics==0.1.0
azure-mgmt-logic==2.1.0
azure-mgmt-machinelearningcompute==0.4.1
azure-mgmt-managementpartner==0.1.0
azure-mgmt-marketplaceordering==0.1.0
azure-mgmt-media==0.2.0
azure-mgmt-monitor==0.4.0
azure-mgmt-msi==0.1.0
azure-mgmt-network==2.0.0rc2
azure-mgmt-notificationhubs==1.0.0
azure-mgmt-nspkg==2.0.0
azure-mgmt-powerbiembedded==1.0.0
azure-mgmt-rdbms==0.1.0
azure-mgmt-recoveryservices==0.2.0
azure-mgmt-recoveryservicesbackup==0.1.1
azure-mgmt-redis==5.0.0
azure-mgmt-relay==0.1.0
azure-mgmt-reservations==0.1.0
azure-mgmt-resource==2.0.0
azure-mgmt-scheduler==1.1.3
azure-mgmt-search==1.0.0
azure-mgmt-servermanager==1.2.0
azure-mgmt-servicebus==0.4.0
azure-mgmt-servicefabric==0.1.0
azure-mgmt-sql==0.8.6
azure-mgmt-storage==1.5.0
azure-mgmt-subscription==0.1.0
azure-mgmt-trafficmanager==0.40.0
azure-mgmt-web==0.34.1
azure-monitor==0.3.1
azure-nspkg==2.0.0
bcrypt==3.1.4
boto3==1.7.58
botocore==1.10.58
cachetools==2.1.0
Cerberus==1.2
certifi==2018.4.16
cffi==1.11.5
chardet==3.0.4
click==6.7
click-spinner==0.1.8
cryptography==2.2.2
DCOS-E2E==2018.7.27.0
decorator==4.3.0
docker==3.4.1
docker-pycreds==0.3.0
docopt==0.6.2
docutils==0.14
entrypoints==0.2.3
google-api-python-client==1.7.4
google-auth==1.5.0
google-auth-httplib2==0.0.3
httplib2==0.11.3
idna==2.7
isodate==0.6.0
jeepney==0.3.1
jmespath==0.9.3
keyring==13.2.1
more-itertools==4.2.0
msrest==0.5.4
msrestazure==0.4.34
oauth2client==4.1.2
oauthlib==2.1.0
paramiko==2.4.1
passlib==1.7.1
pluggy==0.6.0
py==1.5.4
pyasn1==0.4.4
pyasn1-modules==0.2.2
pycparser==2.18
PyJWT==1.6.4
PyNaCl==1.2.1
pytest==3.6.3
python-dateutil==2.7.3
python-vagrant==0.5.15
PyYAML==4.2b4
requests==2.19.1
requests-oauthlib==1.0.0
retry==0.9.2
retrying==1.3.3
rsa==3.4.2
s3transfer==0.1.13
SecretStorage==3.0.1
six==1.11.0
uritemplate==3.0.0
urllib3==1.23
websocket-client==0.48.0
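
A listing like the one above can be compared mechanically against the versions a release pins. This is a minimal sketch, assuming both sides are available as name==version lines; parse_pins and find_mismatches are hypothetical helper names, and the pinned values in the example are the ones pip reports for DCOS-E2E==2018.7.27.0 elsewhere in this thread.

```python
from typing import Dict, List, Tuple


def parse_pins(lines: List[str]) -> Dict[str, str]:
    """Parse ``name==version`` lines (pip freeze style) into a dict."""
    pins = {}
    for line in lines:
        line = line.strip()
        if "==" not in line or line.startswith("#"):
            continue
        name, version = line.split("==", 1)
        # Compare names case-insensitively, treating "_" and "-" alike.
        pins[name.strip().lower().replace("_", "-")] = version.strip()
    return pins


def find_mismatches(
    required: Dict[str, str], installed: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (name, required_version, installed_version or 'missing')."""
    problems = []
    for name, wanted in sorted(required.items()):
        have = installed.get(name, "missing")
        if have != wanted:
            problems.append((name, wanted, have))
    return problems


if __name__ == "__main__":
    installed = parse_pins(["azure-common==1.1.13", "azure-mgmt-resource==2.0.0"])
    required = parse_pins(["azure-common==1.1.13", "azure-mgmt-resource==1.2.2"])
    for name, wanted, have in find_mismatches(required, installed):
        print(f"{name}: required {wanted}, installed {have}")
```

With the two azure packages shown, only azure-mgmt-resource is reported, since 2.0.0 is installed while 1.2.2 is pinned.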

timaa2k commented 6 years ago

Hey @openshiftninja, I think I was jumping ahead too quickly. Could you please follow each step in the installation guide here? https://github.com/dcos/dcos-e2e#library-and-cli-with-python Some of the prerequisites, like python3-devel, might be missing, but they can be installed on RHEL 7 as well.

I just had a try on a RHEL 7.3 AWS instance and successfully launched a cluster following these steps (no other prerequisites required):

$ uname -a
Linux ip-10-10-0-79.eu-west-1.compute.internal 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

# Install the latest Docker CE version
$ yum install -y yum-utils
$ yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
$ yum makecache fast
$ yum install -y http://mirror.centos.org/centos/7/extras/x86_64/Packages/container-selinux-2.66-1.el7.noarch.rpm
$ yum install -y docker-ce
$ systemctl start docker

# Install Python 3.6 and pip 3.6
$ yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-$(rpm -E '%{rhel}').noarch.rpm
$ yum install -y https://centos7.iuscommunity.org/ius-release.rpm
$ yum install -y python36u python36u-pip

# Link Python3 and pip3
$ ln -s /usr/bin/python3.6 /usr/bin/python3
$ ln -s /usr/bin/pip3.6 /usr/bin/pip3

# Install git
$ yum -y install git

# Create and activate a virtual environment
$ mkdir -p ~/.virtualenv
$ python3 -m venv ~/.virtualenv/dcos-env
$ . ~/.virtualenv/dcos-env/bin/activate

# Get latest stable version of dcos-e2e (2018.07.27.0 at the time of writing)
$ pip3 install --upgrade git+https://github.com/dcos/dcos-e2e.git@2018.07.27.0

# Run doctor
$ dcos-docker doctor

# Download artifact
$ dcos-docker download-artifact

# Launch smallest cluster with the latest Docker version used within DC/OS
$ dcos-docker create --docker-version 17.12.1-ce --agents 0 --public-agents 0 /tmp/dcos_generate_config.sh

# Wait for DC/OS to come up
$ dcos-docker wait
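
Before running the steps above on another host, it can help to confirm that the commands they rely on are actually on PATH. The following is a minimal sketch using only the standard library; locate_tools is a hypothetical helper name, and the list of commands is simply the set used in the steps above.

```python
import shutil
from typing import Dict, Iterable, Optional


def locate_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each command name to its path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Commands the installation steps rely on.
    for tool, path in locate_tools(["docker", "python3", "pip3", "git"]).items():
        print(f"{tool:8} {path or 'NOT FOUND'}")
```

This only checks that the commands exist; it does not verify versions or that the Docker daemon is running.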

adamtheturtle commented 6 years ago

Thank you @openshiftninja for the detailed report.

Thank you @timaa2k for detailing exactly what to do!

python setup.py install is not what we recommend. Instead, pip install . will get you all dependencies. I do not think that this is an issue with a broken master.

However, we should still get the Linuxbrew issues fixed.

Your original tweet mentioned a setuptools issue.

Could you please:

openshiftninja commented 6 years ago

I got blocked on other stuff today, but I will try out what you suggested on Tuesday.

openshiftninja commented 6 years ago

I reinstalled brew, and now trying to install dcos-e2e with it results in python3 just segfaulting:

▶ brew install https://raw.githubusercontent.com/dcos/dcos-e2e/master/dcose2e.rb                                                                
######################################################################## 100.0%
==> Downloading https://github.com/dcos/dcos-e2e/archive/2018.07.30.0.tar.gz
==> Downloading from https://codeload.github.com/dcos/dcos-e2e/tar.gz/2018.07.30.0
######################################################################## 100.0%
Warning: Cannot verify integrity of dcose2e-2018.07.30.0.tar.gz
A checksum was not provided for this resource
For your reference the SHA256 is: 99256af25b8208b95bac173d4c82b5239a3064ed035b2348954a7a4047d5d810
==> Downloading https://files.pythonhosted.org/packages/33/bc/fa0b5347139cd9564f0d44ebd2b147ac97c36b2403943dbee8a25fd74012/virtualenv-16.0.0.tar
######################################################################## 100.0%
sh: line 1: 10253 Segmentation fault      (core dumped) python3 --version 2>&1
==> python3 -c import setuptools... --no-user-cfg install --prefix=/tmp/dcose2e--homebrew-virtualenv-20180731-10108-747bna/target --single-versi
Last 15 lines from /home/xxxxxxx/.cache/Homebrew/Logs/dcose2e/01.python3:
2018-07-31 12:14:49 -0400

python3
-c
import setuptools, tokenize
__file__ = 'setup.py'
exec(compile(getattr(tokenize, 'open', open)(__file__).read()
  .replace('\r\n', '\n'), __file__, 'exec'))
--no-user-cfg
install
--prefix=/tmp/dcose2e--homebrew-virtualenv-20180731-10108-747bna/target
--single-version-externally-managed
--record=installed.txt

Do not report this issue to Homebrew/brew or Homebrew/core!
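
The log above shows python3 --version itself dying with a segmentation fault, which points at the interpreter binary rather than at dcos-e2e. A small sketch for checking a given interpreter in isolation follows; describe_exit and check_interpreter are hypothetical helper names, and the sketch assumes a POSIX system, where subprocess reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys
from typing import List


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative value means the child was killed by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def check_interpreter(argv: List[str]) -> str:
    """Run ``argv`` (e.g. an interpreter with --version) and describe how it ended."""
    result = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return describe_exit(result.returncode)


if __name__ == "__main__":
    # Replace sys.executable with the path of the interpreter under suspicion.
    target = sys.executable
    print(target, "->", check_interpreter([target, "--version"]))
```

A result of "killed by SIGSEGV" for a bare --version call would suggest the interpreter build is broken on that host.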

I'm not really getting anywhere with brew, so I'm bailing on that. I tried pip3 install with the git URL and am still getting the same error:

▶ pip3 install --proxy $HTTP_PROXY --trusted-host pypi.org --trusted-host pypi.python.org --trusted-host files.pythonhosted.org --upgrade git+https://github.com/dcos/dcos-e2e.git@2018.07.27.0
Collecting git+https://github.com/dcos/dcos-e2e.git@2018.07.27.0
  Cloning https://github.com/dcos/dcos-e2e.git (to revision 2018.07.27.0) to /tmp/pip-req-build-uzhgp3n4
Requirement not upgraded as not directly required: keyring in ./dcos/lib/python3.6/site-packages/keyring-13.2.1-py3.6.egg (from DCOS-E2E==2018.7.27.0) (13.2.1)
Requirement not upgraded as not directly required: secretstorage in ./dcos/lib/python3.6/site-packages/SecretStorage-3.0.1-py3.6.egg (from DCOS-E2E==2018.7.27.0) (3.0.1)
Requirement not upgraded as not directly required: Cerberus==1.2 in ./dcos/lib/python3.6/site-packages/Cerberus-1.2-py3.6.egg (from DCOS-E2E==2018.7.27.0) (1.2)
Requirement not upgraded as not directly required: PyYAML==4.2b4 in ./dcos/lib/python3.6/site-packages/PyYAML-4.2b4-py3.6-linux-x86_64.egg (from DCOS-E2E==2018.7.27.0) (4.2b4)
Requirement not upgraded as not directly required: azure-common==1.1.13 in ./dcos/lib/python3.6/site-packages (from DCOS-E2E==2018.7.27.0) (1.1.13)
Requirement not upgraded as not directly required: azure-mgmt-network==2.0.0rc2 in ./dcos/lib/python3.6/site-packages (from DCOS-E2E==2018.7.27.0) (2.0.0rc2)
Collecting azure-mgmt-resource==1.2.2 (from DCOS-E2E==2018.7.27.0)
  Downloading https://files.pythonhosted.org/packages/71/ec/30b1bea83782bd890ba84c21ab8d1af71bc30f14f51b3688c0a32aec82ce/azure_mgmt_resource-1.2.2-py2.py3-none-any.whl (323kB)
    100% |████████████████████████████████| 327kB 485kB/s 
Requirement not upgraded as not directly required: azure-monitor==0.3.1 in ./dcos/lib/python3.6/site-packages (from DCOS-E2E==2018.7.27.0) (0.3.1)
Requirement not upgraded as not directly required: boto3==1.7.58 in ./dcos/lib/python3.6/site-packages/boto3-1.7.58-py3.6.egg (from DCOS-E2E==2018.7.27.0) (1.7.58)
Requirement not upgraded as not directly required: botocore==1.10.58 in ./dcos/lib/python3.6/site-packages/botocore-1.10.58-py3.6.egg (from DCOS-E2E==2018.7.27.0) (1.10.58)
Requirement not upgraded as not directly required: click-spinner==0.1.8 in ./dcos/lib/python3.6/site-packages/click_spinner-0.1.8-py3.6.egg (from DCOS-E2E==2018.7.27.0) (0.1.8)
Requirement not upgraded as not directly required: click==6.7 in ./dcos/lib/python3.6/site-packages/click-6.7-py3.6.egg (from DCOS-E2E==2018.7.27.0) (6.7)
Requirement not upgraded as not directly required: cryptography==2.2.2 in ./dcos/lib/python3.6/site-packages/cryptography-2.2.2-py3.6-linux-x86_64.egg (from DCOS-E2E==2018.7.27.0) (2.2.2)
Requirement not upgraded as not directly required: docker==3.4.1 in ./dcos/lib/python3.6/site-packages/docker-3.4.1-py3.6.egg (from DCOS-E2E==2018.7.27.0) (3.4.1)
Requirement not upgraded as not directly required: docopt==0.6.2 in ./dcos/lib/python3.6/site-packages/docopt-0.6.2-py3.6.egg (from DCOS-E2E==2018.7.27.0) (0.6.2)
Requirement not upgraded as not directly required: docutils==0.14 in ./dcos/lib/python3.6/site-packages/docutils-0.14-py3.6.egg (from DCOS-E2E==2018.7.27.0) (0.14)
Requirement not upgraded as not directly required: google-api-python-client==1.7.4 in ./dcos/lib/python3.6/site-packages/google_api_python_client-1.7.4-py3.6.egg (from DCOS-E2E==2018.7.27.0) (1.7.4)
Requirement not upgraded as not directly required: oauth2client==4.1.2 in ./dcos/lib/python3.6/site-packages/oauth2client-4.1.2-py3.6.egg (from DCOS-E2E==2018.7.27.0) (4.1.2)
Requirement not upgraded as not directly required: paramiko==2.4.1 in ./dcos/lib/python3.6/site-packages/paramiko-2.4.1-py3.6.egg (from DCOS-E2E==2018.7.27.0) (2.4.1)
Requirement not upgraded as not directly required: passlib==1.7.1 in ./dcos/lib/python3.6/site-packages/passlib-1.7.1-py3.6.egg (from DCOS-E2E==2018.7.27.0) (1.7.1)
Requirement not upgraded as not directly required: pytest==3.6.3 in ./dcos/lib/python3.6/site-packages/pytest-3.6.3-py3.6.egg (from DCOS-E2E==2018.7.27.0) (3.6.3)
Requirement not upgraded as not directly required: python-vagrant==0.5.15 in ./dcos/lib/python3.6/site-packages/python_vagrant-0.5.15-py3.6.egg (from DCOS-E2E==2018.7.27.0) (0.5.15)
Requirement not upgraded as not directly required: requests==2.19.1 in ./dcos/lib/python3.6/site-packages/requests-2.19.1-py3.6.egg (from DCOS-E2E==2018.7.27.0) (2.19.1)
Requirement not upgraded as not directly required: retry==0.9.2 in ./dcos/lib/python3.6/site-packages/retry-0.9.2-py3.6.egg (from DCOS-E2E==2018.7.27.0) (0.9.2)
Requirement not upgraded as not directly required: retrying==1.3.3 in ./dcos/lib/python3.6/site-packages/retrying-1.3.3-py3.6.egg (from DCOS-E2E==2018.7.27.0) (1.3.3)
Requirement not upgraded as not directly required: setuptools>=40.0.0 in ./dcos/lib/python3.6/site-packages (from DCOS-E2E==2018.7.27.0) (40.0.0)
Requirement not upgraded as not directly required: urllib3==1.23 in ./dcos/lib/python3.6/site-packages/urllib3-1.23-py3.6.egg (from DCOS-E2E==2018.7.27.0) (1.23)
Requirement not upgraded as not directly required: entrypoints in ./dcos/lib/python3.6/site-packages/entrypoints-0.2.3-py3.6.egg (from keyring->DCOS-E2E==2018.7.27.0) (0.2.3)
Requirement not upgraded as not directly required: jeepney in ./dcos/lib/python3.6/site-packages/jeepney-0.3.1-py3.6.egg (from secretstorage->DCOS-E2E==2018.7.27.0) (0.3.1)
Requirement not upgraded as not directly required: azure-nspkg>=2.0.0 in ./dcos/lib/python3.6/site-packages (from azure-common==1.1.13->DCOS-E2E==2018.7.27.0) (2.0.0)
Requirement not upgraded as not directly required: azure-mgmt-nspkg>=2.0.0 in ./dcos/lib/python3.6/site-packages/azure_mgmt_nspkg-2.0.0-py3.6.egg (from azure-mgmt-network==2.0.0rc2->DCOS-E2E==2018.7.27.0) (2.0.0)
Requirement not upgraded as not directly required: msrestazure<2.0.0,>=0.4.20 in ./dcos/lib/python3.6/site-packages/msrestazure-0.4.34-py3.6.egg (from azure-mgmt-network==2.0.0rc2->DCOS-E2E==2018.7.27.0) (0.4.34)
Requirement not upgraded as not directly required: jmespath<1.0.0,>=0.7.1 in ./dcos/lib/python3.6/site-packages/jmespath-0.9.3-py3.6.egg (from boto3==1.7.58->DCOS-E2E==2018.7.27.0) (0.9.3)
Requirement not upgraded as not directly required: s3transfer<0.2.0,>=0.1.10 in ./dcos/lib/python3.6/site-packages/s3transfer-0.1.13-py3.6.egg (from boto3==1.7.58->DCOS-E2E==2018.7.27.0) (0.1.13)
Requirement not upgraded as not directly required: python-dateutil<3.0.0,>=2.1 in ./dcos/lib/python3.6/site-packages/python_dateutil-2.7.3-py3.6.egg (from botocore==1.10.58->DCOS-E2E==2018.7.27.0) (2.7.3)
Requirement not upgraded as not directly required: asn1crypto>=0.21.0 in ./dcos/lib/python3.6/site-packages/asn1crypto-0.24.0-py3.6.egg (from cryptography==2.2.2->DCOS-E2E==2018.7.27.0) (0.24.0)
Requirement not upgraded as not directly required: cffi>=1.7 in ./dcos/lib/python3.6/site-packages/cffi-1.11.5-py3.6-linux-x86_64.egg (from cryptography==2.2.2->DCOS-E2E==2018.7.27.0) (1.11.5)
Requirement not upgraded as not directly required: idna>=2.1 in ./dcos/lib/python3.6/site-packages/idna-2.7-py3.6.egg (from cryptography==2.2.2->DCOS-E2E==2018.7.27.0) (2.7)
Requirement not upgraded as not directly required: six>=1.4.1 in ./dcos/lib/python3.6/site-packages/six-1.11.0-py3.6.egg (from cryptography==2.2.2->DCOS-E2E==2018.7.27.0) (1.11.0)
Requirement not upgraded as not directly required: docker-pycreds>=0.3.0 in ./dcos/lib/python3.6/site-packages/docker_pycreds-0.3.0-py3.6.egg (from docker==3.4.1->DCOS-E2E==2018.7.27.0) (0.3.0)
Requirement not upgraded as not directly required: websocket-client>=0.32.0 in ./dcos/lib/python3.6/site-packages/websocket_client-0.48.0-py3.6.egg (from docker==3.4.1->DCOS-E2E==2018.7.27.0) (0.48.0)
Requirement not upgraded as not directly required: google-auth-httplib2>=0.0.3 in ./dcos/lib/python3.6/site-packages/google_auth_httplib2-0.0.3-py3.6.egg (from google-api-python-client==1.7.4->DCOS-E2E==2018.7.27.0) (0.0.3)
Requirement not upgraded as not directly required: google-auth>=1.4.1 in ./dcos/lib/python3.6/site-packages/google_auth-1.5.0-py3.6.egg (from google-api-python-client==1.7.4->DCOS-E2E==2018.7.27.0) (1.5.0)
Requirement not upgraded as not directly required: httplib2<1dev,>=0.9.2 in ./dcos/lib/python3.6/site-packages/httplib2-0.11.3-py3.6.egg (from google-api-python-client==1.7.4->DCOS-E2E==2018.7.27.0) (0.11.3)
Requirement not upgraded as not directly required: uritemplate<4dev,>=3.0.0 in ./dcos/lib/python3.6/site-packages/uritemplate-3.0.0-py3.6.egg (from google-api-python-client==1.7.4->DCOS-E2E==2018.7.27.0) (3.0.0)
Requirement not upgraded as not directly required: pyasn1-modules>=0.0.5 in ./dcos/lib/python3.6/site-packages/pyasn1_modules-0.2.2-py3.6.egg (from oauth2client==4.1.2->DCOS-E2E==2018.7.27.0) (0.2.2)
Requirement not upgraded as not directly required: pyasn1>=0.1.7 in ./dcos/lib/python3.6/site-packages/pyasn1-0.4.4-py3.6.egg (from oauth2client==4.1.2->DCOS-E2E==2018.7.27.0) (0.4.4)
Requirement not upgraded as not directly required: rsa>=3.1.4 in ./dcos/lib/python3.6/site-packages/rsa-3.4.2-py3.6.egg (from oauth2client==4.1.2->DCOS-E2E==2018.7.27.0) (3.4.2)
Requirement not upgraded as not directly required: bcrypt>=3.1.3 in ./dcos/lib/python3.6/site-packages/bcrypt-3.1.4-py3.6-linux-x86_64.egg (from paramiko==2.4.1->DCOS-E2E==2018.7.27.0) (3.1.4)
Requirement not upgraded as not directly required: pynacl>=1.0.1 in ./dcos/lib/python3.6/site-packages/PyNaCl-1.2.1-py3.6-linux-x86_64.egg (from paramiko==2.4.1->DCOS-E2E==2018.7.27.0) (1.2.1)
Requirement not upgraded as not directly required: atomicwrites>=1.0 in ./dcos/lib/python3.6/site-packages/atomicwrites-1.1.5-py3.6.egg (from pytest==3.6.3->DCOS-E2E==2018.7.27.0) (1.1.5)
Requirement not upgraded as not directly required: attrs>=17.4.0 in ./dcos/lib/python3.6/site-packages/attrs-18.1.0-py3.6.egg (from pytest==3.6.3->DCOS-E2E==2018.7.27.0) (18.1.0)
Requirement not upgraded as not directly required: more-itertools>=4.0.0 in ./dcos/lib/python3.6/site-packages/more_itertools-4.2.0-py3.6.egg (from pytest==3.6.3->DCOS-E2E==2018.7.27.0) (4.2.0)
Requirement not upgraded as not directly required: pluggy<0.7,>=0.5 in ./dcos/lib/python3.6/site-packages/pluggy-0.6.0-py3.6.egg (from pytest==3.6.3->DCOS-E2E==2018.7.27.0) (0.6.0)
Requirement not upgraded as not directly required: py>=1.5.0 in ./dcos/lib/python3.6/site-packages/py-1.5.4-py3.6.egg (from pytest==3.6.3->DCOS-E2E==2018.7.27.0) (1.5.4)
Requirement not upgraded as not directly required: certifi>=2017.4.17 in ./dcos/lib/python3.6/site-packages/certifi-2018.4.16-py3.6.egg (from requests==2.19.1->DCOS-E2E==2018.7.27.0) (2018.4.16)
Requirement not upgraded as not directly required: chardet<3.1.0,>=3.0.2 in ./dcos/lib/python3.6/site-packages/chardet-3.0.4-py3.6.egg (from requests==2.19.1->DCOS-E2E==2018.7.27.0) (3.0.4)
Requirement not upgraded as not directly required: decorator>=3.4.2 in ./dcos/lib/python3.6/site-packages/decorator-4.3.0-py3.6.egg (from retry==0.9.2->DCOS-E2E==2018.7.27.0) (4.3.0)
Requirement not upgraded as not directly required: adal<2.0.0,>=0.5.0 in ./dcos/lib/python3.6/site-packages/adal-1.0.2-py3.6.egg (from msrestazure<2.0.0,>=0.4.20->azure-mgmt-network==2.0.0rc2->DCOS-E2E==2018.7.27.0) (1.0.2)
Requirement not upgraded as not directly required: msrest<2.0.0,>=0.4.28 in ./dcos/lib/python3.6/site-packages/msrest-0.5.4-py3.6.egg (from msrestazure<2.0.0,>=0.4.20->azure-mgmt-network==2.0.0rc2->DCOS-E2E==2018.7.27.0) (0.5.4)
Requirement not upgraded as not directly required: pycparser in ./dcos/lib/python3.6/site-packages/pycparser-2.18-py3.6.egg (from cffi>=1.7->cryptography==2.2.2->DCOS-E2E==2018.7.27.0) (2.18)
Requirement not upgraded as not directly required: cachetools>=2.0.0 in ./dcos/lib/python3.6/site-packages/cachetools-2.1.0-py3.6.egg (from google-auth>=1.4.1->google-api-python-client==1.7.4->DCOS-E2E==2018.7.27.0) (2.1.0)
Requirement not upgraded as not directly required: PyJWT>=1.0.0 in ./dcos/lib/python3.6/site-packages/PyJWT-1.6.4-py3.6.egg (from adal<2.0.0,>=0.5.0->msrestazure<2.0.0,>=0.4.20->azure-mgmt-network==2.0.0rc2->DCOS-E2E==2018.7.27.0) (1.6.4)
Requirement not upgraded as not directly required: isodate>=0.6.0 in ./dcos/lib/python3.6/site-packages/isodate-0.6.0-py3.6.egg (from msrest<2.0.0,>=0.4.28->msrestazure<2.0.0,>=0.4.20->azure-mgmt-network==2.0.0rc2->DCOS-E2E==2018.7.27.0) (0.6.0)
Requirement not upgraded as not directly required: requests-oauthlib>=0.5.0 in ./dcos/lib/python3.6/site-packages/requests_oauthlib-1.0.0-py3.6.egg (from msrest<2.0.0,>=0.4.28->msrestazure<2.0.0,>=0.4.20->azure-mgmt-network==2.0.0rc2->DCOS-E2E==2018.7.27.0) (1.0.0)
Requirement not upgraded as not directly required: oauthlib>=0.6.2 in ./dcos/lib/python3.6/site-packages/oauthlib-2.1.0-py3.6.egg (from requests-oauthlib>=0.5.0->msrest<2.0.0,>=0.4.28->msrestazure<2.0.0,>=0.4.20->azure-mgmt-network==2.0.0rc2->DCOS-E2E==2018.7.27.0) (2.1.0)
Building wheels for collected packages: DCOS-E2E
  Running setup.py bdist_wheel for DCOS-E2E ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-3ss3lo6y/wheels/93/51/44/0eb545808e412cfcca0eb9ba81ea60300a319aa40fb83bf7b8
Successfully built DCOS-E2E
Installing collected packages: azure-mgmt-resource, DCOS-E2E
  Found existing installation: azure-mgmt-resource 2.0.0
    Uninstalling azure-mgmt-resource-2.0.0:
      Successfully uninstalled azure-mgmt-resource-2.0.0
  Found existing installation: DCOS-E2E 2018.7.27.0
    Uninstalling DCOS-E2E-2018.7.27.0:
      Successfully uninstalled DCOS-E2E-2018.7.27.0
Successfully installed DCOS-E2E-2018.7.27.0 azure-mgmt-resource-1.2.2
▶ dcos-docker doctor

Note: Docker has approximately 15.5 GB of memory available. The amount of memory required depends on the workload. For example, creating large clusters or multiple cluste
rs requires a lot of memory.
A four node cluster seems to work well on a machine with 9 GB of memory available to Docker.
Traceback (most recent call last):
  File "/home/xxxxxxx/dcos/bin/dcos-docker", line 11, in <module>
    sys.exit(dcos_docker())
  File "/home/xxxxxxx/dcos/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/xxxxxxx/dcos/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/xxxxxxx/dcos/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/xxxxxxx/dcos/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/xxxxxxx/dcos/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/xxxxxxx/dcos/lib/python3.6/site-packages/cli/dcos_docker/commands/doctor.py", line 423, in doctor
    level = function()
  File "/home/xxxxxxx/dcos/lib/python3.6/site-packages/cli/dcos_docker/commands/doctor.py", line 364, in _check_can_mount_in_docker
    with Cluster(cluster_backend=cluster_backend) as cluster:
  File "/home/xxxxxxx/dcos/lib/python3.6/site-packages/dcos_e2e/cluster.py", line 62, in __init__
    cluster_backend=cluster_backend,
  File "/home/xxxxxxx/dcos/lib/python3.6/site-packages/dcos_e2e/backends/_docker/__init__.py", line 326, in __init__
    docker_version=cluster_backend.docker_version,
  File "/home/xxxxxxx/dcos/lib/python3.6/site-packages/dcos_e2e/backends/_docker/_docker_build.py", line 68, in build_docker_image
    tag=base_tag,
  File "/home/xxxxxxx/dcos/lib/python3.6/site-packages/docker-3.4.1-py3.6.egg/docker/models/images.py", line 266, in build
    raise BuildError(chunk['error'], result_stream)
docker.errors.BuildError: The command '/bin/sh -c yum install -y                bash-completion                 bind-utils              btrfs-progs            ca-certific
ates                 curl            git             iproute                 ipset           iptables              iputils          libcgroup               libselinux-uti
ls                net-tools               openssh-client          openssh-server        sudo             systemd                 tar             tree            unzip    
       which                 xfsprogs          xz && ( cd /lib/systemd/system/sysinit.target.wants/; for i in *; do if [ "$i" != "systemd-tmpfiles-setup.service" ]; then 
rm -f $i; fi done ) && rm -f /lib/systemd/system/multi-user.target.wants/* && rm -f /etc/systemd/system/*.wants/* && rm -f /lib/systemd/system/local-fs.target.wants/* && 
rm -f /lib/systemd/system/sockets.target.wants/*udev* && rm -f /lib/systemd/system/sockets.target.wants/*initctl* && rm -f /lib/systemd/system/anaconda.target.wants/* && 
rm -f /lib/systemd/system/basic.target.wants/* && rm -f /lib/systemd/system/graphical.target.wants/* && ln -vf /lib/systemd/system/multi-user.target /lib/systemd/system/d
efault.target' returned a non-zero code: 1
openshiftninja commented 6 years ago

Having said that, let me try after installing some of the python and other dependencies you mentioned from the sources you specified in the instructions.

openshiftninja commented 6 years ago

No dice. I installed fresh python3 and pip and created a new virtualenv from the above steps:

# Install Python 3.6 and pip 3.6
$ yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-$(rpm -E '%{rhel}').noarch.rpm
$ yum install -y https://centos7.iuscommunity.org/ius-release.rpm
$ yum install -y python36u python36u-pip

# Link Python3 and pip3
$ ln -s /usr/bin/python3.6 /usr/bin/python3
$ ln -s /usr/bin/pip3.6 /usr/bin/pip3

# Create and activate a virtual environment
mkdir -p ~/.virtualenv
python3 -m venv ~/.virtualenv/dcos-env
. ~/.virtualenv/dcos-env/bin/activate

Successfully installed all the DCOS-E2E packages:

Successfully installed Cerberus-1.2 DCOS-E2E-2018.7.27.0 PyJWT-1.6.4 PyYAML-4.2b4 adal-1.0.2 asn1crypto-0.24.0 atomicwrites-1.1.5 attrs-18.1.0 azure-common-1.1.13 azure-m
gmt-network-2.0.0rc2 azure-mgmt-nspkg-2.0.0 azure-mgmt-resource-1.2.2 azure-monitor-0.3.1 azure-nspkg-2.0.0 bcrypt-3.1.4 boto3-1.7.58 botocore-1.10.58 cachetools-2.1.0 ce
rtifi-2018.4.16 cffi-1.11.5 chardet-3.0.4 click-6.7 click-spinner-0.1.8 cryptography-2.2.2 decorator-4.3.0 docker-3.4.1 docker-pycreds-0.3.0 docopt-0.6.2 docutils-0.14 en
trypoints-0.2.3 google-api-python-client-1.7.4 google-auth-1.5.0 google-auth-httplib2-0.0.3 httplib2-0.11.3 idna-2.7 isodate-0.6.0 jeepney-0.3.1 jmespath-0.9.3 keyring-13
.2.1 more-itertools-4.3.0 msrest-0.5.4 msrestazure-0.4.34 oauth2client-4.1.2 oauthlib-2.1.0 paramiko-2.4.1 passlib-1.7.1 pluggy-0.6.0 py-1.5.4 pyasn1-0.4.4 pyasn1-modules
-0.2.2 pycparser-2.18 pynacl-1.2.1 pytest-3.6.3 python-dateutil-2.7.3 python-vagrant-0.5.15 requests-2.19.1 requests-oauthlib-1.0.0 retry-0.9.2 retrying-1.3.3 rsa-3.4.2 s
3transfer-0.1.13 secretstorage-3.0.1 setuptools-40.0.0 six-1.11.0 uritemplate-3.0.0 urllib3-1.23 websocket-client-0.48.0

Still getting that same error when running dcos-docker doctor:

docker.errors.BuildError: The command '/bin/sh -c yum install -y                bash-completion                 bind-utils              btrfs-progs             ca-certifi
cates           curl            git             iproute                 ipset           iptables                iputils                 libcgroup               libselinux
-utils          net-tools               openssh-client          openssh-server          sudo            systemd                 tar             tree            unzip    w
hich                 xfsprogs           xz && ( cd /lib/systemd/system/sysinit.target.wants/; for i in *; do if [ "$i" != "systemd-tmpfiles-setup.service" ]; then rm -f $
i; fi done ) && rm -f /lib/systemd/system/multi-user.target.wants/* && rm -f /etc/systemd/system/*.wants/* && rm -f /lib/systemd/system/local-fs.target.wants/* && rm -f /
lib/systemd/system/sockets.target.wants/*udev* && rm -f /lib/systemd/system/sockets.target.wants/*initctl* && rm -f /lib/systemd/system/anaconda.target.wants/* && rm -f /
lib/systemd/system/basic.target.wants/* && rm -f /lib/systemd/system/graphical.target.wants/* && ln -vf /lib/systemd/system/multi-user.target /lib/systemd/system/default.
target' returned a non-zero code: 1
adamtheturtle commented 6 years ago

Let's work on getting some more information about what that Docker build error is. I'm glad you could get dcos-docker installed. I will get back to you after investigating how to get a little more information. My first instinct is to try dcos-docker doctor -vvv. I'm not sure whether that will give more information, but can you try it please?

openshiftninja commented 6 years ago
▶ dcos-docker doctor -vvv
Error: no such option: -v

?

adamtheturtle commented 6 years ago

@openshiftninja I think that is an older version:

pip3 install --upgrade git+https://github.com/dcos/dcos-e2e.git@2018.07.30.0
openshiftninja commented 6 years ago

Updated. That doesn't appear to give any additional info.

adamtheturtle commented 6 years ago

Ok, again, thank you for your perseverance. I will look into how to get more information from Docker about the build failure.

adamtheturtle commented 6 years ago

I have added a dcos-docker doctor check to try to see what the error is. I have done a new release with this.

Please can you upgrade DC/OS E2E and then run the doctor command:

$ pip3 install --upgrade git+https://github.com/dcos/dcos-e2e.git@2018.07.31.0
$ dcos-docker doctor

and then paste the output.
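
For anyone curious how the build output can be recovered, here is a minimal sketch using the Docker SDK for Python. It assumes the `build_log` attribute that `docker.errors.BuildError` carries in recent SDK versions. The helper names `format_build_log` and `build_and_report` are made up for illustration, and this is not necessarily how the new doctor check is implemented.

```python
from typing import Dict, Iterable, List


def format_build_log(chunks: Iterable[Dict[str, object]]) -> List[str]:
    """Collect human-readable lines from Docker build log chunks.

    Each chunk is a decoded JSON message from the build stream; the text
    lives under the "stream" key, and a failure under the "error" key.
    """
    lines = []
    for chunk in chunks:
        for key in ("stream", "error"):
            text = chunk.get(key)
            if isinstance(text, str) and text.strip():
                lines.extend(text.rstrip("\n").split("\n"))
    return lines


def build_and_report(path: str, tag: str) -> None:
    """Build an image and print the full build log if the build fails."""
    # Imported here so the helper above works without the SDK installed.
    import docker
    from docker.errors import BuildError

    client = docker.from_env()
    try:
        client.images.build(path=path, tag=tag, rm=True)
    except BuildError as exc:
        # Assumption: exc.build_log holds the chunks seen before the failure.
        for line in format_build_log(exc.build_log):
            print(line)
        raise
```

The point of printing every chunk is that the bare `BuildError` message only repeats the failing command, while the cause (here, yum being unable to reach its mirrors) appears in the earlier stream output.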

openshiftninja commented 6 years ago

Ok, that's giving much better info. Looks to be a proxy issue, as suspected:

▶ dcos-docker doctor

Note: Docker has approximately 15.5 GB of memory available. The amount of memory required depends on the workload. For example, creating large clusters or multiple clusters requires a lot of memory.
A four node cluster seems to work well on a machine with 9 GB of memory available to Docker.

Error: There was an error building a Docker image. The Docker logs follow.

        Step 1/13 : FROM centos:7
         ---> 49f7960eb7e4
        Step 2/13 : RUN yum install -y          bash-completion                 bind-utils              btrfs-progs             ca-certificates                 curl     gettext          git             iproute                 ipset           iptables                iputils                 libcgroup               libselinux-utils         net-tools                openssh-client          openssh-server          sudo            systemd                 tar             tree            unzip           which    xfsprogs                 xz && ( cd /lib/systemd/system/sysinit.target.wants/; for i in *; do if [ "$i" != "systemd-tmpfiles-setup.service" ]; then rm -f $i; fi done ) && rm -f /lib/systemd/system/multi-user.target.wants/* && rm -f /etc/systemd/system/*.wants/* && rm -f /lib/systemd/system/local-fs.target.wants/* && rm -f /lib/systemd/system/sockets.target.wants/*udev* && rm -f /lib/systemd/system/sockets.target.wants/*initctl* && rm -f /lib/systemd/system/anaconda.target.wants/* && rm -f /lib/systemd/system/basic.target.wants/* && rm -f /lib/systemd/system/graphical.target.wants/* && ln -vf /lib/systemd/system/multi-user.target /lib/systemd/system/default.target
         ---> Running in bc57559a5aab
        Loaded plugins: fastestmirror, ovl
        Determining fastest mirrors

 One of the configured repositories failed (Unknown),
 and yum doesn't have enough cached data to continue. At this point the only
 safe thing yum can do is fail. There are a few ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working
        upstream. This is most often useful if you are using a newer
        distribution release than is supported by the repository (and the
        packages for the previous distribution release still work).

     3. Run the command with the repository temporarily disabled
            yum --disablerepo=<repoid> ...

     4. Disable the repository permanently, so yum won't use it by default. Yum
        will then just ignore the repository until you permanently enable it
        again or use --enablerepo for temporary usage:

            yum-config-manager --disable <repoid>
        or
            subscription-manager repos --disable=<repoid>

     5. Configure the failing repository to be skipped, if it is unavailable.
        Note that yum will try to contact the repo. when it runs most commands,
        so will have to try and fail each time (and thus. yum will be be much
        slower). If it is a very temporary problem though, this is often a nice
        compromise:

            yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true

Cannot find a valid baseurl for repo: base/7/x86_64
        Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=7&arch=x86_64&repo=os&infra=container error was
14: curl#6 - "Could not resolve host: mirrorlist.centos.org; Unknown error"
        Removing intermediate container bc57559a5aab
openshiftninja commented 6 years ago

so essentially I need to get my proxy configuration inside so that it can properly build that image.

adamtheturtle commented 6 years ago

Can you take a look at https://docs.docker.com/network/proxy/ perhaps?

openshiftninja commented 6 years ago

No, this is a proxy issue for yum, not for the docker daemon. I just need to set the env vars http_proxy/https_proxy by adding ENV commands at the top:

FROM centos:7

ENV http_proxy=http://user:pwd@proxy:port
ENV https_proxy=http://user:pwd@proxy:port
...

I had to do this in a couple of the Dockerfiles and then also set the -k flag on curl (unfortunately, our proxy sends back its own certificates so the verification steps fail).
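
A possible alternative to editing the Dockerfiles, sketched under assumptions: Docker treats `http_proxy`, `https_proxy` and `no_proxy` as predefined build arguments, so they can be passed at build time without any `ARG` or `ENV` line. With the Docker SDK for Python that would go through the `buildargs` parameter of `images.build`. The helper names below are hypothetical, and dcos-docker as used in this thread does not expose such an option.

```python
import os
from typing import Dict, Mapping

# Proxy variables that Docker treats as predefined build arguments.
PROXY_VARS = ("http_proxy", "https_proxy", "no_proxy")


def proxy_build_args(environ: Mapping[str, str]) -> Dict[str, str]:
    """Return the proxy settings found in ``environ`` as build arguments."""
    args = {}
    for name in PROXY_VARS:
        # Accept either lower- or upper-case spelling; lower-case wins.
        value = environ.get(name) or environ.get(name.upper())
        if value:
            args[name] = value
    return args


def build_with_proxy(path: str, tag: str) -> None:
    """Build an image, forwarding the host's proxy settings to the build."""
    import docker  # imported lazily; only needed for the actual build

    client = docker.from_env()
    client.images.build(
        path=path,
        tag=tag,
        buildargs=proxy_build_args(os.environ),
    )
```

Unlike `ENV` lines, build arguments are only visible while the image is being built, so the proxy credentials would not be baked into the resulting image. This does nothing for the certificate problem; the `-k` workaround for curl would still be needed.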

I've gotten further and am now hitting this error:

2018-07-31 15:34:20 ERROR    dcos_e2e._common | docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
2018-07-31 15:34:20 ERROR    dcos_e2e._common | See 'docker run --help'.
Traceback (most recent call last):
  File "/home/xxxxxxx/venv/dcos-env/bin/dcos-docker", line 11, in <module>
    load_entry_point('DCOS-E2E==2018.7.31.0', 'console_scripts', 'dcos-docker')()
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/cli/dcos_docker/commands/doctor.py", line 453, in doctor
    level = function()
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/cli/dcos_docker/commands/doctor.py", line 392, in _check_can_mount_in_docker
    public_agent.run(args=args)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/dcos_e2e/node.py", line 469, in run
    public_ip_address=self.public_ip_address,
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/dcos_e2e/_node_transports/_docker_exec_transport.py", line 123, in run
    pipe_output=not tty,
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/dcos_e2e/_common.py", line 128, in run_subprocess
    stderr=stderr,
subprocess.CalledProcessError: Command '['docker', 'exec', '--user', 'root', '8662291fe96037bbbdc0ca13dda3d5a6c9a1ac3a0b0d555426a4fb6feab416b7', 'docker', 'run', '-v', '/foo', 'alpine']' returned non-zero exit status 125.
timaa2k commented 6 years ago

Cool, we're getting there! This can most likely be solved by choosing the latest Docker version for DC/OS:

dcos-docker create --docker-version 17.12.1-ce

Also add your user to the docker group.

openshiftninja commented 6 years ago
▶ dcos-docker create --docker-version 17.12.1-ce
Usage: dcos-docker create [OPTIONS] ARTIFACT

Error: Missing argument "artifact".
timaa2k commented 6 years ago

Of course you need to specify the artifact as usual :)

dcos-docker create --docker-version 17.12.1-ce /tmp/dcos_generate_config.sh

openshiftninja commented 6 years ago

I'm not sure what this dcos_generate_config.sh script is... ?

timaa2k commented 6 years ago

Oh ok. That is the DC/OS installation artifact. You usually download the latest version of it by issuing:

dcos-docker download-artifact

I was jumping ahead because usually you should run dcos-docker doctor first. However, I suspect it might not work here because you're probably not running Docker as root. That is why I suggested trying to install DC/OS right away.

openshiftninja commented 6 years ago

I assume we are talking about this: https://downloads.dcos.io/dcos/stable/dcos_generate_config.sh

timaa2k commented 6 years ago

Exactly! Please try the create command from above and see if it works :)

openshiftninja commented 6 years ago

ugh. dcos-docker download-artifact blows up with certificate verification errors because the aforementioned proxy sends its own certificate. Downloading the artifact now via wget.

openshiftninja commented 6 years ago

Doesn't like my wget download. Grabbing through my browser instead.

timaa2k commented 6 years ago

Yeah this seems like a half downloaded artifact file. At least it knows how to contact the Docker daemon now :)
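
One way to catch a truncated download before handing the file to dcos-docker is to compare its size with the `Content-Length` the server reported. The sketch below uses only the Python standard library and trusts a custom CA bundle instead of disabling verification. The function names and the idea of exporting the proxy's CA certificate to a file are assumptions, not something dcos-docker provides.

```python
import os
import ssl
import urllib.request
from typing import Optional


def is_complete(path: str, expected_bytes: Optional[int]) -> bool:
    """Return True if the file at ``path`` has the expected size.

    With no expected size (the server sent no Content-Length) the check
    cannot say much, so it only requires a non-empty file.
    """
    actual = os.path.getsize(path)
    if expected_bytes is None:
        return actual > 0
    return actual == expected_bytes


def download(url: str, dest: str, cafile: str) -> bool:
    """Download ``url`` to ``dest``, trusting the CA bundle in ``cafile``."""
    context = ssl.create_default_context(cafile=cafile)
    with urllib.request.urlopen(url, context=context) as response:
        length = response.headers.get("Content-Length")
        with open(dest, "wb") as out:
            # Copy in 1 MiB blocks so the large artifact is never held in memory.
            for block in iter(lambda: response.read(1 << 20), b""):
                out.write(block)
    return is_complete(dest, int(length) if length else None)
```

A `False` result from `download` would suggest the transfer was cut short, which matches the symptom of a half-downloaded artifact.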

openshiftninja commented 6 years ago

to be continued Wednesday. :)

openshiftninja commented 6 years ago

This looks promising:

▶ dcos-docker create --docker-version 17.12.1-ce /home/nbkdcwl/Downloads/dcos_generate_config.sh
default
Cluster "default" has started. Run "dcos-docker wait --cluster-id default" to wait for DC/OS to become ready.
▶ dcos-docker wait --cluster-id default
A cluster may take some time to be ready.
The amount of time it takes to start a cluster depends on a variety of factors.
If you are concerned that this is hanging, try "dcos-docker doctor" to diagnose common issues.
If you cancel this command while it is running, you may not be able to log in. To resolve that, run this command again.
adamtheturtle commented 6 years ago

Yes it does!

If that works, we should still keep track of the problems you ran into along the way.

Let us know how the usage goes and then we can get back to the above.

openshiftninja commented 6 years ago

I did have to hack the proxy in to the Dockerfiles that dcos-docker is using, and linuxbrew still broken for me. It appears I need to configure DC/OS with the proxies now, because I can't authenticate to log in (must be connected to the internet). Plus dcos-docker doctor is now spouting some new errors:

▶ dcos-docker doctor
Traceback (most recent call last):
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/docker/api/client.py", line 229, in _raise_for_status
    response.raise_for_status()
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/requests/models.py", line 939, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.38/containers/create

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/docker/models/containers.py", line 766, in run
    detach=detach, **kwargs)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/docker/models/containers.py", line 824, in create
    resp = self.client.api.create_container(**create_kwargs)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/docker/api/container.py", line 411, in create_container
    return self.create_container_from_config(config, name)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/docker/api/container.py", line 422, in create_container_from_config
    return self._result(res, True)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/docker/api/client.py", line 235, in _result
    self._raise_for_status(response)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/docker/api/client.py", line 231, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
docker.errors.ImageNotFound: 404 Client Error: Not Found ("No such image: luca3m/sleep:latest")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/docker/api/client.py", line 229, in _raise_for_status
    response.raise_for_status()
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/requests/models.py", line 939, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.38/images/create?fromImage=luca3m%2Fsleep

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxxxxxx/venv/dcos-env/bin/dcos-docker", line 11, in <module>
    load_entry_point('DCOS-E2E==2018.7.31.0', 'console_scripts', 'dcos-docker')()
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/cli/dcos_docker/commands/doctor.py", line 453, in doctor
    level = function()
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/cli/dcos_docker/commands/doctor.py", line 70, in _check_docker_root_free_space
    privileged=True,
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/docker/models/containers.py", line 768, in run
    self.client.images.pull(image, platform=platform)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/docker/models/images.py", line 412, in pull
    self.client.api.pull(repository, tag=tag, **kwargs)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/docker/api/image.py", line 399, in pull
    self._raise_for_status(response)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/docker/api/client.py", line 231, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/home/xxxxxxx/venv/dcos-env/lib64/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 500 Server Error: Internal Server Error ("Get https://registry-1.docker.io/v2/: Proxy Authentication Required")

Once I configure the proxy, the bottom error should go away, but the others I'm not sure about.

adamtheturtle commented 6 years ago

I did have to hack the proxy in to the Dockerfiles that dcos-docker is using

I will make an issue for getting an interface for this one.

linuxbrew still broken for me

I will make an issue for narrowing this down and may ask for data from you.

Once I configure the proxy, the bottom error should go away, but the others I'm not sure about.

Ok, let's get that done and then we can see where we are.

Thank you!

openshiftninja commented 6 years ago

I'm booked on finishing something up for work this week, but I'll try to squeeze in a fix Friday and see if it makes DC/OS work. Then we can continue on the other stuff.

openshiftninja commented 6 years ago

So I put an environment.proxy file under /usr/lib/dcos using the Dockerfile that builds the base image, and it shows up in the three containers when I launch the cluster (master, slave, public agent). I can actually validate that it can reach outside because when I try to authenticate with Microsoft, it actually prompts me with a number that I validate with my Authenticator app. Still doesn't let me log in though.

Pops up a generic "Unable to login to your DC/OS cluster. ..."

Not sure where to look in the containers to find more detailed error messages, but looking now.

openshiftninja commented 6 years ago

hmm... tailing everything that has a .log extension in all three containers (except for the replication logs) and not seeing any messages that are giving me any hints.

openshiftninja commented 6 years ago

I do see, however, that there is a request to login?_timestamp=xxxx that has a 500 rc. There should be a log somewhere to tell me why... :)

openshiftninja commented 6 years ago

The only thing I see is some errors in the crash.log on the master:

2018-08-03 16:47:27 =ERROR REPORT====
Resolving "master1.mesos":2181 meet an error: nxdomain
2018-08-03 16:47:27 =ERROR REPORT====
Resolving "master2.mesos":2181 meet an error: nxdomain
2018-08-03 16:47:27 =ERROR REPORT====
Resolving "master3.mesos":2181 meet an error: nxdomain
adamtheturtle commented 6 years ago

@openshiftninja Thank you for looking into this. What is the exact error message you get in the UI?

openshiftninja commented 6 years ago

Just unable to login.

[Screenshot: unable_to_login]

timaa2k commented 6 years ago

So this seems to be a DNS issue. By default, dcos-docker configures DC/OS with the Google DNS server 8.8.8.8, which your environment/provider may be restricting you from reaching. Please refer to the steps described in this other open issue to troubleshoot the symptoms: https://github.com/dcos/dcos-e2e/issues/1253
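
As a rough first check of whether a resolver such as 8.8.8.8 is reachable at all from a given environment, something like the following sketch could be run there. It is only an approximation: it tries a TCP connection to port 53, whereas most DNS lookups use UDP, so a success or failure here is a hint rather than proof. The function name is made up for illustration.

```python
import socket


def can_connect(host: str, port: int = 53, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to ``host:port`` succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False
```

Calling `can_connect("8.8.8.8")` from the affected network, and again with an internal resolver address, would show whether switching resolvers is likely to help.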

openshiftninja commented 6 years ago

Interesting. I tried github and Microsoft (as above, I actually was able to get a prompt for matching a number to my Authenticator app, opened my phone, hit the right number, but still not able to log in).

timaa2k commented 6 years ago

The OAuth authentication flow happens locally in your browser, using your laptop's connection and DNS resolver. However, once you're authenticated through, say, Microsoft and want to interact with the DC/OS cluster (e.g. logging in), the containers must be properly connected to the internet and able to reach the configured DNS servers.

openshiftninja commented 6 years ago

ok, if I disable oauth authentication, I'm able to see my cluster. I was even able to get the dcos command, but the dcos cluster setup command fails thinking I need to authenticate:

▶ dcos cluster setup http://172.17.0.2
Authentication failed. Please run `dcos cluster setup <dcos_url>`

I tried adding a resolver and turning on oauth, but I basically then can't log in.

resolvers: ['xx.xx.61.2']
oauth_enabled: true
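
For reference, a small sketch of how overrides like the two lines above could be written to a file from Python. It assumes the file is then passed to `dcos-docker create` through its `--extra-config` option (worth confirming with `dcos-docker create --help` for the installed version). The function name is hypothetical, and `192.0.2.53` is a placeholder from the documentation address range, not a real resolver.

```python
import json
from typing import Any, Dict


def write_extra_config(path: str, config: Dict[str, Any]) -> None:
    """Write DC/OS configuration overrides to ``path``.

    JSON is emitted because, for practical purposes, JSON is a subset of
    YAML, which avoids a dependency on a YAML library.
    """
    with open(path, "w") as handle:
        json.dump(config, handle, indent=2)
        handle.write("\n")


# Placeholder values; substitute the resolver used in your own network.
EXAMPLE_CONFIG = {"resolvers": ["192.0.2.53"], "oauth_enabled": True}
```

Calling `write_extra_config("extra.yml", EXAMPLE_CONFIG)` would produce a file equivalent to the two configuration lines shown above.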
openshiftninja commented 6 years ago

Ok, at this point, I'm really digging in the weeds. We have a DC/OS lab installed, and I can hook my dcos command line tool up to it, so I don't see any reason to keep hammering trying to make this work. I think there are too many environment-specific issues for me to keep you guys occupied with it. I'll open another issue if I run into issues working with that lab environment.