PureStorage-OpenConnect / purestorage-flocker-driver

Please use Pure Service Orchestrator instead of Pure Storage Flocker Driver. No ongoing development or support.
http://www.purestorage.com/containers
Apache License 2.0

/etc/flocker/agent.yml file sample #1

Closed akamalov closed 8 years ago

akamalov commented 8 years ago

Greetings,

Thanks so much for writing this driver. I just want to make sure that my agent.yml file makes sense. I would appreciate it if you could post a sample agent.yml file that I can model mine after. Here is mine; could you please verify that the settings are correct?

File: /etc/flocker/agent.yml

"version": 1
"control-service": 
  "hostname": 192.168.120.156
  "port": 4523
"dataset": 
    "backend": "purestorage_flasharray_flocker_driver"
    "pure_ip": 172.16.128.157 
    "pure_api_token": "50cac451-fefe-b635-9692-5870aada9c49"
    "pure_storage_protocol": "ISCSI"
    "pure_chap_host_user": "server1"
    "pure_chap_host_password": "pureBXPserver1"

Trying to test pure storage connectivity:

[root@server1 flocker]# trial tests.test_purestorage
tests
  test_purestorage ...                                                  [ERROR]

===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/twisted/trial/runner.py", line 644, in loadByNames
    things.append(self.findByName(name))
  File "/usr/lib64/python2.7/site-packages/twisted/trial/runner.py", line 454, in findByName
    return reflect.namedAny(name)
  File "/usr/lib64/python2.7/site-packages/twisted/python/reflect.py", line 513, in namedAny
    raise ObjectNotFound('%r does not name an object' % (name,))
twisted.python.reflect.ObjectNotFound: 'tests.test_purestorage' does not name an object

tests.test_purestorage
-------------------------------------------------------------------------------
Ran 1 tests in 0.249s

FAILED (errors=1)
[root@server1 flocker]# 

Thanks again!

patrick-east commented 8 years ago

Hi,

That config looks fine (assuming the ip and api-token are good).

That error is because the test suite didn't find the test object "tests.test_purestorage". Can you verify what directory the tests are being run from?

I believe you will need to run them from the root of the git repository. I don't think the unit tests are installed as part of the Python package; they are primarily for development purposes. So if you followed the instructions to "pip install" and want to run them, you would need to clone this repo to get the test code.
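
A quick way to check the working directory before calling trial, sketched under the assumption that the repository root is the directory holding setup.py and tests/test_purestorage.py (both names appear elsewhere in this thread). looks_like_repo_root is a hypothetical helper, not part of the driver.

```python
from pathlib import Path


def looks_like_repo_root(directory="."):
    """Return True if `directory` holds setup.py and tests/test_purestorage.py.

    Assumption: these two files are enough to recognise a clone of the
    driver repository; the real layout may change over time.
    """
    root = Path(directory)
    has_setup = (root / "setup.py").is_file()
    has_tests = (root / "tests" / "test_purestorage.py").is_file()
    return has_setup and has_tests


# Check the current working directory, which is where trial resolves
# the dotted name tests.test_purestorage from.
print(looks_like_repo_root())
```

If this prints False, changing into the cloned repository before running trial is the first thing to try.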

If the directory thing doesn't help get past this please post some more info about your setup so we can try and reproduce the issue. Things like OS, installation steps, etc.

Thanks!

akamalov commented 8 years ago

Hi Patrick,

Thanks again for replying so quickly. The reason I was asking about the agent.yml is that the token provided to us contains dashes ("-"). Thus, I was wondering whether the token should be used as is or whether we should remove the dashes. Our flocker-dataset-agent.service is exiting, so I am assuming that it is probably due to a misconfigured agent.yml file. The following is an excerpt from /var/log/messages:

parse_block_mapping_key\n    if self.check_token(KeyToken):\n  File \"/opt/flocker/lib/python2.7/site-packages/yaml/scanner.py\", line 115, in check_token\n    while self.need_more_tokens():\n  File \"/opt/flocker/lib/python2.7/site-packages/yaml/scanner.py\", line 149, in need_more_tokens\n    self.stale_possible_simple_keys()\n  File \"/opt/flocker/lib/python2.7/site-packages/yaml/scanner.py\", line 289, in stale_possible_simple_keys\n    \"could not found expected ':'\", self.get_mark())\nyaml.scanner.ScannerError: while scanning a simple key\n  in \"<string>\", line 5, column 5:\n        pure_ip:172.16.128.157 \n        ^\ncould not found expected ':'\n  in \"<string>\", line 6, column 5:\n        pure_api_token: 50cac451-fefe-b6 ... \n        ^\n", "message_type": "twisted:log", "task_level": [1]}
Aug 12 08:27:50 mslave5 flocker-docker-plugin: {"task_uuid": "32660358-ee04-45d9-9357-eb034dd7d077", "error": true, "timestamp": 1471004870.809736, "message": "main function encountered error\nTraceback (most recent call last):\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/dockerplugin/_script.py\", line 93, in docker_plugin_main\n    options=DockerPluginOptions()).main()\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/script.py\", line 295, in main\n    self._react(run_and_log, [], _reactor=self._reactor)\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/task.py\", line 

The string \"could not found expected ':'\", self.get_mark())\nyaml.scanner.ScannerError: while scanning a simple key\n in \"<string>\", line 5, column 5:\n pure_ip:172.16.128.157 \n ^\ncould not found expected ':'\n in \"<string>\", line 6, column 5:\n pure_api_token: 50cac451-fefe-b6 ... \n ^\n", "message_type": "twisted:log", makes me a bit worried about the syntax of the /etc/flocker/agent.yml file.
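
One possible reading of that excerpt, offered as an assumption rather than a confirmed diagnosis: the scanner stops at pure_ip:172.16.128.157, which has no space after the colon. YAML only treats a plain key as a mapping key when the colon is followed by a space or a line end, so the file the agent actually loaded may differ from the one posted earlier, where the space is present. The sketch below is a rough, stdlib-only heuristic for spotting such lines. find_missing_space and its regular expression are hypothetical helpers, not part of Flocker or the driver, and they are no substitute for a real YAML parser.

```python
import re

# Heuristic (an assumption, not YAML's full grammar): an optionally quoted
# key at the start of a line, a colon, then a non-space character.
MISSING_SPACE = re.compile(r'^\s*"?[A-Za-z_][\w-]*"?:(?=\S)')


def find_missing_space(text):
    """Return (line_number, line) pairs that look like 'key:value'."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if MISSING_SPACE.match(line):
            hits.append((number, line.rstrip()))
    return hits


# Sample modelled on the lines quoted in the log excerpt.
sample = """\
dataset:
    backend: purestorage_flasharray_flocker_driver
    pure_ip:172.16.128.157
    pure_api_token: 50cac451-fefe-b635-9692-5870aada9c49
"""

for number, line in find_missing_space(sample):
    print(number, line)
```

Only the pure_ip line is reported here; the dashes in the token do not trip the check.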

Here is the environment information.

OS:

NAME="Red Hat Enterprise Linux Server"
VERSION="7.2 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.2"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.2 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.2:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.2
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.2"

ClusterHQ version

clusterhq-python-flocker-1.13.0-1.x86_64
clusterhq-flocker-docker-plugin-1.13.0-1.noarch
clusterhq-flocker-node-1.13.0-1.noarch

Python version:

[root@server1 purestorage-flocker-driver]# python --version
Python 3.4.2

List current directory (inside git-cloned directory):

[root@server1 purestorage-flocker-driver]# ls -al
total 120
drwxr-xr-x   6 root root  4096 Aug 12 13:43 .
dr-xr-x---. 11 root root  4096 Aug 12 13:44 ..
-rw-r--r--   1 root root   411 Jun 29 13:23 AUTHORS.rst
-rw-r--r--   1 root root   855 Jun 29 13:23 DESCRIPTION.rst
drwxr-xr-x   8 root root   152 Jun 29 13:23 .git
-rw-r--r--   1 root root    17 Jun 29 13:23 .gitignore
-rw-r--r--   1 root root    69 Jun 29 13:23 __init__.py
-rw-r--r--   1 root root 11357 Jun 29 13:23 LICENSE
drwxr-xr-x   3 root root    75 Aug 12 13:43 purestorage_flasharray_flocker_driver
-rw-r--r--   1 root root 62030 Jun 29 13:23 PureStorageFlocker.png
-rw-r--r--   1 root root  7011 Jun 29 13:23 README.md
-rw-r--r--   1 root root    34 Jun 29 13:23 requirements.txt
-rw-r--r--   1 root root   751 Jun 29 13:23 setup.cfg
-rw-r--r--   1 root root   171 Jun 29 13:23 setup.py
drwxr-xr-x   4 root root   126 Aug 12 13:43 tests
drwxr-xr-x   2 root root    41 Aug 12 13:43 _trial_temp
[root@server1 purestorage-flocker-driver]# 

Install the driver:

[root@server1 purestorage-flocker-driver]# python setup.py install
running install
[pbr] Writing ChangeLog
[pbr] Generating ChangeLog
[pbr] ChangeLog complete (0.0s)
[pbr] Generating AUTHORS
[pbr] AUTHORS complete (0.0s)
running build
running build_py
creating build
creating build/lib
creating build/lib/purestorage_flasharray_flocker_driver
copying purestorage_flasharray_flocker_driver/__init__.py -> build/lib/purestorage_flasharray_flocker_driver
copying purestorage_flasharray_flocker_driver/purestorage_blockdevice.py -> build/lib/purestorage_flasharray_flocker_driver
running egg_info
creating purestorage_flocker_driver.egg-info
writing requirements to purestorage_flocker_driver.egg-info/requires.txt
writing top-level names to purestorage_flocker_driver.egg-info/top_level.txt
writing dependency_links to purestorage_flocker_driver.egg-info/dependency_links.txt
writing purestorage_flocker_driver.egg-info/PKG-INFO
writing pbr to purestorage_flocker_driver.egg-info/pbr.json
[pbr] Processing SOURCES.txt
writing manifest file 'purestorage_flocker_driver.egg-info/SOURCES.txt'
[pbr] In git context, generating filelist from git
warning: no previously-included files found matching '.gitreview'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
writing manifest file 'purestorage_flocker_driver.egg-info/SOURCES.txt'
running install_lib
running install_egg_info
removing '/opt/mesosphere/packages/python--e3169ded66609d3cb4055a3f9f8f0b1113a557a6/lib/python3.4/site-packages/purestorage_flocker_driver-1.0.0-py3.4.egg-info' (and everything under it)
Copying purestorage_flocker_driver.egg-info to /opt/mesosphere/packages/python--e3169ded66609d3cb4055a3f9f8f0b1113a557a6/lib/python3.4/site-packages/purestorage_flocker_driver-1.0.0-py3.4.egg-info
running install_scripts
[root@server1 purestorage-flocker-driver]# 

From within this directory, go ahead and test the driver:

[root@mslave5 purestorage-flocker-driver]# trial tests.test_purestorage
tests
  test_purestorage ...                                                  [ERROR]

===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/twisted/trial/runner.py", line 644, in loadByNames
    things.append(self.findByName(name))
  File "/usr/lib64/python2.7/site-packages/twisted/trial/runner.py", line 454, in findByName
    return reflect.namedAny(name)
  File "/usr/lib64/python2.7/site-packages/twisted/python/reflect.py", line 506, in namedAny
    topLevelPackage = _importAndCheckStack(trialname)
  File "/root/purestorage-flocker-driver/tests/test_purestorage.py", line 16, in <module>
    from flocker.node.agents.test.test_blockdevice import make_iblockdeviceapi_tests
exceptions.ImportError: No module named flocker.node.agents.test.test_blockdevice

tests.test_purestorage
-------------------------------------------------------------------------------
Ran 1 tests in 0.026s

FAILED (errors=1)
[root@mslave5 purestorage-flocker-driver]# 

patrick-east commented 8 years ago

Hmm, interesting. In some of the test/demo setups I've used, I have not run into any problems with the api-token looking like that. One example agent.yml has the following in it:

 dataset:
     backend: purestorage_flasharray_flocker_driver
     pure_ip: cinder-fa1.dev.purestorage.com
     pure_api_token: 661f9687-0b1e07b0d-e07d-1e776d50f9eb

Which was parsed just fine. I don't think the double quotes should affect anything, but may be worth giving it a shot without them.

For the issue on the unit tests that you are seeing now it is a matter of the test code not being able to import the flocker test code. You may need to source the flocker virtual environment and/or ensure that the flocker packages have the test code installed with them.
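
The paths in this thread point at more than one Python: trial runs from /usr/lib64/python2.7, Flocker lives under /opt/flocker/lib/python2.7, and setup.py installed the driver into a Python 3.4 site-packages directory. Whether that mismatch is the cause here is an assumption, but it is cheap to test. The sketch below reports which interpreter is running and whether it can import the two modules named in this thread. importable is a hypothetical helper, and importing a module does execute its top-level code.

```python
import importlib
import sys


def importable(name):
    """Return True if the running interpreter can import `name`."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Module names taken from the traceback and the agent.yml in this thread.
NAMES = (
    "flocker.node.agents.test.test_blockdevice",
    "purestorage_flasharray_flocker_driver",
)

print(sys.executable)
print(sys.version.split()[0])
for name in NAMES:
    print(name, importable(name))
```

Running the same file with each interpreter in turn (the system one, the one under /opt/flocker, and any virtual environment) shows which of them can see both the Flocker test code and the driver.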

Sort of thinking out loud, we probably should have an installation/setup section for development setups where tests can be run, alongside the installation steps for just using the driver. As we're seeing first-hand, they are definitely not the same ;)

akamalov commented 8 years ago

Thanks for getting back, Patrick. Unfortunately, it is still no go. Here is the log output:

Aug 12 15:52:22 server1 flocker-container-agent[17482]: {"task_uuid": "f53ffcf7-80fa-4311-9a83-71ea8c4c970a", "error": false, "timestamp": 1471031542.751597, "message": "AgentAMP connection established (HOST:IPv4Address(TCP, '192.168.120.165', 54825) PEER:IPv4Address(TCP, '192.168.120.156', 4523))", "message_type": "twisted:log", "task_level": [1]}
Aug 12 15:52:22 server1 flocker-container-agent[17482]: {"fsm_identifier": "<flocker.node._loop.ClusterStatus object at 0x36ba150>", "fsm_input": "<ClusterStatusInputs=CONNECTED_TO_CONTROL_SERVICE>", "timestamp": 1471031542.753815, "fsm_rich_input": "<_ConnectedToControlService>", "action_status": "started", "task_uuid": "c603466e-7564-4912-bc14-a33d6c0d3e81", "action_type": "fsm:transition", "fsm_state": "<ClusterStatusStates=DISCONNECTED>", "task_level": [1]}
Aug 12 15:52:22 server1 flocker-container-agent[17482]: {"fsm_next_state": "<ClusterStatusStates=IGNORANT>", "task_level": [2], "action_type": "fsm:transition", "timestamp": 1471031542.754964, "fsm_output": ["<ClusterStatusOutputs=STORE_CLIENT>"], "task_uuid": "c603466e-7564-4912-bc14-a33d6c0d3e81", "action_status": "succeeded"}
Aug 12 15:52:22 server1 flocker-container-agent[17482]: {"task_uuid": "274a8cff-f220-4045-85e5-ca990190417a", "error": false, "timestamp": 1471031542.843493, "message": "AgentAMP connection lost (HOST:IPv4Address(TCP, '192.168.120.165', 54825) PEER:IPv4Address(TCP, '192.168.120.156', 4523))", "message_type": "twisted:log", "task_level": [1]}
Aug 12 15:52:22 server1 flocker-container-agent[17482]: {"exception": "OpenSSL.SSL.Error", "reason": "[('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]", "timestamp": 1471031542.844914, "traceback": "Traceback: <class 'OpenSSL.SSL.Error'>: [('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/posixbase.py:597:_doReadOrWrite\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:209:doRead\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:215:_dataReceived\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:421:dataReceived\n--- <exception caught here> ---\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:569:_write\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1271:send\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1191:_raise_ssl_error\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/_util.py:48:exception_from_error_queue\n", "message_type": "eliot:traceback", "task_uuid": "8b1c446a-2b0c-46b2-96ae-73d6d9f0df6f", "task_level": [1]}
Aug 12 15:52:22 server1 flocker-container-agent[17482]: {"fsm_identifier": "<flocker.node._loop.ClusterStatus object at 0x36ba150>", "fsm_input": "<ClusterStatusInputs=DISCONNECTED_FROM_CONTROL_SERVICE>", "timestamp": 1471031542.846824, "fsm_rich_input": null, "action_status": "started", "task_uuid": "a8076e79-de67-44ab-a554-5e53aa251206", "action_type": "fsm:transition", "fsm_state": "<ClusterStatusStates=IGNORANT>", "task_level": [1]}
Aug 12 15:52:22 server1 flocker-container-agent[17482]: {"fsm_next_state": "<ClusterStatusStates=DISCONNECTED>", "task_level": [2], "action_type": "fsm:transition", "timestamp": 1471031542.847824, "fsm_output": [], "task_uuid": "a8076e79-de67-44ab-a554-5e53aa251206", "action_status": "succeeded"}
Aug 12 15:52:22 server1 flocker-container-agent[17482]: {"task_uuid": "4ee23f71-8fd5-49f5-ae77-654e626d39ed", "error": false, "timestamp": 1471031542.849358, "message": "<twisted.internet.tcp.Connector instance at 0x2302830> will retry in 2 seconds", "message_type": "twisted:log", "task_level": [1]}
Aug 12 15:52:22 server1 flocker-container-agent[17482]: {"task_uuid": "6d725d77-6242-43a5-9bba-e4f3d44370fc", "error": false, "timestamp": 1471031542.850796, "message": "Stopping factory <twisted.internet.protocol.ReconnectingClientFactory instance at 0x36bcc20>", "message_type": "twisted:log", "task_level": [1]}

Aug 12 15:52:24 server1 flocker-container-agent[17482]: {"task_uuid": "7b6df3c3-8f82-4d4d-9292-52e1171569ca", "error": true, "timestamp": 1471031544.768835, "message": "Unhandled Error\nTraceback (most recent call last):\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/script.py\", line 295, in main\n    self._react(run_and_log, [], _reactor=self._reactor)\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/task.py\", line 936, in react\n    _reactor.run()\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/base.py\", line 1194, in run\n    self.mainLoop()\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/base.py\", line 1203, in mainLoop\n    self.runUntilCurrent()\n--- <exception caught here> ---\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/base.py\", line 825, in runUntilCurrent\n    call.func(*call.args, **call.kw)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/control/_protocol.py\", line 455, in <lambda>\n    lambda: protocol.transport.abortConnection())\nexceptions.AttributeError: 'NoneType' object has no attribute 'abortConnection'\n", "message_type": "twisted:log", "task_level": [1]}

#########################

-- Logs begin at Mon 2016-08-01 21:08:10 EDT, end at Fri 2016-08-12 15:55:56 EDT. --
Aug 12 15:52:27 server1 systemd[1]: flocker-dataset-agent.service failed.
Aug 12 15:52:27 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 12 15:52:27 server1 systemd[1]: Failed to start Flocker Dataset Agent.
Aug 12 15:52:27 server1 systemd[1]: start request repeated too quickly for flocker-dataset-agent.service
Aug 12 15:52:27 server1 systemd[1]: flocker-dataset-agent.service holdoff time over, scheduling restart.
Aug 12 15:52:26 server1 systemd[1]: flocker-dataset-agent.service failed.
Aug 12 15:52:26 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 12 15:52:26 server1 systemd[1]: flocker-dataset-agent.service: main process exited, code=exited, status=1/FAILURE
Aug 12 15:52:26 server1 flocker-dataset-agent[58950]: {"task_uuid": "1da0466a-78ad-4f73-8c5b-938fcf2b0a71", "error": false, "timestamp": 1471031546.783778, "message": "Main loop terminated.", "message_type": "twisted:log", "task_level": [1]}
Aug 12 15:52:26 server1 flocker-dataset-agent[58950]: {"task_uuid": "eb4d39ed-40b4-48cb-906d-c31df52d53b2", "error": true, "timestamp": 1471031546.78308, "message": "main function encountered error\nTraceback (most recent call last):\n  File \"/opt/flocker
Aug 12 15:52:26 server1 flocker-dataset-agent[58950]: {"task_uuid": "9cd4fc66-fe8e-43ac-a128-b6de9dfd26e9", "error": true, "timestamp": 1471031546.781752, "message": "Unhandled Error\nTraceback (most recent call last):\n  File \"/opt/flocker/lib/python2.7/
Aug 12 15:52:26 server1 flocker-dataset-agent[58950]: {"task_uuid": "e72d253a-8900-415e-8361-4746a739bdae", "error": false, "timestamp": 1471031546.771021, "message": "Log opened.", "message_type": "twisted:log", "task_level": [1]}
Aug 12 15:52:25 server1 systemd[1]: Starting Flocker Dataset Agent...

I'm wondering, is SSL required? Here is the string that makes me wonder...

Aug 12 15:52:22 server1 flocker-container-agent[17482]: {"exception": "OpenSSL.SSL.Error", "reason": "[('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]", "timestamp": 1471031542.844914, "traceback": "Traceback: <class 'OpenSSL.SSL.Error'>: [('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')

/var/log/messages:

Aug 12 16:13:21 server1 flocker-container-agent[23765]: {"task_uuid": "d794a869-ff22-4a95-a0a1-12cbef7b19ab", "error": false, "timestamp": 1471032801.275129, "message": "AgentAMP connection lost (HOST:IPv4Address(TCP, '192.168.120.165', 34282) PEER:IPv4Address(TCP, '192.168.120.156', 4523))", "message_type": "twisted:log", "task_level": [1]}
Aug 12 16:13:21 server1 flocker-container-agent[23765]: {"exception": "OpenSSL.SSL.Error", "reason": "[('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]", "timestamp": 1471032801.276636, "traceback": "Traceback: <class 'OpenSSL.SSL.Error'>: [('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/posixbase.py:597:_doReadOrWrite\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:209:doRead\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:215:_dataReceived\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:421:dataReceived\n--- <exception caught here> ---\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:569:_write\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1271:send\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1191:_raise_ssl_error\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/_util.py:48:exception_from_error_queue\n", "message_type": "eliot:traceback", "task_uuid": "0e2c730e-5910-47ff-a78e-ea63a8001426", "task_level": [1]}
Aug 12 16:13:21 server1 flocker-container-agent[23765]: {"fsm_identifier": "<flocker.node._loop.ClusterStatus object at 0x4ab8110>", "fsm_input": "<ClusterStatusInputs=DISCONNECTED_FROM_CONTROL_SERVICE>", "timestamp": 1471032801.277992, "fsm_rich_input": null, "action_status": "started", "task_uuid": "e080421a-8c35-4711-88ec-52bfd8410f68", "action_type": "fsm:transition", "fsm_state": "<ClusterStatusStates=IGNORANT>", "task_level": [1]}

patrick-east commented 8 years ago

That SSL error looks like a problem with the certs used by Flocker to communicate with its various services. Is this happening before or after the parsing error? Or is the parsing error on the agent.yml still a thing?

To make sure we're on the same page, SSL is always used for Flocker <-> Flocker connections and requires those certificates to be configured correctly. It is also used for the HTTPS requests from the Pure driver <-> Pure FlashArray, but should not force certificate validation unless the config option is set. IIRC we would see an error coming from the Pure driver if that were the case, so it appears to be from the Flocker <-> Flocker connection between the agent and control service.

akamalov commented 8 years ago

It looks like it is parsing the agent.yml file, because the entries in the log come from the parsed file. Also, if I curl http:// I get a 301 (Moved Permanently), whereas curling https:// gives me the full output (the way it should be). How does the Flocker driver access Pure's API? Is there any way I can access it over https:// by providing a token ID, or do I really need a cert?

patrick-east commented 8 years ago

The driver uses HTTPS, but it doesn't really look like that error is coming from trying to connect to the Purity REST API.

Check out the stack trace from this:

Aug 12 15:52:22 server1 flocker-container-agent[17482]: {"exception": "OpenSSL.SSL.Error", "reason": "[('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]", "timestamp": 1471031542.844914, "traceback": "Traceback: <class 'OpenSSL.SSL.Error'>: [('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/posixbase.py:597:_doReadOrWrite\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:209:doRead\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:215:_dataReceived\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:421:dataReceived\n--- <exception caught here> ---\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:569:_write\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1271:send\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1191:_raise_ssl_error\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/_util.py:48:exception_from_error_queue\n", "message_type": "eliot:traceback", "task_uuid": "8b1c446a-2b0c-46b2-96ae-73d6d9f0df6f", "task_level": [1]}
/opt/flocker/lib/python2.7/site-packages/twisted/internet/posixbase.py:597:_doReadOrWrite
/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:209:doRead
/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:215:_dataReceived
/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:421:dataReceived
--- <exception caught here> ---
/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:569:_write
/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1271:send
/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1191:_raise_ssl_error
/opt/flocker/lib/python2.7/site-packages/OpenSSL/_util.py:48:exception_from_error_queue

That is all from the twisted TCP connections which are used by Flocker to communicate with other Flocker services (not the Pure driver).

You can validate that the driver will be able to make requests by doing something like the following (assuming you've sourced any required virtual environments and substituting in your IP address for 'cinder-fa1' and api token).

 ~/ # python
Python 2.7.11 (default, Jan 22 2016, 08:29:18)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import purestorage
>>> fa = purestorage.FlashArray('cinder-fa1', api_token='661f9687-0b1e07b0d-e07d-1e776d50f9eb')
/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
>>> fa.get()
/usr/local/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
{u'id': u'9a738ed7-8583-fd12-f03f-3d6d55bf9b76', u'version': u'4.5.16', u'array_name': u'cinder-fa1', u'revision': u'201604020210+4dd8c4b'}

In that example you can see that it is getting warnings about the self-signed certificates on the array, but is not failing. It's the same with the Flocker driver unless you set pure_verify_https to true in your agent.yml.

Not to go too far off into the weeds, but for the Purity REST API the driver always uses a temporary session cookie, which is created by authenticating with the api token. All of this traffic goes over HTTPS, but it does not force certificate validation.

Flocker does use certificates for its own internal communication and does force them to be valid. Setting them up is part of the Flocker deployment: https://docs.clusterhq.com/en/latest/docker-integration/configuring-authentication.html
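
To make the contrast concrete, here is a minimal sketch using Python's standard ssl module. It is only an illustration of the two postures described above: Flocker itself uses Twisted and pyOpenSSL with its own cluster certificates, and the driver goes through the purestorage client, so neither is configured this way. The names strict and relaxed are made up for this example.

```python
import ssl

# Strict client context: the peer certificate must chain to a trusted CA
# and match the host name. This is the kind of validation that fails with
# a certificate alert when the certificates are not set up correctly.
strict = ssl.create_default_context()

# Relaxed client context: the connection is still encrypted, but the peer
# certificate is not validated (the "warn but do not fail" case).
relaxed = ssl.create_default_context()
relaxed.check_hostname = False  # must be disabled before verify_mode
relaxed.verify_mode = ssl.CERT_NONE

print(strict.verify_mode == ssl.CERT_REQUIRED, strict.check_hostname)
print(relaxed.verify_mode == ssl.CERT_NONE, relaxed.check_hostname)
```

Both contexts speak TLS; the difference is only whether a bad or unknown certificate aborts the handshake.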

akamalov commented 8 years ago

Hey Patrick,

Thanks for the prompt reply back. Yes, it did work as you said:

[root@server1 purestorage-flocker-driver]# python
Python 3.4.2 (default, Apr 19 2016, 08:30:31) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import purestorage
>>> fa=purestorage.FlashArray('172.16.128.157',api_token='50cac451-fefe-b635-9692-5870aada9c49')
/opt/mesosphere/packages/python--e3169ded66609d3cb4055a3f9f8f0b1113a557a6/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
/opt/mesosphere/packages/python--e3169ded66609d3cb4055a3f9f8f0b1113a557a6/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
>>> fa.get()
/opt/mesosphere/packages/python--e3169ded66609d3cb4055a3f9f8f0b1113a557a6/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
{'revision': '201605132143+b186ed4', 'id': '7e4d75d0-ffc1-4e7c-880a-c044849e1993', 'version': '4.6.8', 'array_name': 'PURTPC0027'}
>>> 

Having said that, I am still not sure what is causing the flocker-dataset-agent.service failures. So far, with your help, I was able to verify that my token is correct, that the array is reachable, and that the driver is working. However, I still cannot get the flocker-dataset-agent service to stay up. I am getting continuous errors, such as:

[root@server1 purestorage-flocker-driver]# journalctl -u flocker-dataset-agent.service -l -r
-- Logs begin at Sat 2016-08-13 21:25:58 EDT, end at Mon 2016-08-15 06:40:08 EDT. --
Aug 15 06:32:18 server1 systemd[1]: flocker-dataset-agent.service failed.
Aug 15 06:32:18 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 15 06:32:18 server1 systemd[1]: Failed to start Flocker Dataset Agent.
Aug 15 06:32:18 server1 systemd[1]: start request repeated too quickly for flocker-dataset-agent.service
Aug 15 06:32:18 server1 systemd[1]: flocker-dataset-agent.service holdoff time over, scheduling restart.
Aug 15 06:32:17 server1 systemd[1]: flocker-dataset-agent.service failed.
Aug 15 06:32:17 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 15 06:32:17 server1 systemd[1]: flocker-dataset-agent.service: main process exited, code=exited, status=1/FAILURE
Aug 15 06:32:17 server1 flocker-dataset-agent[8611]: {"task_uuid": "8541ea57-1453-4f31-8c82-e1864dadbde2", "error": false, "timestamp": 1471257137.791643, "message": "Main loop terminated.", "message_type": "twisted:log", "task_level": [1]}
Aug 15 06:32:17 server1 flocker-dataset-agent[8611]: {"task_uuid": "947f5282-172d-4609-8ff4-807347b76ee8", "error": true, "timestamp": 1471257137.791292, "message": "main function encountered error\nTraceback (most recent call last):\n  File \"/opt/flocker
Aug 15 06:32:17 server1 flocker-dataset-agent[8611]: {"task_uuid": "e51c236d-71c5-4d1b-8065-14606bb0e1f2", "error": true, "timestamp": 1471257137.790625, "message": "Unhandled Error\nTraceback (most recent call last):\n  File \"/opt/flocker/lib/python2.7/s
Aug 15 06:32:17 server1 flocker-dataset-agent[8611]: {"task_uuid": "e9cb0fbe-ec9c-45b4-84ca-b3c5eb86dab8", "error": false, "timestamp": 1471257137.780331, "message": "Log opened.", "message_type": "twisted:log", "task_level": [1]}
Aug 15 06:32:16 server1 systemd[1]: Starting Flocker Dataset Agent...
Aug 15 06:32:16 server1 systemd[1]: Started Flocker Dataset Agent.
Aug 15 06:32:16 server1 systemd[1]: flocker-dataset-agent.service holdoff time over, scheduling restart.
Aug 15 06:32:16 server1 systemd[1]: flocker-dataset-agent.service failed.
Aug 15 06:32:16 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 15 06:32:16 server1 systemd[1]: flocker-dataset-agent.service: main process exited, code=exited, status=1/FAILURE
Aug 15 06:32:16 server1 flocker-dataset-agent[8592]: {"task_uuid": "34f3dc1a-bb2b-4b5b-b95a-7c8e4c75d896", "error": false, "timestamp": 1471257136.187543, "message": "Main loop terminated.", "message_type": "twisted:log", "task_level": [1]}
Aug 15 06:32:16 server1 flocker-dataset-agent[8592]: {"task_uuid": "c68208b0-9413-47b7-b888-0ca953e07825", "error": true, "timestamp": 1471257136.18665, "message": "main function encountered error\nTraceback (most recent call last):\n  File \"/opt/flocker/
Aug 15 06:32:16 server1 flocker-dataset-agent[8592]: {"task_uuid": "8e03708d-e037-4aca-aab1-70f65d7c3a4c", "error": true, "timestamp": 1471257136.185059, "message": "Unhandled Error\nTraceback (most recent call last):\n  File \"/opt/flocker/lib/python2.7/s
Aug 15 06:32:16 server1 flocker-dataset-agent[8592]: {"task_uuid": "a0c252bf-07c8-4784-bd15-f2dc36bd66eb", "error": false, "timestamp": 1471257136.169579, "message": "Log opened.", "message_type": "twisted:log", "task_level": [1]}
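
The two "error": true entries above are cut off before the traceback says what went wrong, so the listing cannot show the actual failure. Once untruncated lines are available (for example from /var/log/messages, as earlier in this thread), the JSON payload can be pulled out of each line and the message printed in full. A minimal sketch follows; parse_agent_line is a hypothetical helper, and it assumes each line is a syslog prefix followed by a single JSON object, as in the excerpts here.

```python
import json


def parse_agent_line(line):
    """Return (prefix, payload) for a syslog line carrying a JSON object.

    payload is None when the JSON part is missing or cut off.
    """
    start = line.find("{")
    if start == -1:
        return line.strip(), None
    prefix = line[:start].strip()
    try:
        return prefix, json.loads(line[start:])
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return prefix, None


# Two lines copied from the listing above: one complete, one truncated.
complete = r'''Aug 15 06:32:16 server1 flocker-dataset-agent[8592]: {"task_uuid": "a0c252bf-07c8-4784-bd15-f2dc36bd66eb", "error": false, "timestamp": 1471257136.169579, "message": "Log opened.", "message_type": "twisted:log", "task_level": [1]}'''
truncated = r'''Aug 15 06:32:16 server1 flocker-dataset-agent[8592]: {"task_uuid": "8e03708d-e037-4aca-aab1-70f65d7c3a4c", "error": true, "timestamp": 1471257136.185059, "message": "Unhandled Error\nTraceback (most recent call last):\n  File \"/opt/flocker/lib/python2.7/s'''

for line in (complete, truncated):
    prefix, payload = parse_agent_line(line)
    if payload is None:
        print(prefix, "<truncated or not JSON>")
    else:
        print(prefix, payload["error"], payload["message"])
```

Filtering on payload["error"] and printing payload["message"] for the dataset agent should surface the full traceback that the listing hides.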

Since flocker-dataset-agent.service is not coming up, the nodes won't register with the FCS:

[root@mmaster1 ~]# flockerctl --control-service=192.168.120.156 list-nodes
SERVER   ADDRESS 

As you can see, an empty list... Any pointers or suggestions on where I should be looking?

patrick-east commented 8 years ago

I reached out to our dev contacts at ClusterHQ, hopefully they can help point us in the right direction.

I would double check that the setup for the CA and SSL certificates for the agent and controller is correct.

wallnerryan commented 8 years ago

@patrick-east I responded over here: https://github.com/ClusterHQ/flocker/issues/2885

akamalov commented 8 years ago

@patrick-east would you please take a look at the module dependency problems we're having with the Pure driver for Flocker at https://github.com/ClusterHQ/flocker/issues/2885 ?

patrick-east commented 8 years ago

Updated README to have a sample config file, closing this issue. Discussions for current issues are happening over at https://github.com/PureStorage-OpenConnect/purestorage-flocker-driver/issues/3