ClusterHQ / flocker

Container data volume manager for your Dockerized application
https://clusterhq.com
Apache License 2.0
3.39k stars 290 forks source link

Unit flocker-dataset-agent.service entered failed state. #2885

Closed akamalov closed 8 years ago

akamalov commented 8 years ago

Greetings,

I am having a problem with trying to start flocker-dataset-agent service. Brief background: the environment consists of RHEL7.2 attempting to access PureStorage backend via iSCSI. PureStorage drivers have been installed and test (which I will go further below). Let me first display environmentals:

Operating System:

NAME="Red Hat Enterprise Linux Server"
VERSION="7.2 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="7.2"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.2 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.2:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.2
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.2"

Flocker environment:

clusterhq-python-flocker-1.13.0-1.x86_64
clusterhq-flocker-docker-plugin-1.13.0-1.noarch
clusterhq-flocker-node-1.13.0-1.noarch

/etc/flocker/agent.yml

"version": 1
"control-service": 
  "hostname": 192.168.120.156 
  "port": 4523
"dataset": 
    "backend": "purestorage_flasharray_flocker_driver"
    "pure_ip": 192.168.128.157 
    "pure_api_token": "50cac451-fefe-b635-9692-5870aada9c49"
    "pure_chap_host_user": "server15"
    "pure_chap_host_password": "XXXXXX"

PureStorage Driver:

PureStorage Driver GitHub link - https://github.com/PureStorage-OpenConnect/purestorage-flocker-driver

Restart Flocker services, display current status:

##########################
● flocker-container-agent.service - Flocker Container Agent
   Loaded: loaded (/usr/lib/systemd/system/flocker-container-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-08-15 08:58:04 EDT; 5min ago
 Main PID: 29760 (flocker-contain)
   Memory: 63.4M
   CGroup: /system.slice/flocker-container-agent.service
           └─29760 /opt/flocker/bin/python /usr/sbin/flocker-container-agent --journald

Aug 15 09:03:54 server15 flocker-container-agent[29760]: {"task_uuid": "f1cb436f-5845-4152-b500-3bc59c636f4b", "error": false, "timestamp": 1471266234.010053, "message": "AgentAMP connection established (HOST:IPv4Address(TCP, '192.168.120.165', 52992) PEER:IPv4Address(TCP, '192.168.120.156', 4523))", "message_type": "twisted:log", "task_level": [1]}
Aug 15 09:03:54 server15 flocker-container-agent[29760]: {"fsm_identifier": "<flocker.node._loop.ClusterStatus object at 0x45f4110>", "fsm_input": "<ClusterStatusInputs=CONNECTED_TO_CONTROL_SERVICE>", "timestamp": 1471266234.01245, "fsm_rich_input": "<_ConnectedToControlService>", "action_status": "started", "task_uuid": "789f2609-d355-49d7-9e76-4cfb1db67acc", "action_type": "fsm:transition", "fsm_state": "<ClusterStatusStates=DISCONNECTED>", "task_level": [1]}
Aug 15 09:03:54 server15 flocker-container-agent[29760]: {"fsm_next_state": "<ClusterStatusStates=IGNORANT>", "task_level": [2], "action_type": "fsm:transition", "timestamp": 1471266234.013306, "fsm_output": ["<ClusterStatusOutputs=STORE_CLIENT>"], "task_uuid": "789f2609-d355-49d7-9e76-4cfb1db67acc", "action_status": "succeeded"}
Aug 15 09:03:54 server15 flocker-container-agent[29760]: {"task_uuid": "d11cc085-6f78-4523-9246-2149a23ae968", "error": false, "timestamp": 1471266234.116931, "message": "AgentAMP connection lost (HOST:IPv4Address(TCP, '192.168.120.165', 52992) PEER:IPv4Address(TCP, '192.168.120.156', 4523))", "message_type": "twisted:log", "task_level": [1]}
Aug 15 09:03:54 server15 flocker-container-agent[29760]: {"exception": "OpenSSL.SSL.Error", "reason": "[('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]", "timestamp": 1471266234.118531, "traceback": "Traceback: <class 'OpenSSL.SSL.Error'>: [('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/posixbase.py:597:_doReadOrWrite\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:209:doRead\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:215:_dataReceived\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:421:dataReceived\n--- <exception caught here> ---\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:569:_write\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1271:send\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1191:_raise_ssl_error\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/_util.py:48:exception_from_error_queue\n", "message_type": "eliot:traceback", "task_uuid": "a49e6ac7-0aa4-43b4-a129-111b66af8f31", "task_level": [1]}
Aug 15 09:03:54 server15 flocker-container-agent[29760]: {"fsm_identifier": "<flocker.node._loop.ClusterStatus object at 0x45f4110>", "fsm_input": "<ClusterStatusInputs=DISCONNECTED_FROM_CONTROL_SERVICE>", "timestamp": 1471266234.12051, "fsm_rich_input": null, "action_status": "started", "task_uuid": "b670b5d0-86d3-4899-9f09-89bc2a8b0fd0", "action_type": "fsm:transition", "fsm_state": "<ClusterStatusStates=IGNORANT>", "task_level": [1]}
Aug 15 09:03:54 server15 flocker-container-agent[29760]: {"fsm_next_state": "<ClusterStatusStates=DISCONNECTED>", "task_level": [2], "action_type": "fsm:transition", "timestamp": 1471266234.121409, "fsm_output": [], "task_uuid": "b670b5d0-86d3-4899-9f09-89bc2a8b0fd0", "action_status": "succeeded"}
Aug 15 09:03:54 server15 flocker-container-agent[29760]: {"task_uuid": "6793744e-5b23-434b-a18e-2752789c0acc", "error": false, "timestamp": 1471266234.122539, "message": "<twisted.internet.tcp.Connector instance at 0x45ef7e8> will retry in 2 seconds", "message_type": "twisted:log", "task_level": [1]}
Aug 15 09:03:54 server15 flocker-container-agent[29760]: {"task_uuid": "31b1fe22-151f-419c-9b2f-7dce6c19772f", "error": false, "timestamp": 1471266234.123821, "message": "Stopping factory <twisted.internet.protocol.ReconnectingClientFactory instance at 0x45f6518>", "message_type": "twisted:log", "task_level": [1]}
Aug 15 09:03:54 server15 flocker-container-agent[29760]: {"task_uuid": "6d217799-8e1f-4b2f-8319-5ce55e26b5f0", "error": true, "timestamp": 1471266234.176803, "message": "Unhandled Error\nTraceback (most recent call last):\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/script.py\", line 295, in main\n    self._react(run_and_log, [], _reactor=self._reactor)\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/task.py\", line 936, in react\n    _reactor.run()\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/base.py\", line 1194, in run\n    self.mainLoop()\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/base.py\", line 1203, in mainLoop\n    self.runUntilCurrent()\n--- <exception caught here> ---\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/base.py\", line 825, in runUntilCurrent\n    call.func(*call.args, **call.kw)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/control/_protocol.py\", line 455, in <lambda>\n    lambda: protocol.transport.abortConnection())\nexceptions.AttributeError: 'NoneType' object has no attribute 'abortConnection'\n", "message_type": "twisted:log", "task_level": [1]}

##############################

● flocker-dataset-agent.service - Flocker Dataset Agent
   Loaded: loaded (/usr/lib/systemd/system/flocker-dataset-agent.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Mon 2016-08-15 08:58:11 EDT; 5min ago
  Process: 29869 ExecStart=/usr/sbin/flocker-dataset-agent --journald (code=exited, status=1/FAILURE)
 Main PID: 29869 (code=exited, status=1/FAILURE)

Aug 15 08:58:11 server15 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 15 08:58:11 server15 systemd[1]: flocker-dataset-agent.service failed.
Aug 15 08:58:11 server15 systemd[1]: flocker-dataset-agent.service holdoff time over, scheduling restart.
Aug 15 08:58:11 server15 systemd[1]: start request repeated too quickly for flocker-dataset-agent.service
Aug 15 08:58:11 server15 systemd[1]: Failed to start Flocker Dataset Agent.
Aug 15 08:58:11 server15 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 15 08:58:11 server15 systemd[1]: flocker-dataset-agent.service failed.

##############################

● flocker-docker-plugin.service - Flocker Docker Plugin
   Loaded: loaded (/usr/lib/systemd/system/flocker-docker-plugin.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-08-15 08:58:04 EDT; 5min ago
 Main PID: 29781 (flocker-docker-)
   Memory: 62.4M
   CGroup: /system.slice/flocker-docker-plugin.service
           └─29781 /opt/flocker/bin/python /usr/sbin/flocker-docker-plugin --journald

Aug 15 09:03:54 server15 flocker-docker-plugin[29781]: {"exception": "flocker.apiclient._client.NotFound", "task_level": [1592, 4], "action_type": "flocker:apiclient:http_request", "reason": "{\"description\": \"No node found with given era.\"}", "timestamp": 1471266234.313487, "task_uuid": "2dd0b28b-cdc9-434a-9dff-a592c304b836", "action_status": "failed"}
Aug 15 09:03:54 server15 flocker-docker-plugin[29781]: {"task_uuid": "c731df94-e1a3-4d88-a911-c3c1d85d971e", "error": false, "timestamp": 1471266234.316437, "message": "Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x5b134d0>", "message_type": "twisted:log", "task_level": [1]}
Aug 15 09:03:54 server15 flocker-docker-plugin[29781]: {"request_body": null, "url": "https://mmaster1:4523/v1/state/nodes/by_era/61181a3e-978a-4e8c-a2b1-7f5e4ae1bf54", "timestamp": 1471266234.415257, "action_status": "started", "task_uuid": "2dd0b28b-cdc9-434a-9dff-a592c304b836", "action_type": "flocker:apiclient:http_request", "method": "GET", "task_level": [1593, 1]}
Aug 15 09:03:54 server15 flocker-docker-plugin[29781]: {"task_uuid": "2dd0b28b-cdc9-434a-9dff-a592c304b836", "error": false, "timestamp": 1471266234.428725, "message": "Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x5a2e128>", "message_type": "twisted:log", "task_level": [1593, 3]}
Aug 15 09:03:54 server15 flocker-docker-plugin[29781]: {"exception": "flocker.apiclient._client.NotFound", "task_level": [1593, 4], "action_type": "flocker:apiclient:http_request", "reason": "{\"description\": \"No node found with given era.\"}", "timestamp": 1471266234.531891, "task_uuid": "2dd0b28b-cdc9-434a-9dff-a592c304b836", "action_status": "failed"}
Aug 15 09:03:54 server15 flocker-docker-plugin[29781]: {"task_uuid": "37ffc3fe-c4ee-4bf0-b30a-ef0b1a900942", "error": false, "timestamp": 1471266234.534984, "message": "Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x5a2e128>", "message_type": "twisted:log", "task_level": [1]}
Aug 15 09:03:54 server15 flocker-docker-plugin[29781]: {"request_body": null, "url": "https://mmaster1:4523/v1/state/nodes/by_era/61181a3e-978a-4e8c-a2b1-7f5e4ae1bf54", "timestamp": 1471266234.633406, "action_status": "started", "task_uuid": "2dd0b28b-cdc9-434a-9dff-a592c304b836", "action_type": "flocker:apiclient:http_request", "method": "GET", "task_level": [1594, 1]}
Aug 15 09:03:54 server15 flocker-docker-plugin[29781]: {"task_uuid": "2dd0b28b-cdc9-434a-9dff-a592c304b836", "error": false, "timestamp": 1471266234.644697, "message": "Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x5b114d0>", "message_type": "twisted:log", "task_level": [1594, 3]}
Aug 15 09:03:54 server15 flocker-docker-plugin[29781]: {"exception": "flocker.apiclient._client.NotFound", "task_level": [1594, 4], "action_type": "flocker:apiclient:http_request", "reason": "{\"description\": \"No node found with given era.\"}", "timestamp": 1471266234.732458, "task_uuid": "2dd0b28b-cdc9-434a-9dff-a592c304b836", "action_status": "failed"}
Aug 15 09:03:54 server15 flocker-docker-plugin[29781]: {"task_uuid": "4dceda0b-7b13-47d3-b17f-1a7fbeed1a50", "error": false, "timestamp": 1471266234.733952, "message": "Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x5b114d0>", "message_type": "twisted:log", "task_level": [1]}

Trying to get more information on flocker-dataset-agent service:

[root@server1 ~]# journalctl -u flocker-dataset-agent.service -l -r
-- Logs begin at Sat 2016-08-13 23:16:11 EDT, end at Mon 2016-08-15 09:06:33 EDT. --
Aug 15 08:58:11 server1 systemd[1]: flocker-dataset-agent.service failed.
Aug 15 08:58:11 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 15 08:58:11 server1 systemd[1]: Failed to start Flocker Dataset Agent.
Aug 15 08:58:11 server1 systemd[1]: start request repeated too quickly for flocker-dataset-agent.service
Aug 15 08:58:11 server1 systemd[1]: flocker-dataset-agent.service holdoff time over, scheduling restart.
Aug 15 08:58:11 server1 systemd[1]: flocker-dataset-agent.service failed.
Aug 15 08:58:11 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 15 08:58:11 server1 systemd[1]: flocker-dataset-agent.service: main process exited, code=exited, status=1/FAILURE
Aug 15 08:58:11 server1 flocker-dataset-agent[29869]: {"task_uuid": "d10519ee-b417-4ccd-a90c-7c780a4d5b14", "error": false, "timestamp": 1471265891.600116, "message": "Main loop terminated.", "message_type": "twisted:log", "task_level": [1]}
Aug 15 08:58:11 server1 flocker-dataset-agent[29869]: {"task_uuid": "8a2f9dff-07d1-4aab-b8c4-717536adf44e", "error": true, "timestamp": 1471265891.599722, "message": "main function encountered error\nTraceback (most recent call last):\n  File \"/opt/floc
Aug 15 08:58:11 server1 flocker-dataset-agent[29869]: {"task_uuid": "f7b0c1cf-eefd-4007-8742-156177a62b0a", "error": true, "timestamp": 1471265891.598659, "message": "Unhandled Error\nTraceback (most recent call last):\n  File \"/opt/flocker/lib/python2.
Aug 15 08:58:11 server1 flocker-dataset-agent[29869]: {"task_uuid": "1a59b3b6-386d-4c61-a938-ba687b38f8a0", "error": false, "timestamp": 1471265891.587703, "message": "Log opened.", "message_type": "twisted:log", "task_level": [1]}
Aug 15 08:58:10 server1 systemd[1]: Starting Flocker Dataset Agent...
Aug 15 08:58:10 server1 systemd[1]: Started Flocker Dataset Agent.

Tested PureStorage driver manually:

[root@server1 purestorage-flocker-driver]# python
Python 3.4.2 (default, Apr 19 2016, 08:30:31) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import purestorage
>>> fa=purestorage.FlashArray('192.168.128.157',api_token='50cac451-fefe-b635-9692-5870aada9c49')
/opt/mesosphere/packages/python--e3169ded66609d3cb4055a3f9f8f0b1113a557a6/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
/opt/mesosphere/packages/python--e3169ded66609d3cb4055a3f9f8f0b1113a557a6/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
>>> fa.get()
/opt/mesosphere/packages/python--e3169ded66609d3cb4055a3f9f8f0b1113a557a6/lib/python3.4/site-packages/requests/packages/urllib3/connectionpool.py:821: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.org/en/latest/security.html
  InsecureRequestWarning)
{'revision': '201605132143+b186ed4', 'id': '7e4d75d0-ffc1-4e7c-880a-c044849e1993', 'version': '4.6.8', 'array_name': 'PURTPC0027'}
>>> 

As you can see above, connectivity is there. We were able to connect, authenticate using a token and were able to retrieve PureStorage basic info (i.e., version, name, etc)..

Verified the sanity of /etc/flocker/node.crt:

[root@server1 ~]# openssl x509 -text -in /etc/flocker/node.crt 
Certificate:
    Data:
        Version: 1 (0x0)
        Serial Number:
            fb:69:42:2e:a2:2e:52:1d:28:b9:59:d7:1d:06:21:13
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: OU=06dc4747-5c8d-4878-8943-b7d34cbf08c4, CN=flocker
        Validity
            Not Before: Jun 29 15:15:36 2016 GMT
            Not After : Jun 24 15:15:36 2036 GMT
        Subject: OU=06dc4747-5c8d-4878-8943-b7d34cbf08c4, CN=node-f810470c-7140-4017-b61d-4d9d326e4a3f
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (4096 bit)
                Modulus:
                    00:98:f7:e4:d0:4d:84:a8:43:a7:97:f4:95:09:19:
                    6f:d0:71:0b:76:56:3e:ca:14:86:3b:6f:03:be:ad:
                    4a:fb:1a:54:5c:51:e7:80:67:3d:e1:1a:92:8a:f5:
                    13:87:b1:e5:bd:16:72:de:53:85:31:f7:d3:c5:63:
                    dd:51:90:bb:03:68:b4:9a:ed:bc:82:44:18:6c:4a:
                    9d:21:32:c0:d1:bf:51:84:81:d4:ba:d6:ee:8d:6a:
                    f2:07:f6:8c:ac:87:61:6e:92:8e:2f:62:6a:5a:18:
                    38:ed:40:14:7f:87:e7:f0:cf:af:af:a7:94:54:b5:
                    83:c8:4c:38:6f:72:27:13:ee:50:53:1d:f2:04:0d:
                    70:d3:7f:a0:06:53:19:b9:dd:d7:85:4b:56:18:c7:
                    f9:29:2a:23:7e:0a:fb:9f:8c:f0:4a:b9:fe:c8:1f:
                    c6:97:3c:d9:a4:19:3a:5b:95:4b:87:d8:2a:71:55:
                    01:92:f3:ed:85:b9:12:8b:b1:4c:ab:14:51:1f:48:
                    47:bf:b2:36:ca:fd:4a:64:12:2f:c5:eb:bb:49:9f:
                    12:03:84:d4:02:dc:90:13:6a:c7:f8:7e:ff:03:2c:
                    16:c8:06:3d:a0:6a:59:e2:00:7b:1a:07:9f:9a:b8:
                    12:95:5a:a2:28:ce:84:2d:d7:e8:03:ae:c6:9b:5c:
                    60:c4:2c:c6:8b:31:7c:e7:02:aa:9d:85:1e:f5:b0:
                    64:de:d9:78:29:f9:35:d2:db:64:28:01:6a:c4:7e:
                    21:63:7d:09:0d:44:50:02:7c:23:ff:32:23:4e:5d:
                    51:be:28:05:b2:8c:7c:26:d0:01:9e:4f:ca:bc:69:
                    c1:a0:31:43:75:1a:f8:1e:7b:b7:f8:c0:88:2d:9b:
                    2f:09:00:8f:9e:95:42:39:a6:21:3e:cf:f8:0a:27:
                    f9:85:0d:fd:9a:14:3c:6e:2e:35:57:bc:b5:fa:ec:
                    06:04:c3:cb:5a:e4:a1:cf:ca:05:d8:67:5a:70:27:
                    39:0a:6d:9a:70:67:6a:a7:50:a2:25:01:d2:85:d5:
                    40:ce:12:29:75:d2:98:c1:88:14:56:1e:6a:a6:26:
                    f8:a2:3a:cc:36:57:e1:75:d5:00:48:cf:a4:bb:16:
                    41:59:a6:39:79:0d:52:63:cf:9f:6a:cc:bf:fd:90:
                    56:1c:de:eb:55:e8:a4:ed:62:4d:af:08:97:2b:a9:
                    cc:14:53:02:51:84:8a:bb:c7:8b:cb:c9:32:b2:cc:
                    7c:5d:8a:83:0c:97:06:28:bf:9f:c7:62:97:7b:e1:
                    c5:9f:24:a6:c8:73:f4:a5:34:2a:3b:7c:a2:b1:6b:
                    af:15:5a:99:72:27:19:49:b8:85:15:c8:39:48:72:
                    ba:05:1b
                Exponent: 65537 (0x10001)
    Signature Algorithm: sha256WithRSAEncryption
         9d:bc:37:b9:43:cd:89:ac:ad:50:c0:4e:83:03:26:0e:ce:42:
         ed:1f:1f:d5:3d:06:63:b3:6c:d4:d0:7d:f8:26:99:32:11:4f:
         ce:27:9f:ac:80:06:90:71:72:ef:7e:5f:a8:0a:86:ed:4d:65:
         d5:4f:55:5a:26:eb:94:27:0b:23:62:39:4f:0a:b3:07:a4:c8:
         dd:7c:b6:27:77:62:8c:f7:28:61:c0:56:bc:9c:b2:5e:08:36:
         78:52:34:a5:9f:25:54:e0:0d:0c:76:5b:27:a9:fc:5c:72:82:
         6d:82:65:11:fd:43:fa:29:d5:be:85:37:37:1b:4e:87:db:f5:
         d0:5e:92:48:ba:c0:ef:8f:b1:d4:04:11:00:53:4d:b4:e8:74:
         41:2e:6a:49:8a:3b:6a:b9:74:98:96:1a:74:f6:2f:de:5c:7c:
         98:4b:7a:3a:1a:55:98:99:23:d8:8e:c7:1f:32:c1:10:4c:4b:
         34:01:9c:bc:44:8b:eb:59:0e:c7:1a:0f:17:81:35:42:38:42:
         72:4a:06:d9:f4:9c:94:9b:66:f9:fb:ae:04:b2:32:cc:3c:d6:
         48:06:5b:c1:83:f6:ea:36:66:73:fc:1b:d0:46:91:8a:bf:12:
         d5:0e:71:6a:e9:18:5e:37:45:97:d5:d3:8d:3d:28:33:ed:83:
         9e:ca:79:d0:b2:b4:98:a7:ab:f9:da:48:b9:98:85:f7:5b:97:
         09:af:08:04:ef:f9:33:fb:31:d1:b5:8f:e4:69:d8:a5:12:53:
         92:cd:4f:17:d1:2d:ce:c7:c3:14:db:ac:dc:6c:4f:4c:79:2b:
         1b:ed:a0:c9:40:29:bd:93:48:3e:2d:96:4b:ab:6c:09:fe:b3:
         08:cd:4e:87:01:72:fb:d4:db:85:ed:5c:60:83:20:c2:c1:0f:
         c4:ce:fb:bd:aa:80:bc:eb:49:7d:30:ee:49:f1:11:7c:6b:8d:
         5c:aa:b0:9f:5d:ad:12:fa:fa:7f:aa:00:88:52:a9:a7:41:fc:
         79:5d:9f:2e:12:81:95:07:f3:64:e5:b7:22:49:95:42:4e:50:
         03:72:c4:ae:11:cf:ae:99:b5:47:5a:2c:a0:09:1e:98:0f:c7:
         a0:8a:64:e3:4b:e0:49:70:db:6b:4e:bd:e3:14:3a:ca:1f:c7:
         59:29:bc:28:41:fd:45:ea:be:fc:ed:c5:2e:ff:4a:57:ec:59:
         42:44:08:33:18:50:fc:78:00:99:a7:50:2a:73:c6:3a:1c:63:
         08:ce:53:45:0d:97:5a:50:08:ed:07:7f:90:35:56:a0:c4:c1:
         5b:28:96:d0:1c:7a:7a:54:b6:98:d6:e4:ed:08:a7:94:91:a7:
         86:e3:6f:36:e7:7f:6b:99
-----BEGIN CERTIFICATE-----
MIIFLDCCAxQCEQD7aUIuoi5SHSi5WdcdBiETMA0GCSqGSIb3DQEBCwUAMEExLTAr
BgNVBAsMJDA2ZGM0NzQ3LTVjOGQtNDg3OC04OTQzLWI3ZDM0Y2JmMDhjNDEQMA4G
A1UEAwwHZmxvY2tlcjAiGA8yMDE2MDYyOTE1MTUzNloYDzIwMzYwNjI0MTUxNTM2
WjBjMS0wKwYDVQQLDCQwNmRjNDc0Ny01YzhkLTQ4NzgtODk0My1iN2QzNGNiZjA4
YzQxMjAwBgNVBAMMKW5vZGUtZjgxMDQ3MGMtNzE0MC00MDE3LWI2MWQtNGQ5ZDMy
NmU0YTNmMIICIjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAmPfk0E2EqEOn
l/SVCRlv0HELdlY+yhSGO28Dvq1K+xpUXFHngGc94RqSivUTh7HlvRZy3lOFMffT
xWPdUZC7A2i0mu28gkQYbEqdITLA0b9RhIHUutbujWryB/aMrIdhbpKOL2JqWhg4
7UAUf4fn8M+vr6eUVLWDyEw4b3InE+5QUx3yBA1w03+gBlMZud3XhUtWGMf5KSoj
fgr7n4zwSrn+yB/GlzzZpBk6W5VLh9gqcVUBkvPthbkSi7FMqxRRH0hHv7I2yv1K
ZBIvxeu7SZ8SA4TUAtyQE2rH+H7/AywWyAY9oGpZ4gB7GgefmrgSlVqiKM6ELdfo
A67Gm1xgxCzGizF85wKqnYUe9bBk3tl4Kfk10ttkKAFqxH4hY30JDURQAnwj/zIj
Tl1RvigFsox8JtABnk/KvGnBoDFDdRr4Hnu3+MCILZsvCQCPnpVCOaYhPs/4Cif5
hQ39mhQ8bi41V7y1+uwGBMPLWuShz8oF2GdacCc5Cm2acGdqp1CiJQHShdVAzhIp
ddKYwYgUVh5qpib4ojrMNlfhddUASM+kuxZBWaY5eQ1SY8+fasy//ZBWHN7rVeik
7WJNrwiXK6nMFFMCUYSKu8eLy8kyssx8XYqDDJcGKL+fx2KXe+HFnySmyHP0pTQq
O3yisWuvFVqZcicZSbiFFcg5SHK6BRsCAwEAATANBgkqhkiG9w0BAQsFAAOCAgEA
nbw3uUPNiaytUMBOgwMmDs5C7R8f1T0GY7Ns1NB9+CaZMhFPziefrIAGkHFy735f
qAqG7U1l1U9VWibrlCcLI2I5TwqzB6TI3Xy2J3dijPcoYcBWvJyyXgg2eFI0pZ8l
VOANDHZbJ6n8XHKCbYJlEf1D+inVvoU3NxtOh9v10F6SSLrA74+x1AQRAFNNtOh0
QS5qSYo7arl0mJYadPYv3lx8mEt6OhpVmJkj2I7HHzLBEExLNAGcvESL61kOxxoP
F4E1QjhCckoG2fSclJtm+fuuBLIyzDzWSAZbwYP26jZmc/wb0EaRir8S1Q5xaukY
XjdFl9XTjT0oM+2Dnsp50LK0mKer+dpIuZiF91uXCa8IBO/5M/sx0bWP5GnYpRJT
ks1PF9EtzsfDFNus3GxPTHkrG+2gyUApvZNIPi2WS6tsCf6zCM1OhwFy+9Tbhe1c
YIMgwsEPxM77vaqAvOtJfTDuSfERfGuNXKqwn12tEvr6f6oAiFKpp0H8eV2fLhKB
lQfzZOW3IkmVQk5QA3LErhHPrpm1R1osoAkemA/HoIpk40vgSXDba0694xQ6yh/H
WSm8KEH9Req+/O3FLv9KV+xZQkQIMxhQ/HgAmadQKnPGOhxjCM5TRQ2XWlAI7Qd/
kDVWoMTBWyiW0Bx6elS2mNbk7QinlJGnhuNvNud/a5k=
-----END CERTIFICATE-----
[root@server1 ~]# 

Tried to run flocker-diagnostics, but it exited:

[root@server1 ~]# flocker-diagnostics 
Traceback (most recent call last):
  File "/usr/sbin/flocker-diagnostics", line 7, in <module>
    from flocker.node.script import flocker_diagnostics_main
  File "/opt/flocker/lib/python2.7/site-packages/flocker/node/__init__.py", line 20, in <module>
    from .script import DeployerType
  File "/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py", line 16, in <module>
    import yaml
  File "/opt/mesosphere/lib/python3.4/site-packages/yaml/__init__.py", line 284
    class YAMLObject(metaclass=YAMLObjectMetaclass):
                              ^
SyntaxError: invalid syntax
[root@server1 ~]# 

I am trying at least configure one node correctly. Once configured and visible from flockerctl command, I will proceed with the rest of the nodes (i.e., using node 1 as a template). Any pointers or suggestions ?

Thanks so much!!

wallnerryan commented 8 years ago

@akamalov the docker plugin error is because the dataset agent isnt registering correctly and the container agent it looks like is because it cont communicate with control service.

I suspect the certificates aren't configured correctly.

see

      Issuer: OU=06dc4747-5c8d-4878-8943-b7d34cbf08c4, CN=flocker

This tells me the CA control certificate was likely create with flocker-ca create-control-certifcate flocker. The dataset agents agent.yml looks like its configured with "hostname": 192.168.120.156. Try to configure with "hostname": flocker and set flocker 192.168.120.156 in /etc/hosts.

Then restart the control service, dataset agent, container agent and docker plugin. Remember if you are not using flocker's built in orchestration, and docker cli only, you do not need to start or enable the container agent.

wallnerryan commented 8 years ago

also, try to use journalctl -u flocker-dataset-agent -xn --no-pager or journalctl -u flocker-dataset-agent -xn | less to see better debug output, it looks like the actual error is cut off

eddy-ru commented 8 years ago

@wallnerryan i do u say that ,but my flocker-dataset-agent is fault

error:flocker-dataset-agent.service: main process exited, code=exited, status=1/FAILURE Aug 17 01:49:04 ip-192-168-9-96 systemd: Unit flocker-dataset-agent.service entered failed state. Aug 17 01:49:04 ip-192-168-9-96 systemd: flocker-dataset-agent.service failed. Aug 17 01:49:04 ip-192-168-9-96 systemd: flocker-dataset-agent.service holdoff time over, scheduling restart. Aug 17 01:49:04 ip-192-168-9-96 systemd: start request repeated too quickly for flocker-dataset-agent.service Aug 17 01:49:04 ip-192-168-9-96 systemd: Failed to start Flocker Dataset Agent. Aug 17 01:49:04 ip-192-168-9-96 systemd: Unit flocker-dataset-agent.service entered failed state. Aug 17 01:49:04 ip-192-168-9-96 systemd: flocker-dataset-agent.service failed. Aug 17 01:50:29 ip-192-168-9-96 systemd: Job dev-xvdb.device/start timed out. Aug 17 01:50:29 ip-192-168-9-96 systemd: Timed out waiting for device dev-xvdb.device. Aug 17 01:50:29 ip-192-168-9-96 systemd: Dependency failed for /mnt. Aug 17 01:50:29 ip-192-168-9-96 systemd: Job mnt.mount/start failed with result 'dependency'. Aug 17 01:50:29 ip-192-168-9-96 systemd: Dependency failed for File System Check on /dev/xvdb. Aug 17 01:50:29 ip-192-168-9-96 systemd: Job systemd-fsck@dev-xvdb.service/start failed with result 'dependency'. Aug 17 01:50:29 ip-192-168-9-96 systemd: Job dev-xvdb.device/start failed with result 'timeout'. Aug 17 01:51:32 ip-192-168-9-96 systemd: Starting Cleanup of Temporary Directories... Aug 17 01:51:32 ip-192-168-9-96 systemd: Started Cleanup of Temporary Directories.

akamalov commented 8 years ago

Thanks so much @wallnerryan , just implemented your suggestions. Things I did:

  1. Re-generated cert to be absolutely sure
[root@mmaster1 ~]# /opt/flocker/bin/flocker-ca create-control-certificate flocker
Created control-flocker.crt and control-flocker.key
Copy these files to the directory /etc/flocker on your control service machine.
Rename the files to control-service.crt and control-service.key and set the correct permissions by running chmod 0600 on both files. 
[root@mmaster1 ~]#

2.Restarted FCS services 3.Re-generated node authentication certificates

[root@mmaster1 ~]# /opt/flocker/bin/flocker-ca create-node-certificate
Created b0cae52c-5376-4e0e-a66d-5990e2965c37.crt. Copy it over to /etc/flocker/node.crt on your node machine and make sure to chmod 0600 it.
[root@mmaster1 ~]#

4.Copied generated certs over to a node 5.Modified node /etc/hosts to alias flocker to 192.168.120.156 6.Modifed /etc/flocker/agent.yml file to point to host flocker and not the IP address:

"version": 1
"control-service": 
  "hostname": "flocker"
  "port": 4523
"dataset": 
    "backend": "purestorage_flasharray_flocker_driver"
    "pure_ip": 172.16.128.157 
    "pure_api_token": "50cac451-fefe-b635-9692-5870aada9c49"
    "pure_chap_host_user": "mslave5"
    "pure_chap_host_password": "pureBXPmslave5"

7.Restarted node Flocker services:

systemctl restart flocker-dataset-agent
systemctl restart flocker-container-agent
systemctl restart flocker-docker-plugin

Status still coming up as failed:

[root@mslave5 ~]# systemctl -l status flocker-dataset-agent
● flocker-dataset-agent.service - Flocker Dataset Agent
   Loaded: loaded (/usr/lib/systemd/system/flocker-dataset-agent.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Wed 2016-08-17 06:53:40 EDT; 8min ago
  Process: 56620 ExecStart=/usr/sbin/flocker-dataset-agent --journald (code=exited, status=1/FAILURE)
 Main PID: 56620 (code=exited, status=1/FAILURE)

Aug 17 06:53:40 mslave5 systemd[1]: flocker-dataset-agent.service: main process exited, code=exited, status=1/FAILURE
Aug 17 06:53:40 mslave5 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 17 06:53:40 mslave5 systemd[1]: flocker-dataset-agent.service failed.
Aug 17 06:53:40 mslave5 systemd[1]: flocker-dataset-agent.service holdoff time over, scheduling restart.
Aug 17 06:53:40 mslave5 systemd[1]: start request repeated too quickly for flocker-dataset-agent.service
Aug 17 06:53:40 mslave5 systemd[1]: Failed to start Flocker Dataset Agent.
Aug 17 06:53:40 mslave5 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 17 06:53:40 mslave5 systemd[1]: flocker-dataset-agent.service failed.
[root@mslave5 ~]# 

As per your request, the output of journalctl -u flocker-dataset-agent -xn --no-pager:

[root@server1 ~]# journalctl -u flocker-dataset-agent -xn --no-pager
-- Logs begin at Mon 2016-08-15 21:20:28 EDT, end at Wed 2016-08-17 06:54:29 EDT. --
Aug 17 06:53:40 server1 flocker-dataset-agent[56620]: {"task_uuid": "1d1ff0d0-1ebd-4d9e-9bd4-2486108e875d", "error": true, "timestamp": 1471431220.353552, "message": "main function encountered error\nTraceback (most recent call last):\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 84, in flocker_dataset_agent_main\n    options=options,\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/script.py\", line 295, in main\n    self._react(run_and_log, [], _reactor=self._reactor)\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/task.py\", line 908, in react\n    finished = main(_reactor, *argv)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/script.py\", line 283, in run_and_log\n    d = maybeDeferred(self.script.main, reactor, options)\n--- <exception caught here> ---\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/defer.py\", line 150, in maybeDeferred\n    result = f(*args, **kw)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 295, in main\n    self.service_factory(reactor, options)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 591, in get_service\n    agent_service = self.agent_service_factory(configuration)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 485, in from_configuration\n    configuration['dataset']\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/backends.py\", line 176, in backend_and_api_args_from_configuration\n    backend = backend_loader.get(backend_name)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/plugin.py\", line 130, in get\n    raise PluginNotFound(plugin_name=plugin_name)\nflocker.common.plugin.PluginNotFound: 'purestorage_flasharray_flocker_driver' is neither a built-in plugin nor a 3rd party module.\n", "message_type": "twisted:log", "task_level": [1]}
Aug 17 06:53:40 server1 flocker-dataset-agent[56620]: {"task_uuid": "72bc76ba-2e83-4c82-99d9-a9219af7b6d2", "error": false, "timestamp": 1471431220.353932, "message": "Main loop terminated.", "message_type": "twisted:log", "task_level": [1]}
Aug 17 06:53:40 server1 systemd[1]: flocker-dataset-agent.service: main process exited, code=exited, status=1/FAILURE
Aug 17 06:53:40 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 17 06:53:40 server1 systemd[1]: flocker-dataset-agent.service failed.
Aug 17 06:53:40 server1 systemd[1]: flocker-dataset-agent.service holdoff time over, scheduling restart.
Aug 17 06:53:40 server1 systemd[1]: start request repeated too quickly for flocker-dataset-agent.service
Aug 17 06:53:40 server1 systemd[1]: Failed to start Flocker Dataset Agent.
-- Subject: Unit flocker-dataset-agent.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit flocker-dataset-agent.service has failed.
-- 
-- The result is failed.
Aug 17 06:53:40 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 17 06:53:40 server1 systemd[1]: flocker-dataset-agent.service failed.
[root@server1 ~]# 
akamalov commented 8 years ago

Interesting part is this:

packages/flocker/node/script.py\", line 485, in from_configuration\n    configuration['dataset']\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/backends.py\", line 176, in backend_and_api_args_from_configuration\n    backend = backend_loader.get(backend_name)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/plugin.py\", line 130, in get\n    raise PluginNotFound(plugin_name=plugin_name)\nflocker.common.plugin.PluginNotFound: 'purestorage_flasharray_flocker_driver' is neither a built-in plugin nor a 3rd party module.\n", "message_type": "twisted:log", "task_level": [1]}

This is strange that it is not finding Pure plugin (purestorage_flasharray_flocker_driver' is neither a built-in plugin nor a 3rd party module). It has been installed as per Pure's flocker driver installation README file (available at https://github.com/PureStorage-OpenConnect/purestorage-flocker-driver). Here is the excerpt from installation:

[root@server1 purestorage-flocker-driver]# python setup.py install
running install
[pbr] Writing ChangeLog
[pbr] Generating ChangeLog
[pbr] ChangeLog complete (0.0s)
[pbr] Generating AUTHORS
[pbr] AUTHORS complete (0.0s)
running build
running build_py
running egg_info
writing dependency_links to purestorage_flocker_driver.egg-info/dependency_links.txt
writing top-level names to purestorage_flocker_driver.egg-info/top_level.txt
writing purestorage_flocker_driver.egg-info/PKG-INFO
writing requirements to purestorage_flocker_driver.egg-info/requires.txt
writing pbr to purestorage_flocker_driver.egg-info/pbr.json
[pbr] Processing SOURCES.txt
[pbr] In git context, generating filelist from git
warning: no previously-included files found matching '.gitreview'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
writing manifest file 'purestorage_flocker_driver.egg-info/SOURCES.txt'
running install_lib
running install_egg_info
removing '/opt/mesosphere/packages/python--e3169ded66609d3cb4055a3f9f8f0b1113a557a6/lib/python3.4/site-packages/purestorage_flocker_driver-1.0.0-py3.4.egg-info' (and everything under it)
Copying purestorage_flocker_driver.egg-info to /opt/mesosphere/packages/python--e3169ded66609d3cb4055a3f9f8f0b1113a557a6/lib/python3.4/site-packages/purestorage_flocker_driver-1.0.0-py3.4.egg-info
running install_scripts
[root@server1 purestorage-flocker-driver]# 

May be it is not finding the driver ? Should I be telling dataset-agent where to find it ?

wallnerryan commented 8 years ago

@akamalov make sure and install the plugin into flocker's virtual env. so /opt/flocker/bin/python or /opt/flocker/bin/pip

the source /opt/flocker/bin/activate from their setup/install docs should put you in the virtualenv and use the corrent python/pip but just double check by trying explicitly

akamalov commented 8 years ago

Tried both. Getting errors:

Install using pip:

[root@server1 purestorage-flocker-driver]# /opt/flocker/bin/python setup.py install
ERROR:root:Error parsing
Traceback (most recent call last):
  File "/opt/flocker/lib/python2.7/site-packages/pbr/core.py", line 111, in pbr
    attrs = util.cfg_to_args(path, dist.script_args)
  File "/opt/flocker/lib/python2.7/site-packages/pbr/util.py", line 264, in cfg_to_args
    wrap_commands(kwargs)
  File "/opt/flocker/lib/python2.7/site-packages/pbr/util.py", line 573, in wrap_commands
    cmdclass = ep.load(False)
  File "/opt/mesosphere/lib/python3.4/site-packages/pkg_resources.py", line 2029, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "/opt/flocker/lib/python2.7/site-packages/sphinx/setup_command.py", line 23, in <module>
    from sphinx.application import Sphinx
  File "/opt/flocker/lib/python2.7/site-packages/sphinx/application.py", line 26, in <module>
    from docutils.parsers.rst import convert_directive_function, \
  File "/opt/mesosphere/lib/python3.4/site-packages/docutils/parsers/rst/__init__.py", line 74, in <module>
    import docutils.statemachine
  File "/opt/mesosphere/lib/python3.4/site-packages/docutils/statemachine.py", line 218
    % (self.line_offset, '\n| '.join(self.input_lines))), file=self._stderr)
                                                              ^
SyntaxError: invalid syntax
error in setup command: Error parsing /root/purestorage-flocker-driver/setup.cfg: SyntaxError: invalid syntax (statemachine.py, line 218)
[root@server1 purestorage-flocker-driver]# /opt/flocker/bin/pip install purestorage-flocker-driver
Traceback (most recent call last):
  File "/opt/flocker/bin/pip", line 7, in <module>
    from pip import main
  File "/opt/mesosphere/lib/python3.4/site-packages/pip/__init__.py", line 14, in <module>
    from pip.utils import get_installed_distributions, get_prog
ImportError: No module named utils
[root@server1 purestorage-flocker-driver]# 

Install using python:

[root@server1 purestorage-flocker-driver]# /opt/flocker/bin/python setup.py install
ERROR:root:Error parsing
Traceback (most recent call last):
  File "/opt/flocker/lib/python2.7/site-packages/pbr/core.py", line 111, in pbr
    attrs = util.cfg_to_args(path, dist.script_args)
  File "/opt/flocker/lib/python2.7/site-packages/pbr/util.py", line 264, in cfg_to_args
    wrap_commands(kwargs)
  File "/opt/flocker/lib/python2.7/site-packages/pbr/util.py", line 573, in wrap_commands
    cmdclass = ep.load(False)
  File "/opt/mesosphere/lib/python3.4/site-packages/pkg_resources.py", line 2029, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "/opt/flocker/lib/python2.7/site-packages/sphinx/setup_command.py", line 23, in <module>
    from sphinx.application import Sphinx
  File "/opt/flocker/lib/python2.7/site-packages/sphinx/application.py", line 26, in <module>
    from docutils.parsers.rst import convert_directive_function, \
  File "/opt/mesosphere/lib/python3.4/site-packages/docutils/parsers/rst/__init__.py", line 74, in <module>
    import docutils.statemachine
  File "/opt/mesosphere/lib/python3.4/site-packages/docutils/statemachine.py", line 218
    % (self.line_offset, '\n| '.join(self.input_lines))), file=self._stderr)
                                                              ^
SyntaxError: invalid syntax
error in setup command: Error parsing /root/purestorage-flocker-driver/setup.cfg: SyntaxError: invalid syntax (statemachine.py, line 218)
[root@server1 purestorage-flocker-driver]# 
wallnerryan commented 8 years ago

@akamalov I am seeing really odd things in here.

Please make sure you only use /opt/flocker/ virtualenv for flocker. I am seeing mesosphere installation artifacts in here which looks like is making python 3.4 to be called with flocker should be 2.7

 File "/opt/mesosphere/lib/python3.4/site-packages/docutils/parsers/rst/__init__.py", line 74, in <module>
    import docutils.statemachine
  File "/opt/mesosphere/lib/python3.4/site-packages/docutils/statemachine.py", line 218

please make sure the virtualenv is clean and then install flocker and purestorage driver there.

akamalov commented 8 years ago

Thanks @wallnerryan. The environment has Mesosphere DCOS installed, where I have to integrate Flocker + PureStorage.

Here is step-by-step procedure I am doing to install the driver:

Activate Flocker virtenv:

[root@server1 purestorage-flocker-driver]# source /opt/flocker/bin/activate

Clone repository for Pure Flocker driver:

(flocker) [root@server1 ~]# git clone https://github.com/PureStorage-OpenConnect/purestorage-flocker-driver
Cloning into 'purestorage-flocker-driver'...
remote: Counting objects: 22, done.
remote: Total 22 (delta 0), reused 0 (delta 0), pack-reused 22
Unpacking objects: 100% (22/22), done.

Install Pure Flocker driver:

(flocker) [root@server1 ~]# cd purestorage-flocker-driver/
(flocker) [root@server1 purestorage-flocker-driver]# /opt/flocker/bin/pip install purestorage_flasharray_flocker_driver
Downloading/unpacking purestorage-flasharray-flocker-driver
  Could not find any downloads that satisfy the requirement purestorage-flasharray-flocker-driver
Cleaning up...
No distributions at all found for purestorage-flasharray-flocker-driver
Storing debug log for failure in /root/.pip/pip.log
(flocker) [root@server1 purestorage-flocker-driver]# 

Let's take a look at /root/.pip/pip.log:

[root@server1 purestorage-flocker-driver]# cat /root/.pip/pip.log
------------------------------------------------------------
/opt/flocker/bin/pip run on Wed Aug 17 11:21:17 2016
Downloading/unpacking purestorage-flasharray-flocker-driver
  Getting page https://pypi.python.org/simple/purestorage_flasharray_flocker_driver/
  Could not fetch URL https://pypi.python.org/simple/purestorage_flasharray_flocker_driver/: 404 Client Error: Not Found
  Will skip URL https://pypi.python.org/simple/purestorage_flasharray_flocker_driver/ when looking for download links for purestorage-flasharray-flocker-driver
  Getting page https://pypi.python.org/simple/
  URLs to search for versions for purestorage-flasharray-flocker-driver:
  * https://pypi.python.org/simple/purestorage_flasharray_flocker_driver/
  Getting page https://pypi.python.org/simple/purestorage_flasharray_flocker_driver/
  Could not fetch URL https://pypi.python.org/simple/purestorage_flasharray_flocker_driver/: 404 Client Error: Not Found
  Will skip URL https://pypi.python.org/simple/purestorage_flasharray_flocker_driver/ when looking for download links for purestorage-flasharray-flocker-driver
  Could not find any downloads that satisfy the requirement purestorage-flasharray-flocker-driver
Cleaning up...
  Removing temporary dir /opt/flocker/build...
No distributions at all found for purestorage-flasharray-flocker-driver
Exception information:
Traceback (most recent call last):
  File "/opt/mesosphere/lib/python3.4/site-packages/pip/basecommand.py", line 122, in main
    status = self.run(options, args)
  File "/opt/mesosphere/lib/python3.4/site-packages/pip/commands/install.py", line 278, in run
    requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
  File "/opt/mesosphere/lib/python3.4/site-packages/pip/req.py", line 1177, in prepare_files
    url = finder.find_requirement(req_to_install, upgrade=self.upgrade)
  File "/opt/mesosphere/lib/python3.4/site-packages/pip/index.py", line 277, in find_requirement
    raise DistributionNotFound('No distributions at all found for %s' % req)
DistributionNotFound: No distributions at all found for purestorage-flasharray-flocker-driver
[root@server1 purestorage-flocker-driver]# 

I do not know why would activated Flocker virtualenv still source /opt/mesosphere/bin/python...

wallnerryan commented 8 years ago

@akamalov yeah thats odd. After you clone the purestorage repo and cd into it try to use /opt/flocker/bin/pip install instead of /opt/flocker/bin/pip install purestorage_flasharray_flocker_driver or dont cd into it just clone it and run /opt/flocker/bin/pip install purestorage-flocker-driver/. The way above will try to search for it remotely.

also, when you install mesosphere make sure to install it into the system python path and not in flocker virtualenv, its looks like its in /opt/mesosphere/ but there seems to be some of it bleading into each-other but this is the root pip log, not just pip log so likely both virtualenv log there

akamalov commented 8 years ago

@wallnerryan thanks for your feedback! Here is what I did so far:

1.Completely uninstalled Flocker* from the node 2.Reinstalled Flocker

Current Flocker operating environment:

clusterhq-flocker-node-1.14.0-1.noarch
clusterhq-python-flocker-1.14.0-1.x86_64
clusterhq-flocker-docker-plugin-1.14.0-1.noarch

3.Renamed /opt/mesosphere/lib/python3.4 to/opt/mesosphere/lib/_python3.4

  1. Installed Pure driver:
(flocker) [root@server1 purestorage-flocker-driver]# /opt/flocker/bin/python setup.py install
running install
[pbr] Writing ChangeLog
[pbr] Generating ChangeLog
[pbr] ChangeLog complete (0.0s)
[pbr] Generating AUTHORS
[pbr] AUTHORS complete (0.0s)
running build
running build_py
running egg_info
writing pbr to purestorage_flocker_driver.egg-info/pbr.json
writing requirements to purestorage_flocker_driver.egg-info/requires.txt
writing purestorage_flocker_driver.egg-info/PKG-INFO
writing top-level names to purestorage_flocker_driver.egg-info/top_level.txt
writing dependency_links to purestorage_flocker_driver.egg-info/dependency_links.txt
[pbr] Processing SOURCES.txt
[pbr] In git context, generating filelist from git
warning: no previously-included files found matching '.gitreview'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
writing manifest file 'purestorage_flocker_driver.egg-info/SOURCES.txt'
running install_lib
creating /opt/flocker/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver
copying build/lib/purestorage_flasharray_flocker_driver/__init__.py -> /opt/flocker/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver
copying build/lib/purestorage_flasharray_flocker_driver/purestorage_blockdevice.py -> /opt/flocker/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver
byte-compiling /opt/flocker/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver/__init__.py to __init__.pyc
byte-compiling /opt/flocker/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver/purestorage_blockdevice.py to purestorage_blockdevice.pyc
running install_egg_info
Copying purestorage_flocker_driver.egg-info to /opt/flocker/lib/python2.7/site-packages/purestorage_flocker_driver-1.0.0-py2.7.egg-info
running install_scripts
(flocker) [root@server1 purestorage-flocker-driver]# 

Restarted flocker-dataset-agent:

(flocker) [root@server1 purestorage-flocker-driver]# systemctl start flocker-dataset-agent

Failed to start:

(flocker) [root@server1 purestorage-flocker-driver]# systemctl -l status flocker-dataset-agent
● flocker-dataset-agent.service - Flocker Dataset Agent
   Loaded: loaded (/usr/lib/systemd/system/flocker-dataset-agent.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Wed 2016-08-17 13:51:00 EDT; 5min ago
  Process: 28232 ExecStart=/usr/sbin/flocker-dataset-agent --journald (code=exited, status=1/FAILURE)
 Main PID: 28232 (code=exited, status=1/FAILURE)

Aug 17 13:51:00 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 17 13:51:00 server1 systemd[1]: flocker-dataset-agent.service failed.
Aug 17 13:51:00 server1 systemd[1]: flocker-dataset-agent.service holdoff time over, scheduling restart.
Aug 17 13:51:00 server1 systemd[1]: start request repeated too quickly for flocker-dataset-agent.service
Aug 17 13:51:00 server1 systemd[1]: Failed to start Flocker Dataset Agent.
Aug 17 13:51:00 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 17 13:51:00 server1 systemd[1]: flocker-dataset-agent.service failed.
(flocker) [root@server1 purestorage-flocker-driver]# 

Journal entries:

(flocker) [root@server1 purestorage-flocker-driver]# journalctl -u flocker-dataset-agent -xn --no-pager

-- Logs begin at Tue 2016-08-16 02:54:02 EDT, end at Wed 2016-08-17 13:51:45 EDT. --
Aug 17 13:51:00 server1 flocker-dataset-agent[28232]: {"task_uuid": "e4f2ef48-24af-4624-ae3f-c328d4674c75", "error": true, "timestamp": 1471456260.652017, "message": "main function encountered error\nTraceback (most recent call last):\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 84, in flocker_dataset_agent_main\n    options=options,\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/script.py\", line 295, in main\n    self._react(run_and_log, [], _reactor=self._reactor)\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/task.py\", line 908, in react\n    finished = main(_reactor, *argv)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/script.py\", line 283, in run_and_log\n    d = maybeDeferred(self.script.main, reactor, options)\n--- <exception caught here> ---\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/defer.py\", line 150, in maybeDeferred\n    result = f(*args, **kw)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 295, in main\n    self.service_factory(reactor, options)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 591, in get_service\n    agent_service = self.agent_service_factory(configuration)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 485, in from_configuration\n    configuration['dataset']\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/backends.py\", line 176, in backend_and_api_args_from_configuration\n    backend = backend_loader.get(backend_name)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/plugin.py\", line 128, in get\n    plugin_module = namedAny(plugin_name)\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/python/reflect.py\", line 303, in namedAny\n    topLevelPackage = _importAndCheckStack(trialname)\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/python/reflect.py\", line 250, in _importAndCheckStack\n    reraise(excValue, excTraceback)\n  File \"/opt/flocker/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver/__init__.py\", line 6, in <module>\n    from purestorage_flasharray_flocker_driver import purestorage_blockdevice\n  File \"/opt/flocker/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver/purestorage_blockdevice.py\", line 14, in <module>\n    from oslo_log import log as logging\nexceptions.ImportError: No module named oslo_log\n", "message_type": "twisted:log", "task_level": [1]}
Aug 17 13:51:00 server1 flocker-dataset-agent[28232]: {"task_uuid": "df37d0ef-a69c-437b-940f-8036638024d2", "error": false, "timestamp": 1471456260.652479, "message": "Main loop terminated.", "message_type": "twisted:log", "task_level": [1]}
wallnerryan commented 8 years ago

@akamalov this looks much better!

Looks like oslo_log is a dependency of the purestorage driver and it is used here: https://github.com/PureStorage-OpenConnect/purestorage-flocker-driver/blob/master/purestorage_flasharray_flocker_driver/purestorage_blockdevice.py#L14

But its not in the requirements: https://github.com/PureStorage-OpenConnect/purestorage-flocker-driver/blob/master/requirements.txt

try /opt/flocker/bin/pip install oslo.log

if this helps, may want to let them know they should add the python dep.

akamalov commented 8 years ago

Installed oslo_log dependency:

(flocker) [root@server1 purestorage-flocker-driver]# /opt/flocker/bin/pip install oslo.log
Collecting oslo.log
  Downloading oslo.log-3.14.0-py2.py3-none-any.whl (50kB)
    100% |████████████████████████████████| 51kB 523kB/s 
Requirement already satisfied (use --upgrade to upgrade): pbr>=1.6 in /opt/flocker/lib/python2.7/site-packages (from oslo.log)
Requirement already satisfied (use --upgrade to upgrade): oslo.i18n>=2.1.0 in /opt/flocker/lib/python2.7/site-packages (from oslo.log)
Requirement already satisfied (use --upgrade to upgrade): six>=1.9.0 in /opt/flocker/lib/python2.7/site-packages (from oslo.log)
Collecting pyinotify>=0.9.6; sys_platform != "win32" and sys_platform != "darwin" and sys_platform != "sunos5" (from oslo.log)
  Downloading pyinotify-0.9.6.tar.gz (60kB)
    100% |████████████████████████████████| 61kB 1.2MB/s 
Collecting oslo.utils>=3.16.0 (from oslo.log)
  Using cached oslo.utils-3.16.0-py2.py3-none-any.whl
Collecting oslo.context>=2.4.0 (from oslo.log)
  Downloading oslo.context-2.8.0-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): python-dateutil>=2.4.2 in /opt/flocker/lib/python2.7/site-packages (from oslo.log)
Requirement already satisfied (use --upgrade to upgrade): oslo.serialization>=1.10.0 in /opt/flocker/lib/python2.7/site-packages (from oslo.log)
Collecting oslo.config>=3.14.0 (from oslo.log)
  Using cached oslo.config-3.15.0-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): debtcollector>=1.2.0 in /opt/flocker/lib/python2.7/site-packages (from oslo.log)
Requirement already satisfied (use --upgrade to upgrade): Babel>=2.3.4 in /opt/flocker/lib/python2.7/site-packages (from oslo.i18n>=2.1.0->oslo.log)
Requirement already satisfied (use --upgrade to upgrade): netifaces>=0.10.4 in /opt/flocker/lib/python2.7/site-packages (from oslo.utils>=3.16.0->oslo.log)
Requirement already satisfied (use --upgrade to upgrade): funcsigs>=0.4; python_version == "2.7" or python_version == "2.6" in /opt/flocker/lib/python2.7/site-packages (from oslo.utils>=3.16.0->oslo.log)
Requirement already satisfied (use --upgrade to upgrade): netaddr!=0.7.16,>=0.7.12 in /opt/flocker/lib/python2.7/site-packages (from oslo.utils>=3.16.0->oslo.log)
Requirement already satisfied (use --upgrade to upgrade): monotonic>=0.6 in /opt/flocker/lib/python2.7/site-packages (from oslo.utils>=3.16.0->oslo.log)
Requirement already satisfied (use --upgrade to upgrade): pytz>=2013.6 in /opt/flocker/lib/python2.7/site-packages (from oslo.utils>=3.16.0->oslo.log)
Collecting pyparsing>=2.0.1 (from oslo.utils>=3.16.0->oslo.log)
  Downloading pyparsing-2.1.8-py2.py3-none-any.whl (54kB)
    100% |████████████████████████████████| 61kB 1.4MB/s 
Requirement already satisfied (use --upgrade to upgrade): iso8601>=0.1.11 in /opt/flocker/lib/python2.7/site-packages (from oslo.utils>=3.16.0->oslo.log)
Requirement already satisfied (use --upgrade to upgrade): msgpack-python>=0.4.0 in /opt/flocker/lib/python2.7/site-packages (from oslo.serialization>=1.10.0->oslo.log)
Requirement already satisfied (use --upgrade to upgrade): rfc3986>=0.2.0 in /opt/flocker/lib/python2.7/site-packages (from oslo.config>=3.14.0->oslo.log)
Collecting stevedore>=1.16.0 (from oslo.config>=3.14.0->oslo.log)
  Using cached stevedore-1.17.0-py2.py3-none-any.whl
Requirement already satisfied (use --upgrade to upgrade): wrapt>=1.7.0 in /opt/flocker/lib/python2.7/site-packages (from debtcollector>=1.2.0->oslo.log)
Building wheels for collected packages: pyinotify
  Running setup.py bdist_wheel for pyinotify ... done
  Stored in directory: /root/.cache/pip/wheels/ca/66/27/bba9699a96562cab73cd97e80ca0fc9c1656b757293f889ad4
Successfully built pyinotify
Installing collected packages: pyinotify, pyparsing, oslo.utils, oslo.context, stevedore, oslo.config, oslo.log
  Found existing installation: oslo.utils 3.14.0
    Uninstalling oslo.utils-3.14.0:
      Successfully uninstalled oslo.utils-3.14.0
  Found existing installation: stevedore 1.15.0
    Uninstalling stevedore-1.15.0:
      Successfully uninstalled stevedore-1.15.0
  Found existing installation: oslo.config 3.12.0
    Uninstalling oslo.config-3.12.0:
      Successfully uninstalled oslo.config-3.12.0
Successfully installed oslo.config-3.15.0 oslo.context-2.8.0 oslo.log-3.14.0 oslo.utils-3.16.0 pyinotify-0.9.6 pyparsing-2.1.8 stevedore-1.17.0
(flocker) [root@server1 purestorage-flocker-driver]# 

Attempt to start flocker-dataset-agent:

(flocker) [root@server1 ~]# systemctl start flocker-dataset-agent

Display status for flocker-dataset-agent:

(flocker) [root@server1 ~]# systemctl -l status flocker-dataset-agent
● flocker-dataset-agent.service - Flocker Dataset Agent
   Loaded: loaded (/usr/lib/systemd/system/flocker-dataset-agent.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Wed 2016-08-17 14:24:31 EDT; 1s ago
  Process: 8183 ExecStart=/usr/sbin/flocker-dataset-agent --journald (code=exited, status=1/FAILURE)
 Main PID: 8183 (code=exited, status=1/FAILURE)

Aug 17 14:24:31 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 17 14:24:31 server1 systemd[1]: flocker-dataset-agent.service failed.
Aug 17 14:24:31 server1 systemd[1]: flocker-dataset-agent.service holdoff time over, scheduling restart.
Aug 17 14:24:31 server1 systemd[1]: start request repeated too quickly for flocker-dataset-agent.service
Aug 17 14:24:31 server1 systemd[1]: Failed to start Flocker Dataset Agent.
Aug 17 14:24:31 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 17 14:24:31 server1 systemd[1]: flocker-dataset-agent.service failed.

Display journal log for flocker-dataset-agent:


(flocker) [root@server1 ~]# journalctl -u flocker-dataset-agent -xn --no-pager
-- Logs begin at Tue 2016-08-16 03:49:32 EDT, end at Wed 2016-08-17 14:25:02 EDT. --
Aug 17 14:24:31 server1 flocker-dataset-agent[8183]: {"task_uuid": "cf272d0b-82d2-462d-ac40-72072b655d25", "error": true, "timestamp": 1471458271.225677, "message": "main function encountered error\nTraceback (most recent call last):\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 84, in flocker_dataset_agent_main\n    options=options,\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/script.py\", line 295, in main\n    self._react(run_and_log, [], _reactor=self._reactor)\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/task.py\", line 908, in react\n    finished = main(_reactor, *argv)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/script.py\", line 283, in run_and_log\n    d = maybeDeferred(self.script.main, reactor, options)\n--- <exception caught here> ---\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/defer.py\", line 150, in maybeDeferred\n    result = f(*args, **kw)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 295, in main\n    self.service_factory(reactor, options)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 591, in get_service\n    agent_service = self.agent_service_factory(configuration)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 485, in from_configuration\n    configuration['dataset']\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/backends.py\", line 176, in backend_and_api_args_from_configuration\n    backend = backend_loader.get(backend_name)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/plugin.py\", line 128, in get\n    plugin_module = namedAny(plugin_name)\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/python/reflect.py\", line 303, in namedAny\n    topLevelPackage = _importAndCheckStack(trialname)\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/python/reflect.py\", line 250, in _importAndCheckStack\n    reraise(excValue, excTraceback)\n  File \"/opt/flocker/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver/__init__.py\", line 6, in <module>\n    from purestorage_flasharray_flocker_driver import purestorage_blockdevice\n  File \"/opt/flocker/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver/purestorage_blockdevice.py\", line 15, in <module>\n    import purestorage\nexceptions.ImportError: No module named purestorage\n", "message_type": "twisted:log", "task_level": [1]}
Aug 17 14:24:31 server1 flocker-dataset-agent[8183]: {"task_uuid": "09e4f74c-e5b1-49df-bbaa-08a5348aeb6c", "error": false, "timestamp": 1471458271.22638, "message": "Main loop terminated.", "message_type": "twisted:log", "task_level": [1]}
Aug 17 14:24:31 server1 systemd[1]: flocker-dataset-agent.service: main process exited, code=exited, status=1/FAILURE
Aug 17 14:24:31 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 17 14:24:31 server1 systemd[1]: flocker-dataset-agent.service failed.
Aug 17 14:24:31 server1 systemd[1]: flocker-dataset-agent.service holdoff time over, scheduling restart.
Aug 17 14:24:31 server1 systemd[1]: start request repeated too quickly for flocker-dataset-agent.service
Aug 17 14:24:31 server1 systemd[1]: Failed to start Flocker Dataset Agent.
-- Subject: Unit flocker-dataset-agent.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit flocker-dataset-agent.service has failed.
-- 
-- The result is failed.
Aug 17 14:24:31 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 17 14:24:31 server1 systemd[1]: flocker-dataset-agent.service failed.
(flocker) [root@server1 ~]# 
wallnerryan commented 8 years ago

@akamalov No module named purestorage

/opt/flocker/bin/pip install purestorage

for some reason, looks like when purestorage driver is installed its missing some dependencies.

akamalov commented 8 years ago

Successfully installed:

(flocker) [root@server1 purestorage-flocker-driver]# /opt/flocker/bin/pip install purestorage
Collecting purestorage
  Downloading purestorage-1.6.1.tar.gz
Requirement already satisfied (use --upgrade to upgrade): requests in /opt/flocker/lib/python2.7/site-packages (from purestorage)
Building wheels for collected packages: purestorage
  Running setup.py bdist_wheel for purestorage ... done
  Stored in directory: /root/.cache/pip/wheels/08/a3/31/1b766596e6e11ac2c3ce2d21c6988d81876af92ae13737aedd
Successfully built purestorage
Installing collected packages: purestorage
Successfully installed purestorage-1.6.1
(flocker) [root@server1 purestorage-flocker-driver]# 

Start flocker-dataset-agent:

(flocker) [root@server1 purestorage-flocker-driver]# systemctl start flocker-dataset-agent

Display status:

(flocker) [root@server1 purestorage-flocker-driver]# systemctl -l status flocker-dataset-agent
● flocker-dataset-agent.service - Flocker Dataset Agent
   Loaded: loaded (/usr/lib/systemd/system/flocker-dataset-agent.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Wed 2016-08-17 14:37:41 EDT; 846ms ago
  Process: 32238 ExecStart=/usr/sbin/flocker-dataset-agent --journald (code=exited, status=1/FAILURE)
 Main PID: 32238 (code=exited, status=1/FAILURE)

Aug 17 14:37:41 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 17 14:37:41 server1 systemd[1]: flocker-dataset-agent.service failed.
Aug 17 14:37:41 server1 systemd[1]: flocker-dataset-agent.service holdoff time over, scheduling restart.
Aug 17 14:37:41 server1 systemd[1]: start request repeated too quickly for flocker-dataset-agent.service
Aug 17 14:37:41 server1 systemd[1]: Failed to start Flocker Dataset Agent.
Aug 17 14:37:41 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 17 14:37:41 server1 systemd[1]: flocker-dataset-agent.service failed.
(flocker) [root@server1 purestorage-flocker-driver]# 

Display journal log entries:

(flocker) [root@server1 purestorage-flocker-driver]# journalctl -u flocker-dataset-agent -xn --no-pager
-- Logs begin at Tue 2016-08-16 03:49:32 EDT, end at Wed 2016-08-17 14:39:41 EDT. --
Aug 17 14:37:41 server1 flocker-dataset-agent[32238]: {"task_uuid": "b28986b8-c775-42a4-a7e6-3652cd2a6b91", "error": true, "timestamp": 1471459061.697364, "message": "main function encountered error\nTraceback (most recent call last):\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 84, in flocker_dataset_agent_main\n    options=options,\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/script.py\", line 295, in main\n    self._react(run_and_log, [], _reactor=self._reactor)\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/task.py\", line 908, in react\n    finished = main(_reactor, *argv)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/script.py\", line 283, in run_and_log\n    d = maybeDeferred(self.script.main, reactor, options)\n--- <exception caught here> ---\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/defer.py\", line 150, in maybeDeferred\n    result = f(*args, **kw)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 295, in main\n    self.service_factory(reactor, options)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 591, in get_service\n    agent_service = self.agent_service_factory(configuration)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 485, in from_configuration\n    configuration['dataset']\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/backends.py\", line 176, in backend_and_api_args_from_configuration\n    backend = backend_loader.get(backend_name)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/plugin.py\", line 128, in get\n    plugin_module = namedAny(plugin_name)\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/python/reflect.py\", line 303, in namedAny\n    topLevelPackage = _importAndCheckStack(trialname)\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/python/reflect.py\", line 250, in _importAndCheckStack\n    reraise(excValue, excTraceback)\n  File \"/opt/flocker/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver/__init__.py\", line 6, in <module>\n    from purestorage_flasharray_flocker_driver import purestorage_blockdevice\n  File \"/opt/flocker/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver/purestorage_blockdevice.py\", line 96, in <module>\n    from os_brick.initiator import connector\nexceptions.ImportError: No module named os_brick.initiator\n", "message_type": "twisted:log", "task_level": [1]}
Aug 17 14:37:41 server1 flocker-dataset-agent[32238]: {"task_uuid": "c8cd9982-6577-4d97-9ca7-db48ad217ee4", "error": false, "timestamp": 1471459061.698055, "message": "Main loop terminated.", "message_type": "twisted:log", "task_level": [1]}
Aug 17 14:37:41 server1 systemd[1]: flocker-dataset-agent.service: main process exited, code=exited, status=1/FAILURE
Aug 17 14:37:41 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 17 14:37:41 server1 systemd[1]: flocker-dataset-agent.service failed.
Aug 17 14:37:41 server1 systemd[1]: flocker-dataset-agent.service holdoff time over, scheduling restart.
Aug 17 14:37:41 server1 systemd[1]: start request repeated too quickly for flocker-dataset-agent.service
Aug 17 14:37:41 server1 systemd[1]: Failed to start Flocker Dataset Agent.
-- Subject: Unit flocker-dataset-agent.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit flocker-dataset-agent.service has failed.
-- 
-- The result is failed.
Aug 17 14:37:41 server1 systemd[1]: Unit flocker-dataset-agent.service entered failed state.
Aug 17 14:37:41 server1 systemd[1]: flocker-dataset-agent.service failed.
(flocker) [root@server1 purestorage-flocker-driver]# 

P.S. All this procedure is done while /opt/mesosphere/lib/python3.4 is renamed to /opt/mesosphere/lib/_python3.4. It won't work otherwise...

wallnerryan commented 8 years ago

@akamalov errors that are no module found seem to be coming from purestorage at this point. I would keep installing the modules that are missing in the meantime. This is something worth opening an issue with on purestorage.

on a side note, i don't know why having /opt/mesosphere/ python path would disrupt things, i will look into that.

wallnerryan commented 8 years ago

os.brick and purestorage are requirement in https://github.com/PureStorage-OpenConnect/purestorage-flocker-driver/blob/master/requirements.txt so they should be installed when you install the driver. Seems your python environment is somehow causing issues

wallnerryan commented 8 years ago

looks like using their python setup.py install method doesnt pull in the requirements. https://github.com/PureStorage-OpenConnect/purestorage-flocker-driver/blob/master/setup.py

Having something like below in setup.py would help, alternatively, try to use the pip install way of doing things until pure fixes that.

with open("requirements.txt") as requirements:
    install_requires = requirements.readlines()

setup(
    ...
    install_requires=install_requires,
   ...
)
akamalov commented 8 years ago

I posted message asking for @patrick-east from Pure to look into module dependencies...

akamalov commented 8 years ago

Manually installed os-brick. Restarting flocker-dataset-agent...

akamalov commented 8 years ago
(flocker) [root@server1 purestorage-flocker-driver]# journalctl -u flocker-dataset-agent -xn --no-pager

-- Logs begin at Tue 2016-08-16 04:45:02 EDT, end at Wed 2016-08-17 15:10:28 EDT. --
Aug 17 15:10:04 server1 flocker-dataset-agent[16388]: {"task_uuid": "0a009354-52e6-4532-9d54-466cc2565772", "error": true, "timestamp": 1471461004.235742, "message": "main function encountered error\nTraceback (most recent call last):\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 84, in flocker_dataset_agent_main\n    options=options,\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/script.py\", line 295, in main\n    self._react(run_and_log, [], _reactor=self._reactor)\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/task.py\", line 908, in react\n    finished = main(_reactor, *argv)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/common/script.py\", line 283, in run_and_log\n    d = maybeDeferred(self.script.main, reactor, options)\n--- <exception caught here> ---\n  File \"/opt/flocker/lib/python2.7/site-packages/twisted/internet/defer.py\", line 150, in maybeDeferred\n    result = f(*args, **kw)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 295, in main\n    self.service_factory(reactor, options)\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 594, in get_service\n    api = agent_service.get_api()\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 533, in get_api\n    cluster_id\n  File \"/opt/flocker/lib/python2.7/site-packages/flocker/node/script.py\", line 417, in get_api\n    return backend.api_factory(**api_args)\n  File \"/opt/flocker/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver/__init__.py\", line 19, in api_factory\n    pure_ssl_cert=kwargs.get('pure_ssl_cert'),\n  File \"/opt/flocker/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver/purestorage_blockdevice.py\", line 705, in pure_from_configuration\n    cluster_id=cluster_id,\n  File \"/opt/flocker/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver/purestorage_blockdevice.py\", line 221, in __init__\n    self._initiator_info = self._get_initiator_info()\n  File \"/opt/flocker/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver/purestorage_blockdevice.py\", line 252, in _get_initiator_info\n    info = connector.get_connector_properties(None, None, True, True)\n  File \"/opt/flocker/lib/python2.7/site-packages/os_brick/utils.py\", line 135, in trace_logging_wrapper\n    return f(*args, **kwargs)\n  File \"/opt/flocker/lib/python2.7/site-packages/os_brick/initiator/connector.py\", line 158, in get_connector_properties\n    execute=execute))\n  File \"/opt/flocker/lib/python2.7/site-packages/os_brick/initiator/connector.py\", line 463, in get_connector_properties\n    execute=kwargs.get('execute')))\n  File \"/opt/flocker/lib/python2.7/site-packages/os_brick/initiator/linuxscsi.py\", line 121, in is_multipath_running\n    run_as_root=True, root_helper=root_helper)\n  File \"/opt/flocker/lib/python2.7/site-packages/os_brick/privileged/rootwrap.py\", line 57, in execute\n    return execute_root(*cmd, **kwargs)\n  File \"/opt/flocker/lib/python2.7/site-packages/oslo_privsep/priv_context.py\", line 203, in _wrap\n    self.start()\n  File \"/opt/flocker/lib/python2.7/site-packages/oslo_privsep/priv_context.py\", line 214, in start\n    channel = daemon.RootwrapClientChannel(context=self)\n  File \"/opt/flocker/lib/python2.7/site-packages/oslo_privsep/daemon.py\", line 289, in __init__\n    raise FailedToDropPrivileges(msg)\noslo_privsep.daemon.FailedToDropPrivileges: privsep helper command exited non-zero (1)\n", "message_type": "twisted:log", "task_level": [1]}
Aug 17 15:10:04 server1 flocker-dataset-agent[16388]: {"task_uuid": "8463cf21-b5b0-4c55-b924-01f7758061fd", "error": false, "timestamp": 1471461004.237071, "message": "Main loop terminated.", "message_type": "twisted:log", "task_level": [1]}

``
wallnerryan commented 8 years ago

@akamalov flocker doesn't use oslo/os-brick natively. This error looks like a problem with purestorage. Not sure I can help any further right now. Flocker seems to be trying to boot the driver but is encountering pure driver specific issues.

patrick-east commented 8 years ago

Hey, so it does look like something might be wrong with the setup.py not installing the requirements. I'll check into that and see whats up.

For the most recent error it's because you installed the latest os-brick that tries to use privsep, you will need the version it is pinned to in requirements.txt.

akamalov commented 8 years ago

Thanks fellows!!

patrick-east commented 8 years ago

It should do the right thing if you pip install the driver, I verified that it is not using dependencies when running setup.py directly. See example below:

ubuntu@test:~$ virtualenv foo
New python executable in /home/ubuntu/foo/bin/python
Installing setuptools, pip, wheel...done.
ubuntu@test:~$ source foo/bin/activate
(foo) ubuntu@test:~$ cd purestorage-flocker-driver/
(foo) ubuntu@test:~/purestorage-flocker-driver$ python setup.py install

Installed /home/ubuntu/purestorage-flocker-driver/.eggs/pbr-1.10.0-py2.7.egg
[pbr] Generating ChangeLog
running install
[pbr] Writing ChangeLog
[pbr] Generating ChangeLog
[pbr] ChangeLog complete (0.0s)
[pbr] Generating AUTHORS
[pbr] AUTHORS complete (0.0s)
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/purestorage_flasharray_flocker_driver
copying purestorage_flasharray_flocker_driver/purestorage_blockdevice.py -> build/lib.linux-x86_64-2.7/purestorage_flasharray_flocker_driver
copying purestorage_flasharray_flocker_driver/__init__.py -> build/lib.linux-x86_64-2.7/purestorage_flasharray_flocker_driver
running egg_info
creating purestorage_flocker_driver.egg-info
writing pbr to purestorage_flocker_driver.egg-info/pbr.json
writing requirements to purestorage_flocker_driver.egg-info/requires.txt
writing purestorage_flocker_driver.egg-info/PKG-INFO
writing top-level names to purestorage_flocker_driver.egg-info/top_level.txt
writing dependency_links to purestorage_flocker_driver.egg-info/dependency_links.txt
[pbr] Processing SOURCES.txt
writing manifest file 'purestorage_flocker_driver.egg-info/SOURCES.txt'
[pbr] In git context, generating filelist from git
warning: no previously-included files found matching '.gitreview'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
writing manifest file 'purestorage_flocker_driver.egg-info/SOURCES.txt'
running install_lib
creating /home/ubuntu/foo/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver
copying build/lib.linux-x86_64-2.7/purestorage_flasharray_flocker_driver/purestorage_blockdevice.py -> /home/ubuntu/foo/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver
copying build/lib.linux-x86_64-2.7/purestorage_flasharray_flocker_driver/__init__.py -> /home/ubuntu/foo/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver
byte-compiling /home/ubuntu/foo/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver/purestorage_blockdevice.py to purestorage_blockdevice.pyc
byte-compiling /home/ubuntu/foo/lib/python2.7/site-packages/purestorage_flasharray_flocker_driver/__init__.py to __init__.pyc
running install_egg_info
Copying purestorage_flocker_driver.egg-info to /home/ubuntu/foo/lib/python2.7/site-packages/purestorage_flocker_driver-1.0.0-py2.7.egg-info
running install_scripts

It succeeds, but if we look at what was installed...

(foo) ubuntu@test:~/purestorage-flocker-driver$ pip freeze
purestorage-flocker-driver==1.0.0

There are none of the dependencies :(

Trying with pip we get:

(foo) ubuntu@test:~/purestorage-flocker-driver$ pip install .
Processing /home/ubuntu/purestorage-flocker-driver
  Requirement already satisfied (use --upgrade to upgrade): purestorage-flocker-driver==1.0.0 from file:///home/ubuntu/purestorage-flocker-driver in /home/ubuntu/foo/lib/python2.7/site-packages
Collecting os-brick==1.3.0 (from purestorage-flocker-driver==1.0.0)
/home/ubuntu/foo/local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:318: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
  SNIMissingWarning
/home/ubuntu/foo/local/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Downloading os_brick-1.3.0-py2-none-any.whl (98kB)
    100% |████████████████████████████████| 102kB 1.7MB/s
Collecting purestorage>=1.6.0 (from purestorage-flocker-driver==1.0.0)
  Downloading purestorage-1.6.1.tar.gz
Collecting six>=1.9.0 (from os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading six-1.10.0-py2.py3-none-any.whl
Collecting oslo.concurrency>=3.5.0 (from os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading oslo.concurrency-3.13.0-py2.py3-none-any.whl (43kB)
    100% |████████████████████████████████| 51kB 4.8MB/s
Collecting requests!=2.9.0,>=2.8.1 (from os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading requests-2.11.0-py2.py3-none-any.whl (514kB)
    100% |████████████████████████████████| 522kB 2.2MB/s
Collecting Babel>=1.3 (from os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading Babel-2.3.4-py2.py3-none-any.whl (7.1MB)
    100% |████████████████████████████████| 7.1MB 176kB/s
Collecting oslo.i18n>=2.1.0 (from os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading oslo.i18n-3.8.0-py2.py3-none-any.whl (41kB)
    100% |████████████████████████████████| 51kB 12.7MB/s
Collecting oslo.utils>=3.5.0 (from os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading oslo.utils-3.16.0-py2.py3-none-any.whl (92kB)
    100% |████████████████████████████████| 92kB 10.9MB/s
Collecting eventlet!=0.18.3,>=0.18.2 (from os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading eventlet-0.19.0-py2.py3-none-any.whl (138kB)
    100% |████████████████████████████████| 143kB 8.2MB/s
Collecting oslo.service>=1.0.0 (from os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading oslo.service-1.15.0-py2.py3-none-any.whl (71kB)
    100% |████████████████████████████████| 71kB 6.4MB/s
Collecting oslo.log>=1.14.0 (from os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading oslo.log-3.14.0-py2.py3-none-any.whl (50kB)
    100% |████████████████████████████████| 51kB 6.5MB/s
Collecting pbr>=1.6 (from os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading pbr-1.10.0-py2.py3-none-any.whl (96kB)
    100% |████████████████████████████████| 102kB 10.4MB/s
Collecting retrying!=1.3.0,>=1.2.3 (from os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading retrying-1.3.3.tar.gz
Collecting oslo.serialization>=1.10.0 (from os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading oslo.serialization-2.13.0-py2.py3-none-any.whl
Collecting iso8601>=0.1.11 (from oslo.concurrency>=3.5.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading iso8601-0.1.11-py2.py3-none-any.whl
Collecting oslo.config>=3.12.0 (from oslo.concurrency>=3.5.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading oslo.config-3.15.0-py2.py3-none-any.whl (97kB)
    100% |████████████████████████████████| 102kB 10.0MB/s
Collecting fasteners>=0.7 (from oslo.concurrency>=3.5.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading fasteners-0.14.1-py2.py3-none-any.whl
Collecting enum34; python_version == "2.7" or python_version == "2.6" or python_version == "3.3" (from oslo.concurrency>=3.5.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading enum34-1.1.6-py2-none-any.whl
Collecting pytz>=0a (from Babel>=1.3->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading pytz-2016.6.1-py2.py3-none-any.whl (481kB)
    100% |████████████████████████████████| 481kB 2.7MB/s
Collecting funcsigs>=0.4; python_version == "2.7" or python_version == "2.6" (from oslo.utils>=3.5.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading funcsigs-1.0.2-py2.py3-none-any.whl
Collecting monotonic>=0.6 (from oslo.utils>=3.5.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading monotonic-1.2-py2.py3-none-any.whl
Collecting pyparsing>=2.0.1 (from oslo.utils>=3.5.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading pyparsing-2.1.8-py2.py3-none-any.whl (54kB)
    100% |████████████████████████████████| 61kB 1.2MB/s
Collecting netifaces>=0.10.4 (from oslo.utils>=3.5.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading netifaces-0.10.4.tar.gz
Collecting debtcollector>=1.2.0 (from oslo.utils>=3.5.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading debtcollector-1.8.0-py2.py3-none-any.whl
Collecting netaddr!=0.7.16,>=0.7.12 (from oslo.utils>=3.5.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading netaddr-0.7.18-py2.py3-none-any.whl (1.5MB)
    100% |████████████████████████████████| 1.5MB 814kB/s
Collecting greenlet>=0.3 (from eventlet!=0.18.3,>=0.18.2->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading greenlet-0.4.10-cp27-cp27mu-manylinux1_x86_64.whl (41kB)
    100% |████████████████████████████████| 51kB 10.7MB/s
Collecting Paste (from oslo.service>=1.0.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading Paste-2.0.3-py2-none-any.whl (625kB)
    100% |████████████████████████████████| 634kB 2.0MB/s
Collecting Routes!=2.0,!=2.1,!=2.3.0,>=1.12.3; python_version == "2.7" (from oslo.service>=1.0.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading Routes-2.3.1.tar.gz (181kB)
    100% |████████████████████████████████| 184kB 3.4MB/s
Collecting PasteDeploy>=1.5.0 (from oslo.service>=1.0.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading PasteDeploy-1.5.2-py2.py3-none-any.whl
Collecting WebOb>=1.2.3 (from oslo.service>=1.0.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading WebOb-1.6.1-py2.py3-none-any.whl (77kB)
    100% |████████████████████████████████| 81kB 8.5MB/s
Collecting oslo.context>=2.4.0 (from oslo.log>=1.14.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading oslo.context-2.8.0-py2.py3-none-any.whl
Collecting pyinotify>=0.9.6; sys_platform != "win32" and sys_platform != "darwin" and sys_platform != "sunos5" (from oslo.log>=1.14.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading pyinotify-0.9.6.tar.gz (60kB)
    100% |████████████████████████████████| 61kB 7.6MB/s
Collecting python-dateutil>=2.4.2 (from oslo.log>=1.14.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading python_dateutil-2.5.3-py2.py3-none-any.whl (201kB)
    100% |████████████████████████████████| 204kB 3.1MB/s
Collecting msgpack-python>=0.4.0 (from oslo.serialization>=1.10.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading msgpack-python-0.4.8.tar.gz (113kB)
    100% |████████████████████████████████| 122kB 7.5MB/s
Collecting rfc3986>=0.2.0 (from oslo.config>=3.12.0->oslo.concurrency>=3.5.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading rfc3986-0.3.1-py2.py3-none-any.whl
Collecting stevedore>=1.16.0 (from oslo.config>=3.12.0->oslo.concurrency>=3.5.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading stevedore-1.17.0-py2.py3-none-any.whl
Collecting wrapt>=1.7.0 (from debtcollector>=1.2.0->oslo.utils>=3.5.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading wrapt-1.10.8.tar.gz
Collecting repoze.lru>=0.3 (from Routes!=2.0,!=2.1,!=2.3.0,>=1.12.3; python_version == "2.7"->oslo.service>=1.0.0->os-brick==1.3.0->purestorage-flocker-driver==1.0.0)
  Downloading repoze.lru-0.6.tar.gz
Building wheels for collected packages: purestorage-flocker-driver, purestorage, retrying, netifaces, Routes, pyinotify, msgpack-python, wrapt, repoze.lru
  Running setup.py bdist_wheel for purestorage-flocker-driver ... done
  Stored in directory: /home/ubuntu/.cache/pip/wheels/13/7c/de/bb0532214f7169da2a7188d8df44424bef1d25670cffe07527
  Running setup.py bdist_wheel for purestorage ... done
  Stored in directory: /home/ubuntu/.cache/pip/wheels/08/a3/31/1b766596e6e11ac2c3ce2d21c6988d81876af92ae13737aedd
  Running setup.py bdist_wheel for retrying ... done
  Stored in directory: /home/ubuntu/.cache/pip/wheels/d9/08/aa/49f7c109140006ea08a7657640aee3feafb65005bcd5280679
  Running setup.py bdist_wheel for netifaces ... done
  Stored in directory: /home/ubuntu/.cache/pip/wheels/42/b4/d9/cd4ae93303efc2ba3441a21e2f2d74a49e07a5be9012d01520
  Running setup.py bdist_wheel for Routes ... done
  Stored in directory: /home/ubuntu/.cache/pip/wheels/c9/49/f4/219ca0f3395557463b9669d92c025e2eb46b2dcedac7ff7675
  Running setup.py bdist_wheel for pyinotify ... done
  Stored in directory: /home/ubuntu/.cache/pip/wheels/ca/66/27/bba9699a96562cab73cd97e80ca0fc9c1656b757293f889ad4
  Running setup.py bdist_wheel for msgpack-python ... done
  Stored in directory: /home/ubuntu/.cache/pip/wheels/2c/e7/e7/9031652a69d594665c5ca25e41d0fb3faa024e730b590e4402
  Running setup.py bdist_wheel for wrapt ... done
  Stored in directory: /home/ubuntu/.cache/pip/wheels/19/8a/01/20cf74c3f38d49ef8e9b9aa7ffd38cefff79bad9f6f2c651f0
  Running setup.py bdist_wheel for repoze.lru ... done
  Stored in directory: /home/ubuntu/.cache/pip/wheels/b2/cd/b3/7e24400bff83325a01d492940eff6e9579f553f33348323d79
Successfully built purestorage-flocker-driver purestorage retrying netifaces Routes pyinotify msgpack-python wrapt repoze.lru
Installing collected packages: six, funcsigs, pytz, Babel, pbr, oslo.i18n, monotonic, iso8601, pyparsing, netifaces, wrapt, debtcollector, netaddr, oslo.utils, retrying, rfc3986, stevedore, oslo.config, fasteners, enum34, oslo.concurrency, requests, greenlet, eventlet, Paste, oslo.context, msgpack-python, oslo.serialization, pyinotify, python-dateutil, oslo.log, repoze.lru, Routes, PasteDeploy, WebOb, oslo.service, os-brick, purestorage
Successfully installed Babel-2.3.4 Paste-2.0.3 PasteDeploy-1.5.2 Routes-2.3.1 WebOb-1.6.1 debtcollector-1.8.0 enum34-1.1.6 eventlet-0.19.0 fasteners-0.14.1 funcsigs-1.0.2 greenlet-0.4.10 iso8601-0.1.11 monotonic-1.2 msgpack-python-0.4.8 netaddr-0.7.18 netifaces-0.10.4 os-brick-1.3.0 oslo.concurrency-3.13.0 oslo.config-3.15.0 oslo.context-2.8.0 oslo.i18n-3.8.0 oslo.log-3.14.0 oslo.serialization-2.13.0 oslo.service-1.15.0 oslo.utils-3.16.0 pbr-1.10.0 purestorage-1.6.1 pyinotify-0.9.6 pyparsing-2.1.8 python-dateutil-2.5.3 pytz-2016.6.1 repoze.lru-0.6 requests-2.11.0 retrying-1.3.3 rfc3986-0.3.1 six-1.10.0 stevedore-1.17.0 wrapt-1.10.8
akamalov commented 8 years ago

Re-run steps again. This time it seem like it worked.

Activate virutalenv:

root@server1 purestorage-flocker-driver]# source /opt/flocker/bin/activate
(flocker) [root@server1 purestorage-flocker-driver]# which python
/opt/flocker/bin/python
(flocker) [root@server1 purestorage-flocker-driver]# python --version
Python 2.7.5
(flocker) [root@server1 purestorage-flocker-driver]#

Install driver

(flocker) [root@server1 purestorage-flocker-driver]# python setup.py install
running install
[pbr] Writing ChangeLog
[pbr] Generating ChangeLog
[pbr] ChangeLog complete (0.0s)
[pbr] Generating AUTHORS
[pbr] AUTHORS complete (0.0s)
running build
running build_py
running egg_info
writing pbr to purestorage_flocker_driver.egg-info/pbr.json
writing requirements to purestorage_flocker_driver.egg-info/requires.txt
writing purestorage_flocker_driver.egg-info/PKG-INFO
writing top-level names to purestorage_flocker_driver.egg-info/top_level.txt
writing dependency_links to purestorage_flocker_driver.egg-info/dependency_links.txt
[pbr] Processing SOURCES.txt
[pbr] In git context, generating filelist from git
warning: no previously-included files found matching '.gitreview'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
writing manifest file 'purestorage_flocker_driver.egg-info/SOURCES.txt'
running install_lib
running install_egg_info
removing '/opt/flocker/lib/python2.7/site-packages/purestorage_flocker_driver-1.0.0-py2.7.egg-info' (and everything under it)
Copying purestorage_flocker_driver.egg-info to /opt/flocker/lib/python2.7/site-packages/purestorage_flocker_driver-1.0.0-py2.7.egg-info
running install_scripts
(flocker) [root@server1 purestorage-flocker-driver]#

Install requirements

(flocker) [root@server1 purestorage-flocker-driver]# pip install .
Processing /root/purestorage-flocker-driver
  Requirement already satisfied (use --upgrade to upgrade): purestorage-flocker-driver==1.0.0 from file:///root/purestorage-flocker-driver in /opt/flocker/lib/python2.7/site-packages
Collecting os-brick==1.3.0 (from purestorage-flocker-driver==1.0.0)
  Downloading os_brick-1.3.0-py2-none-any.whl (98kB)
    100% |████████████████████████████████| 102kB 210kB/s 
Requirement already satisfied (use --upgrade to upgrade): purestorage>=1.6.0 in /opt/flocker/lib/python2.7/site-packages (from purestorage-flocker-driver==1.0.0)
Requirement already satisfied (use --upgrade to upgrade): requests!=2.9.0,>=2.8.1 in /opt/flocker/lib/python2.7/site-packages (from os-brick==1.3.0->purestorage-flocker-driver==1.0.0)

Ensure multipathd is working:

(flocker) [root@server1 lib]# systemctl -l status multipathd
● multipathd.service - Device-Mapper Multipath Device Controller
   Loaded: loaded (/usr/lib/systemd/system/multipathd.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2016-08-17 15:39:35 EDT; 26min ago
  Process: 60677 ExecStart=/sbin/multipathd (code=exited, status=0/SUCCESS)
  Process: 60673 ExecStartPre=/sbin/multipath -A (code=exited, status=0/SUCCESS)
  Process: 60671 ExecStartPre=/sbin/modprobe dm-multipath (code=exited, status=0/SUCCESS)
 Main PID: 60679 (multipathd)
   Memory: 1.6M
   CGroup: /system.slice/multipathd.service
           └─60679 /sbin/multipathd

Aug 17 15:40:04 server1 multipathd[60679]: uevent trigger error
Aug 17 15:40:04 server1 multipathd[60679]: sdd: add path (uevent)
Aug 17 15:40:04 server1 multipathd[60679]: sdd: spurious uevent, path already in pathvec
Aug 17 15:40:04 server1 multipathd[60679]: sdd: failed to get path uid
Aug 17 15:40:04 server1 multipathd[60679]: uevent trigger error
Aug 17 15:40:04 server1 multipathd[60679]: sda: add path (uevent)
Aug 17 15:40:04 server1 multipathd[60679]: sda: spurious uevent, path already in pathvec
Aug 17 15:40:04 server1 multipathd[60679]: sda: failed to get path uid
Aug 17 15:40:04 server1 multipathd[60679]: uevent trigger error
Aug 17 15:42:23 server1 systemd[1]: Started Device-Mapper Multipath Device Controller.
(flocker) [root@server1 lib]# 

Start flocker-dataset-agent:

(flocker) [root@server1 purestorage-flocker-driver]# systemctl start flocker-dataset-agent

Display status:

(flocker) [root@server1 lib]# systemctl -l status flocker-dataset-agent
● flocker-dataset-agent.service - Flocker Dataset Agent
   Loaded: loaded (/usr/lib/systemd/system/flocker-dataset-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2016-08-17 15:44:19 EDT; 20min ago
 Main PID: 61669 (flocker-dataset)
   Memory: 61.0M
   CGroup: /system.slice/flocker-dataset-agent.service
           └─61669 /opt/flocker/bin/python /usr/sbin/flocker-dataset-agent --journald

Aug 17 16:04:39 server1 flocker-dataset-agent[61669]: {"task_uuid": "c329618b-98d7-48cb-a256-9c38b0d17a49", "error": false, "timestamp": 1471464279.565325, "message": "Starting factory <twisted.internet.protocol.ReconnectingClientFactory instance at 0x7f8ff0036cb0>", "message_type": "twisted:log", "task_level": [1]}
Aug 17 16:04:39 server1 flocker-dataset-agent[61669]: {"task_uuid": "1e6d71c3-60bc-478c-a381-17e6dc85e3fc", "error": false, "timestamp": 1471464279.57081, "message": "AgentAMP connection established (HOST:IPv4Address(TCP, '10.60.120.165', 32946) PEER:IPv4Address(TCP, '192.168.120.156', 4523))", "message_type": "twisted:log", "task_level": [1]}
Aug 17 16:04:39 server1 flocker-dataset-agent[61669]: {"fsm_identifier": "<flocker.node._loop.ClusterStatus object at 0x4d716d0>", "fsm_input": "<ClusterStatusInputs=CONNECTED_TO_CONTROL_SERVICE>", "timestamp": 1471464279.573464, "fsm_rich_input": "<_ConnectedToControlService>", "action_status": "started", "task_uuid": "5292f3c5-1887-4e8d-8bcd-0e1b7e46f2d4", "action_type": "fsm:transition", "fsm_state": "<ClusterStatusStates=DISCONNECTED>", "task_level": [1]}
Aug 17 16:04:39 server1 flocker-dataset-agent[61669]: {"fsm_next_state": "<ClusterStatusStates=IGNORANT>", "task_level": [2], "action_type": "fsm:transition", "timestamp": 1471464279.574536, "fsm_output": ["<ClusterStatusOutputs=STORE_CLIENT>"], "task_uuid": "5292f3c5-1887-4e8d-8bcd-0e1b7e46f2d4", "action_status": "succeeded"}
Aug 17 16:04:39 server1 flocker-dataset-agent[61669]: {"task_uuid": "5edd4dcb-f888-449e-9bef-2a99f2b28886", "error": false, "timestamp": 1471464279.683545, "message": "AgentAMP connection lost (HOST:IPv4Address(TCP, '10.60.120.165', 32946) PEER:IPv4Address(TCP, '192.168.120.156', 4523))", "message_type": "twisted:log", "task_level": [1]}
Aug 17 16:04:39 server1 flocker-dataset-agent[61669]: {"exception": "OpenSSL.SSL.Error", "reason": "[('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]", "timestamp": 1471464279.685068, "traceback": "Traceback: <class 'OpenSSL.SSL.Error'>: [('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/posixbase.py:597:_doReadOrWrite\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:209:doRead\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:215:_dataReceived\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:421:dataReceived\n--- <exception caught here> ---\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:569:_write\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1271:send\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1191:_raise_ssl_error\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/_util.py:48:exception_from_error_queue\n", "message_type": "eliot:traceback", "task_uuid": "b240ffb1-ec69-4359-9530-390939f26d8e", "task_level": [1]}
Aug 17 16:04:39 server1 flocker-dataset-agent[61669]: {"fsm_identifier": "<flocker.node._loop.ClusterStatus object at 0x4d716d0>", "fsm_input": "<ClusterStatusInputs=DISCONNECTED_FROM_CONTROL_SERVICE>", "timestamp": 1471464279.686368, "fsm_rich_input": null, "action_status": "started", "task_uuid": "01fece24-15bd-4e93-8f0a-8211945e87a4", "action_type": "fsm:transition", "fsm_state": "<ClusterStatusStates=IGNORANT>", "task_level": [1]}
Aug 17 16:04:39 server1 flocker-dataset-agent[61669]: {"fsm_next_state": "<ClusterStatusStates=DISCONNECTED>", "task_level": [2], "action_type": "fsm:transition", "timestamp": 1471464279.687557, "fsm_output": [], "task_uuid": "01fece24-15bd-4e93-8f0a-8211945e87a4", "action_status": "succeeded"}
Aug 17 16:04:39 server1 flocker-dataset-agent[61669]: {"task_uuid": "6919913e-1602-4aac-858c-b50fb5ac7a85", "error": false, "timestamp": 1471464279.688601, "message": "<twisted.internet.tcp.Connector instance at 0x4bf2290> will retry in 2 seconds", "message_type": "twisted:log", "task_level": [1]}
Aug 17 16:04:39 server1 flocker-dataset-agent[61669]: {"task_uuid": "3ee2a9c5-4bfc-40ac-99bf-de444d5bb0f4", "error": false, "timestamp": 1471464279.689765, "message": "Stopping factory <twisted.internet.protocol.ReconnectingClientFactory instance at 0x7f8ff0036cb0>", "message_type": "twisted:log", "task_level": [1]}
(flocker) [root@server1 lib]# 

...holy cow...

akamalov commented 8 years ago

One last thing: When trying to access FCS (aliased in /etc/hosts as 'flocker'), flockerctl fails to find the node:

[root@mmaster1 ~]# ping -c 1 flocker
PING mmaster1 (192.168.120.156) 56(84) bytes of data.
64 bytes from mmaster1 (192.168.120.156): icmp_seq=1 ttl=64 time=0.089 ms

--- mmaster1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.089/0.089/0.089/0.000 ms

Attempt to list nodes:

[root@mmaster1 ~]# ls -ld user* -rw------- 1 root root 1862 Aug 17 16:22 user.crt -rw------- 1 root root 3276 Aug 17 16:22 user.key [root@mmaster1 ~]#

[root@mmaster1 ~]# flockerctl --control-service=flocker list-nodes Unhandled Error Traceback (most recent call last): Failure: twisted.internet.error.DNSLookupError: DNS lookup failed: address 'flocker' not found: [Errno -2] Name or service not known.

[root@mmaster1 ~]#

wallnerryan commented 8 years ago

ah, because flockerctl runs in a container. You really should to use DNS or IP when creating the certificate. Makes more sense given the architecture, see https://docs.clusterhq.com/en/1.14.0/flocker-features/flockerctl.html#install-configure.

eddy-ru commented 8 years ago

hi @akamalov , I also have the same problem, flocker dataset - agent start-up failure, i don't know how to find the reason, give advice or comments please

eddy thanks

akamalov commented 8 years ago

Thanks @wallnerryan

akamalov commented 8 years ago

@eddy-ru , can you paste your journalctl -u flocker-dataset-agent -xn --no-pager output ?

akamalov commented 8 years ago

@wallnerryan - implemented IP-based cert as suggested. Super-last-question: Trying to display nodes registered:

[root@mmaster1 ~]# flockerctl --control-service=192.168.120.156 list-nodes
SERVER   ADDRESS  

Nothing is being displayed. I think the node certs need to be regenerated ?

Aug 18 08:18:40 server1 flocker-dataset-agent[34788]: {"exception": "OpenSSL.SSL.Error", "reason": "[('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]", "timestamp": 1471522720.60279, "traceback": "Traceback: <class 'OpenSSL.SSL.Error'>: [('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/posixbase.py:597:_doReadOrWrite\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:209:doRead\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:215:_dataReceived\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:421:dataReceived\n--- <exception caught here> ---\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:569:_write\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1271:send\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1191:_raise_ssl_error\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/_util.py:48:exception_from_error_queue\n", "message_type": "eliot:traceback", "task_uuid": "f32df07d-e688-45a7-81cb-be8840640796", "task_level": [1]}

Why would it issue slv3 alert certificate unknown ? I have regenerated SSL certs all over again, but still getting the same error. Looked around for answers, stumbled onto this:

https://clusterhq.atlassian.net/browse/FLOC-3712

Based on this page, I tested the keys. From the agent node:

(flocker) [root@mslave3 ~]# openssl s_client -crlf -CAfile /etc/flocker/cluster.crt -cert /etc/flocker/plugin.crt -key /etc/flocker/plugin.key -connect 192.168.120.156:4523 >/dev/null
depth=1 OU = 8c895ef9-095d-43a0-adfe-4add5dfa24bf, CN = flocker
verify return:1
depth=0 OU = 8c895ef9-095d-43a0-adfe-4add5dfa24bf, CN = control-service
verify return:1
(flocker) [root@mslave3 ~]# openssl s_client -CAfile /etc/flocker/cluster.crt -cert /etc/flocker/node.crt -key /etc/flocker/node.key -connect 192.168.120.156:4524 > /dev/null
depth=1 OU = 8c895ef9-095d-43a0-adfe-4add5dfa24bf, CN = flocker
verify return:1
depth=0 OU = 8c895ef9-095d-43a0-adfe-4add5dfa24bf, CN = control-service
verify return:1

I get no errors...

akamalov commented 8 years ago

@wallnerryan anything you can help or give pointers to remediate SSL handshake errors ?

parveenb1 commented 8 years ago

Hello I have Setup Flocker with purestorage. My all flocker components are running fine. But when i create vol with docker. Volume always in pending state

wallnerryan commented 8 years ago

@akamalov what commands did you use for flocker-ca can you post them here? Also, post your control-service config from agent.yml

akamalov commented 8 years ago

Thanks @wallnerryan. Here you go:

Control cert was generated using:

/opt/flocker/bin/flocker-ca create-control-certificate 192.168.120.156

My agent node /etc/flocker/agent.yml file is:

"version": 1
"control-service": 
  "hostname": 192.168.120.156 
  "port": 4523
"dataset": 
    "backend": "purestorage_flasharray_flocker_driver"
    "pure_ip": 172.16.128.157 
    "pure_api_token": "50cac451-fefe-b635-9692-5870aada9c49"
    "pure_chap_host_user": "mslave3"
    "pure_chap_host_password": "XXXXXXX"
wallnerryan commented 8 years ago

@akamalov thanks, and ports are open (4523/4524)? If so, im not sure what is going on here, will try and get a second eye from @wallrj

akamalov commented 8 years ago

On FCS, right ? If yes, ports are open:

[root@mmaster1 ~]# ss -lnt |grep 45*
LISTEN     0      50           *:4523                     *:*                  
LISTEN     0      50           *:4524                     *:*     
[root@mmaster1 ~]# iptables -L --line-numbers
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:4523
2    ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:4524
wallrj commented 8 years ago

@akamalov

EDITED

The rules are:

  1. Clients must connect to the control service using the hostname or IP address that was used when creating the control service certificate.
  2. Clients must supply a client certificate (node.crt or plugin crt) that was signed by the same certificate authority (cluster.crt) as the control service server certificate.
  3. Clients connecting to the flocker REST API on port 4523 must present a certificate (eg plugin.crt) with a common name (CN) beginning with the string "user-"
  4. Flocker agents connecting to the flocker-control agent port 4524 must present a certificate (eg node.crt) with a common name (CN) beginning with the string "node-".

I'd just stop all the flocker processes, double check all the certificates on all nodes are up-to-date and have the correct issuer and common names and then restart all the Flocker processes and try again.

You can view the certificates in plain text using the following command:

# openssl x509 -in /etc/flocker/node.crt -text | head -n 20
Certificate:
    Data:
        Version: 1 (0x0)
        Serial Number:
            db:62:da:57:30:42:09:f6:6b:a2:3d:91:69:b7:fa:b5
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: OU=b6b95eed-3c12-4400-b022-9c5edcac942f, CN=benchmark-114-cluster
        Validity
            Not Before: Aug 18 13:05:13 2016 GMT
            Not After : Aug 13 13:05:13 2036 GMT
        Subject: OU=b6b95eed-3c12-4400-b022-9c5edcac942f, CN=node-13a68848-97e5-48a4-a51f-65767bd20d9b
...

Hope that helps.

akamalov commented 8 years ago

@wallrj , I had to modify the IP addresses as per company policy. If you could please edit your comment to change the IP address first two octets to 192.168. please....

akamalov commented 8 years ago

@wallrj If you could please change 10.60 octets to 192.168 in your comment, please??

Answering your request:

[root@mslave3 ~]#  openssl x509 -in /etc/flocker/node.crt -text | head -n 20
Certificate:
    Data:
        Version: 1 (0x0)
        Serial Number:
            71:77:16:c7:63:84:a4:b7:a6:75:f7:24:2e:15:4a:ad
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: OU=8c895ef9-095d-43a0-adfe-4add5dfa24bf, CN=flocker
        Validity
            Not Before: Aug 18 12:25:30 2016 GMT
            Not After : Aug 13 12:25:30 2036 GMT
        Subject: OU=8c895ef9-095d-43a0-adfe-4add5dfa24bf, CN=node-ac718fa5-08d1-4439-b828-1350ecad0d4a
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (4096 bit)
                Modulus:
                    00:ad:4c:7d:18:c1:2e:5b:4f:da:b2:81:6e:b5:cf:
                    bc:d7:42:63:29:37:6d:e3:08:f5:8a:2c:91:d9:9b:
                    a1:94:5c:27:8b:49:37:21:31:86:1a:5a:84:c5:3b:
                    73:e7:ce:59:20:3f:61:13:76:fa:6a:0e:31:b3:38:
                    dd:a4:9c:06:ef:6d:7b:32:41:6e:ad:50:0c:92:f7:
[root@mslave3 ~]# 
akamalov commented 8 years ago

Yet, I am getting a service error. Below is the status of Flocker-Control service on a master. Timestamp shows that it has been restarted recently:

[root@mmaster1 new.cert]# systemctl -l status flocker-control
● flocker-control.service - Flocker Control Service
   Loaded: loaded (/usr/lib/systemd/system/flocker-control.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-08-22 13:11:06 EDT; 42min ago
 Main PID: 31719 (flocker-control)
   Memory: 51.0M
   CGroup: /system.slice/flocker-control.service
           └─31719 /opt/flocker/bin/python /usr/sbin/flocker-control --port tcp:4523 --agent-port tcp:4524 --journald

Aug 22 13:54:01 mmaster1 flocker-control[31719]: {"timestamp": 1471888441.564769, "task_uuid": "84a91d1f-87c2-410a-ae3d-e3f04d2264a1", "action_type": "eliot:remote_task", "action_status": "started", "task_level": [7080, 2, 1]}
Aug 22 13:54:01 mmaster1 flocker-control[31719]: {"task_uuid": "84a91d1f-87c2-410a-ae3d-e3f04d2264a1", "task_level": [7080, 2, 2, 1], "action_type": "api:request", "timestamp": 1471888441.566405, "method": "GET", "action_status": "started", "request_path": "/v1/state/nodes/by_era/3c3950f1-67d4-4f6a-9a02-fc42b096f31d"}
Aug 22 13:54:01 mmaster1 flocker-control[31719]: {"timestamp": 1471888441.568236, "task_uuid": "84a91d1f-87c2-410a-ae3d-e3f04d2264a1", "action_type": "api:request", "action_status": "succeeded", "task_level": [7080, 2, 2, 3]}
Aug 22 13:54:01 mmaster1 flocker-control[31719]: {"timestamp": 1471888441.569054, "task_uuid": "84a91d1f-87c2-410a-ae3d-e3f04d2264a1", "action_type": "eliot:remote_task", "action_status": "succeeded", "task_level": [7080, 2, 3]}
Aug 22 13:54:01 mmaster1 flocker-control[31719]: {"task_uuid": "6a319102-eca2-40b4-829b-982bb9c8a775", "error": false, "timestamp": 1471888441.571276, "message": "\"10.60.120.164\" - - [22/Aug/2016:17:54:00 +0000] \"GET /v1/state/nodes/by_era/3c3950f1-67d4-4f6a-9a02-fc42b096f31d HTTP/1.1\" 404 48 \"-\" \"-\"", "message_type": "twisted:log", "task_level": [1]}
Aug 22 13:54:01 mmaster1 flocker-control[31719]: {"timestamp": 1471888441.665033, "task_uuid": "4e9718b0-90c1-48d2-94a8-24da56d2198d", "action_type": "eliot:remote_task", "action_status": "started", "task_level": [6984, 2, 1]}
Aug 22 13:54:01 mmaster1 flocker-control[31719]: {"task_uuid": "4e9718b0-90c1-48d2-94a8-24da56d2198d", "task_level": [6984, 2, 2, 1], "action_type": "api:request", "timestamp": 1471888441.665807, "method": "GET", "action_status": "started", "request_path": "/v1/state/nodes/by_era/de9ace0f-2be8-4c09-ba41-fa20d61fa5f6"}
Aug 22 13:54:01 mmaster1 flocker-control[31719]: {"timestamp": 1471888441.667345, "task_uuid": "4e9718b0-90c1-48d2-94a8-24da56d2198d", "action_type": "api:request", "action_status": "succeeded", "task_level": [6984, 2, 2, 3]}
Aug 22 13:54:01 mmaster1 flocker-control[31719]: {"timestamp": 1471888441.66815, "task_uuid": "4e9718b0-90c1-48d2-94a8-24da56d2198d", "action_type": "eliot:remote_task", "action_status": "succeeded", "task_level": [6984, 2, 3]}
Aug 22 13:54:01 mmaster1 flocker-control[31719]: {"task_uuid": "a181a74a-2afc-49f1-bada-99f6df741b5a", "error": false, "timestamp": 1471888441.670334, "message": "\"10.60.120.161\" - - [22/Aug/2016:17:54:00 +0000] \"GET /v1/state/nodes/by_era/de9ace0f-2be8-4c09-ba41-fa20d61fa5f6 HTTP/1.1\" 404 48 \"-\" \"-\"", "message_type": "twisted:log", "task_level": [1]}
[root@mmaster1 new.cert]# 

Issue a command to list nodes:

[root@mmaster1 new.cert]# flockerctl --control-service=192.168.120.156 list-nodes
SERVER   ADDRESS 

[root@mmaster1 new.cert]# 

As you can see, no nodes listed. All my agent nodes are up, with Flocker services also up:

Example:

Display Flocker-Container-Agent service:

[root@mslave3 ~]# systemctl -l status flocker-container-agent.service 
● flocker-container-agent.service - Flocker Container Agent
   Loaded: loaded (/usr/lib/systemd/system/flocker-container-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-08-22 13:07:28 EDT; 49min ago
 Main PID: 29010 (flocker-contain)
   Memory: 60.1M
   CGroup: /system.slice/flocker-container-agent.service
           └─29010 /opt/flocker/bin/python /usr/sbin/flocker-container-agent --journald

Aug 22 13:56:40 mslave3 flocker-container-agent[29010]: {"task_uuid": "167f863d-0074-4a5e-a07a-d596570f75d6", "error": false, "timestamp": 1471888600.981621, "message": "Starting factory <twisted.internet.protocol.ReconnectingClientFactory instance at 0x38d3248>", "message_type": "twisted:log", "task_level": [1]}
Aug 22 13:56:40 mslave3 flocker-container-agent[29010]: {"task_uuid": "0462565f-25d9-442f-9891-8d586e767aca", "error": false, "timestamp": 1471888600.985041, "message": "AgentAMP connection established (HOST:IPv4Address(TCP, '10.60.120.163', 51411) PEER:IPv4Address(TCP, '10.60.120.156', 4523))", "message_type": "twisted:log", "task_level": [1]}
Aug 22 13:56:40 mslave3 flocker-container-agent[29010]: {"fsm_identifier": "<flocker.node._loop.ClusterStatus object at 0x38d0590>", "fsm_input": "<ClusterStatusInputs=CONNECTED_TO_CONTROL_SERVICE>", "timestamp": 1471888600.987316, "fsm_rich_input": "<_ConnectedToControlService>", "action_status": "started", "task_uuid": "fde19cb2-25c1-4867-a9a5-3357c76123d6", "action_type": "fsm:transition", "fsm_state": "<ClusterStatusStates=DISCONNECTED>", "task_level": [1]}
Aug 22 13:56:40 mslave3 flocker-container-agent[29010]: {"fsm_next_state": "<ClusterStatusStates=IGNORANT>", "task_level": [2], "action_type": "fsm:transition", "timestamp": 1471888600.988204, "fsm_output": ["<ClusterStatusOutputs=STORE_CLIENT>"], "task_uuid": "fde19cb2-25c1-4867-a9a5-3357c76123d6", "action_status": "succeeded"}
Aug 22 13:56:41 mslave3 flocker-container-agent[29010]: {"task_uuid": "25b95a01-dd3c-4511-a86e-6e1ac1c95125", "error": false, "timestamp": 1471888601.120226, "message": "AgentAMP connection lost (HOST:IPv4Address(TCP, '10.60.120.163', 51411) PEER:IPv4Address(TCP, '10.60.120.156', 4523))", "message_type": "twisted:log", "task_level": [1]}
Aug 22 13:56:41 mslave3 flocker-container-agent[29010]: {"exception": "OpenSSL.SSL.Error", "reason": "[('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]", "timestamp": 1471888601.121666, "traceback": "Traceback: <class 'OpenSSL.SSL.Error'>: [('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/posixbase.py:597:_doReadOrWrite\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:209:doRead\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:215:_dataReceived\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:421:dataReceived\n--- <exception caught here> ---\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:569:_write\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1271:send\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1191:_raise_ssl_error\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/_util.py:48:exception_from_error_queue\n", "message_type": "eliot:traceback", "task_uuid": "9abbc240-5002-42c4-9f9c-9e4d749e5115", "task_level": [1]}
Aug 22 13:56:41 mslave3 flocker-container-agent[29010]: {"fsm_identifier": "<flocker.node._loop.ClusterStatus object at 0x38d0590>", "fsm_input": "<ClusterStatusInputs=DISCONNECTED_FROM_CONTROL_SERVICE>", "timestamp": 1471888601.123627, "fsm_rich_input": null, "action_status": "started", "task_uuid": "76109abc-be48-4ce7-8c2f-89f4ae415589", "action_type": "fsm:transition", "fsm_state": "<ClusterStatusStates=IGNORANT>", "task_level": [1]}
Aug 22 13:56:41 mslave3 flocker-container-agent[29010]: {"fsm_next_state": "<ClusterStatusStates=DISCONNECTED>", "task_level": [2], "action_type": "fsm:transition", "timestamp": 1471888601.124578, "fsm_output": [], "task_uuid": "76109abc-be48-4ce7-8c2f-89f4ae415589", "action_status": "succeeded"}
Aug 22 13:56:41 mslave3 flocker-container-agent[29010]: {"task_uuid": "eb882355-c912-4d5d-90ee-697dca3a5a8c", "error": false, "timestamp": 1471888601.125603, "message": "<twisted.internet.tcp.Connector instance at 0x38cdcf8> will retry in 3 seconds", "message_type": "twisted:log", "task_level": [1]}
Aug 22 13:56:41 mslave3 flocker-container-agent[29010]: {"task_uuid": "53fd1f92-6222-43e8-a90f-f94fa3f677f7", "error": false, "timestamp": 1471888601.126927, "message": "Stopping factory <twisted.internet.protocol.ReconnectingClientFactory instance at 0x38d3248>", "message_type": "twisted:log", "task_level": [1]}
[root@mslave3 ~]#
Display Flocker-Dataset-Agent service:

[root@mslave3 ~]# systemctl -l status flocker-dataset-agent.service
● flocker-dataset-agent.service - Flocker Dataset Agent
   Loaded: loaded (/usr/lib/systemd/system/flocker-dataset-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-08-22 13:07:25 EDT; 50min ago
 Main PID: 28999 (flocker-dataset)
   Memory: 64.9M
   CGroup: /system.slice/flocker-dataset-agent.service
           └─28999 /opt/flocker/bin/python /usr/sbin/flocker-dataset-agent --journald

Aug 22 13:57:28 mslave3 flocker-dataset-agent[28999]: {"task_uuid": "1bf9cd18-7c6d-4de1-b549-133950d7abc7", "error": false, "timestamp": 1471888648.734586, "message": "Starting factory <twisted.internet.protocol.ReconnectingClientFactory instance at 0x7f214803ce18>", "message_type": "twisted:log", "task_level": [1]}
Aug 22 13:57:28 mslave3 flocker-dataset-agent[28999]: {"task_uuid": "cace071b-d572-46e5-ae6c-14c4d5e1ada5", "error": false, "timestamp": 1471888648.738404, "message": "AgentAMP connection established (HOST:IPv4Address(TCP, '192.168.120.163', 51629) PEER:IPv4Address(TCP, '192.168.120.156', 4523))", "message_type": "twisted:log", "task_level": [1]}
Aug 22 13:57:28 mslave3 flocker-dataset-agent[28999]: {"fsm_identifier": "<flocker.node._loop.ClusterStatus object at 0x3a8e890>", "fsm_input": "<ClusterStatusInputs=CONNECTED_TO_CONTROL_SERVICE>", "timestamp": 1471888648.741173, "fsm_rich_input": "<_ConnectedToControlService>", "action_status": "started", "task_uuid": "0b321236-fd0f-40ec-86b9-95b470078f50", "action_type": "fsm:transition", "fsm_state": "<ClusterStatusStates=DISCONNECTED>", "task_level": [1]}
Aug 22 13:57:28 mslave3 flocker-dataset-agent[28999]: {"fsm_next_state": "<ClusterStatusStates=IGNORANT>", "task_level": [2], "action_type": "fsm:transition", "timestamp": 1471888648.742096, "fsm_output": ["<ClusterStatusOutputs=STORE_CLIENT>"], "task_uuid": "0b321236-fd0f-40ec-86b9-95b470078f50", "action_status": "succeeded"}
Aug 22 13:57:28 mslave3 flocker-dataset-agent[28999]: {"task_uuid": "18fa446d-d46f-4f04-8a22-772c4acb057a", "error": false, "timestamp": 1471888648.84152, "message": "AgentAMP connection lost (HOST:IPv4Address(TCP, '192.168.120.163', 51629) PEER:IPv4Address(TCP, '192.168.120.156', 4523))", "message_type": "twisted:log", "task_level": [1]}
Aug 22 13:57:28 mslave3 flocker-dataset-agent[28999]: {"exception": "OpenSSL.SSL.Error", "reason": "[('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]", "timestamp": 1471888648.842963, "traceback": "Traceback: <class 'OpenSSL.SSL.Error'>: [('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/posixbase.py:597:_doReadOrWrite\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:209:doRead\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:215:_dataReceived\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:421:dataReceived\n--- <exception caught here> ---\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:569:_write\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1271:send\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1191:_raise_ssl_error\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/_util.py:48:exception_from_error_queue\n", "message_type": "eliot:traceback", "task_uuid": "5a5d13e1-7d6b-4a6e-989d-a4e618c00e71", "task_level": [1]}
Aug 22 13:57:28 mslave3 flocker-dataset-agent[28999]: {"fsm_identifier": "<flocker.node._loop.ClusterStatus object at 0x3a8e890>", "fsm_input": "<ClusterStatusInputs=DISCONNECTED_FROM_CONTROL_SERVICE>", "timestamp": 1471888648.844714, "fsm_rich_input": null, "action_status": "started", "task_uuid": "d4bcefc0-1877-4600-aaa3-31c97579d122", "action_type": "fsm:transition", "fsm_state": "<ClusterStatusStates=IGNORANT>", "task_level": [1]}
Aug 22 13:57:28 mslave3 flocker-dataset-agent[28999]: {"fsm_next_state": "<ClusterStatusStates=DISCONNECTED>", "task_level": [2], "action_type": "fsm:transition", "timestamp": 1471888648.845755, "fsm_output": [], "task_uuid": "d4bcefc0-1877-4600-aaa3-31c97579d122", "action_status": "succeeded"}
Aug 22 13:57:28 mslave3 flocker-dataset-agent[28999]: {"task_uuid": "a83ff69d-9ed8-435e-a5c3-7fb8ab6773d8", "error": false, "timestamp": 1471888648.846986, "message": "<twisted.internet.tcp.Connector instance at 0x390e1b8> will retry in 2 seconds", "message_type": "twisted:log", "task_level": [1]}
Aug 22 13:57:28 mslave3 flocker-dataset-agent[28999]: {"task_uuid": "4053b18d-f319-4ef2-8510-2c722307d808", "error": false, "timestamp": 1471888648.848453, "message": "Stopping factory <twisted.internet.protocol.ReconnectingClientFactory instance at 0x7f214803ce18>", "message_type": "twisted:log", "task_level": [1]}
[root@mslave3 ~]# 

Display Flocker-Docker plugin service:

[root@mslave3 ~]# systemctl -l status flocker-docker-plugin.service 
● flocker-docker-plugin.service - Flocker Docker Plugin
   Loaded: loaded (/usr/lib/systemd/system/flocker-docker-plugin.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-08-22 13:07:42 EDT; 50min ago
 Main PID: 29093 (flocker-docker-)
   Memory: 87.9M
   CGroup: /system.slice/flocker-docker-plugin.service
           └─29093 /opt/flocker/bin/python /usr/sbin/flocker-docker-plugin --journald

Aug 22 13:58:17 mslave3 flocker-docker-plugin[29093]: {"exception": "flocker.apiclient._client.NotFound", "task_level": [7956, 4], "action_type": "flocker:apiclient:http_request", "reason": "{\"description\": \"No node found with given era.\"}", "timestamp": 1471888697.090561, "task_uuid": "80a5fe91-1be8-47b2-99ff-15bef8ae4f59", "action_status": "failed"}
Aug 22 13:58:17 mslave3 flocker-docker-plugin[29093]: {"task_uuid": "c01669b8-222a-41fd-8135-eb169486818f", "error": false, "timestamp": 1471888697.093445, "message": "Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x618c5a8>", "message_type": "twisted:log", "task_level": [1]}
Aug 22 13:58:17 mslave3 flocker-docker-plugin[29093]: {"request_body": null, "url": "https://10.60.120.156:4523/v1/state/nodes/by_era/4d004c39-c4cf-4220-87cb-f62f92e42899", "timestamp": 1471888697.191925, "action_status": "started", "task_uuid": "80a5fe91-1be8-47b2-99ff-15bef8ae4f59", "action_type": "flocker:apiclient:http_request", "method": "GET", "task_level": [7957, 1]}
Aug 22 13:58:17 mslave3 flocker-docker-plugin[29093]: {"task_uuid": "80a5fe91-1be8-47b2-99ff-15bef8ae4f59", "error": false, "timestamp": 1471888697.204047, "message": "Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x6247908>", "message_type": "twisted:log", "task_level": [7957, 3]}
Aug 22 13:58:17 mslave3 flocker-docker-plugin[29093]: {"exception": "flocker.apiclient._client.NotFound", "task_level": [7957, 4], "action_type": "flocker:apiclient:http_request", "reason": "{\"description\": \"No node found with given era.\"}", "timestamp": 1471888697.428393, "task_uuid": "80a5fe91-1be8-47b2-99ff-15bef8ae4f59", "action_status": "failed"}
Aug 22 13:58:17 mslave3 flocker-docker-plugin[29093]: {"task_uuid": "48fd495a-7794-4453-bda7-219763abd71e", "error": false, "timestamp": 1471888697.431571, "message": "Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x6247908>", "message_type": "twisted:log", "task_level": [1]}
Aug 22 13:58:17 mslave3 flocker-docker-plugin[29093]: {"request_body": null, "url": "https://10.60.120.156:4523/v1/state/nodes/by_era/4d004c39-c4cf-4220-87cb-f62f92e42899", "timestamp": 1471888697.529721, "action_status": "started", "task_uuid": "80a5fe91-1be8-47b2-99ff-15bef8ae4f59", "action_type": "flocker:apiclient:http_request", "method": "GET", "task_level": [7958, 1]}
Aug 22 13:58:17 mslave3 flocker-docker-plugin[29093]: {"task_uuid": "80a5fe91-1be8-47b2-99ff-15bef8ae4f59", "error": false, "timestamp": 1471888697.542891, "message": "Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x6166f38>", "message_type": "twisted:log", "task_level": [7958, 3]}
Aug 22 13:58:17 mslave3 flocker-docker-plugin[29093]: {"exception": "flocker.apiclient._client.NotFound", "task_level": [7958, 4], "action_type": "flocker:apiclient:http_request", "reason": "{\"description\": \"No node found with given era.\"}", "timestamp": 1471888697.778659, "task_uuid": "80a5fe91-1be8-47b2-99ff-15bef8ae4f59", "action_status": "failed"}
Aug 22 13:58:17 mslave3 flocker-docker-plugin[29093]: {"task_uuid": "b3f691d3-4ea1-4c94-bce7-0960ee823b8f", "error": false, "timestamp": 1471888697.781484, "message": "Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x6166f38>", "message_type": "twisted:log", "task_level": [1]}
[root@mslave3 ~]# 
akamalov commented 8 years ago

Displaying iptables rules on a master:

[root@mmaster1 new.cert]# iptables -L --line-numbers
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:4523

Chain FORWARD (policy ACCEPT)
num  target     prot opt source               destination         
1    DOCKER-ISOLATION  all  --  anywhere             anywhere            
2    DOCKER     all  --  anywhere             anywhere            
3    ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
4    ACCEPT     all  --  anywhere             anywhere            
5    ACCEPT     all  --  anywhere             anywhere            

Chain OUTPUT (policy ACCEPT)
num  target     prot opt source               destination         

Chain DOCKER (1 references)
num  target     prot opt source               destination         

Chain DOCKER-ISOLATION (1 references)
num  target     prot opt source               destination         
1    RETURN     all  --  anywhere             anywhere            
[root@mmaster1 new.cert]# 

On an agent node, completely flushed iptables rules:

[root@mslave3 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
[root@mslave3 ~]# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
[root@mslave3 ~]# 
wallnerryan commented 8 years ago

@akamalov since you are not seeing errors with the flockerctl list-nodes command this is telling me it can communicate with the control service as @wallrj was stating. You are not seeing the nodes because they cannot communicate with the control service. Did you re-create the node certificates for each agent after you changed the control certificate to use the ip? Also make sure the new cluster.crt makes it to the new dataset agent nodes.

akamalov commented 8 years ago

@wallnerryan Yes I have. Certs have been re-generated, distributed among agent nodes and services restarted.

Don't know if it'll help, but I am seeing this:

Aug 22 13:58:17 mslave3 flocker-docker-plugin[29093]: {"exception": "flocker.apiclient._client.NotFound", "task_level": [7957, 4], "action_type": "flocker:apiclient:http_request", "reason": "{\"description\": \"No node found with given era.\"}", "timestamp": 1471888697.428393, "task_uuid": "80a5fe91-1be8-47b2-99ff-15bef8ae4f59", "action_status": "failed"}

akamalov commented 8 years ago

Thanks for editing it out @wallrj

akamalov commented 8 years ago

Step-by-step process on how cert was generated (one node, for testing):

Generate Cert for host - 'mslave3':

[root@mmaster1 new.cert]# /opt/flocker/bin/flocker-ca create-node-certificate
Created dc656cb5-ca81-4e8c-95ec-5714f69a01aa.crt. Copy it over to /etc/flocker/node.crt on your node machine and make sure to chmod 0600 it.
[root@mmaster1 new.cert]#

Rename certs to hostname of the node (for now):

[root@mmaster1 new.cert]# mv dc656cb5-ca81-4e8c-95ec-5714f69a01aa.crt mslave3.crt
mv: overwrite ‘mslave3.crt’? y
[root@mmaster1 new.cert]# mv dc656cb5-ca81-4e8c-95ec-5714f69a01aa.key mslave3.crt
mv: overwrite ‘mslave3.crt’? y

SCP certs over to the destination agent node - 'mslave3':

[root@mmaster1 new.cert]# scp mslave3.* root@mslave3:/etc/flocker/

On agent node - 'mslave3', rename destination keys in /etc/flocker directory:

[root@mslave3 flocker]# pwd
/etc/flocker
[root@mslave3 flocker]# 

[root@mslave3 flocker]# mv mslave3.crt node.crt
mv: overwrite ‘node.crt’? y
[root@mslave3 flocker]# mv mslave3.key node.key
mv: overwrite ‘node.key’? y
[root@mslave3 flocker]#

Restart Flocker services:

systemctl restart flocker-dataset-agent
systemctl restart flocker-container-agent
systemctl restart flocker-docker-plugin

Display status:


##########################

● flocker-container-agent.service - Flocker Container Agent
   Loaded: loaded (/usr/lib/systemd/system/flocker-container-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-08-22 14:24:01 EDT; 217ms ago
 Main PID: 13947 (flocker-contain)
   Memory: 10.5M
   CGroup: /system.slice/flocker-container-agent.service
           └─13947 /opt/flocker/bin/python /usr/sbin/flocker-container-agent --journald

Aug 22 14:24:01 mslave3 systemd[1]: Started Flocker Container Agent.
Aug 22 14:24:01 mslave3 systemd[1]: Starting Flocker Container Agent...

##############################

● flocker-dataset-agent.service - Flocker Dataset Agent
   Loaded: loaded (/usr/lib/systemd/system/flocker-dataset-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-08-22 14:24:01 EDT; 445ms ago
 Main PID: 13939 (flocker-dataset)
   Memory: 18.9M
   CGroup: /system.slice/flocker-dataset-agent.service
           └─13939 /opt/flocker/bin/python /usr/sbin/flocker-dataset-agent --journald

Aug 22 14:24:01 mslave3 systemd[1]: Started Flocker Dataset Agent.
Aug 22 14:24:01 mslave3 systemd[1]: Starting Flocker Dataset Agent...
##############################

● flocker-docker-plugin.service - Flocker Docker Plugin
   Loaded: loaded (/usr/lib/systemd/system/flocker-docker-plugin.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-08-22 14:24:01 EDT; 24ms ago
 Main PID: 13962 (flocker-docker-)
   Memory: 3.0M
   CGroup: /system.slice/flocker-docker-plugin.service
           └─13962 /opt/flocker/bin/python /usr/sbin/flocker-docker-plugin --journald

Aug 22 14:24:01 mslave3 systemd[1]: Started Flocker Docker Plugin.
Aug 22 14:24:01 mslave3 systemd[1]: Starting Flocker Docker Plugin...
##############################

List nodes from master:

[root@mmaster1 new.cert]# flockerctl --control-service=192.168.120.156 list-nodes
SERVER   ADDRESS 

[root@mmaster1 new.cert]# 

No nodes displayed...

wallnerryan commented 8 years ago

the docker plugin error is just because the nodes cannot be found still. Please dont just regenerate only the node certs. Each cert depends on the other. Start from a clean slate and regenerate ALL certificates and clean out all certs from /etc/flocker on all nodes.

akamalov commented 8 years ago

Copy. I will do it right now and pot step-by-step process... Thanks @wallnerryan

akamalov commented 8 years ago

@wallnerryan

Step-by-step process on creating certs, distributing them to agent nodes, and etc..

Initialize cluster:

[root@mmaster1 new.cert]# /opt/flocker/bin/flocker-ca initialize flocker
Created cluster.key and cluster.crt. Please keep cluster.key secret, as anyone who can access it will be able to control your cluster.

Display what has been generated:

[root@mmaster1 new.cert]# ls -al cluster*
-rw------- 1 root root 1948 Aug 22 14:43 cluster.crt
-rw------- 1 root root 3272 Aug 22 14:43 cluster.key
[root@mmaster1 new.cert]# 

Generate certs for FCS:

[root@mmaster1 new.cert]# /opt/flocker/bin/flocker-ca create-control-certificate 192.168.120.156
Created control-192.168.120.156.crt and control-192.168.120.156.key
Copy these files to the directory /etc/flocker on your control service machine.
Rename the files to control-service.crt and control-service.key and set the correct permissions by running chmod 0600 on both files.
[root@mmaster1 new.cert]# 

Display generated certificates:

[root@mmaster1 new.cert]# ls -al
total 20
drwxr-xr-x   2 root root  106 Aug 22 14:44 .
dr-xr-x---. 14 root root 4096 Aug 22 14:43 ..
-rw-------   1 root root 1948 Aug 22 14:43 cluster.crt
-rw-------   1 root root 3272 Aug 22 14:43 cluster.key
-rw-------   1 root root 1870 Aug 22 14:44 control-192.168.120.156.crt
-rw-------   1 root root 3272 Aug 22 14:44 control-192.168.120.156.key
[root@mmaster1 new.cert]# 

Rename certificates:

[root@mmaster1 new.cert]# mv control-192.168.120.156.crt control-service.crt
[root@mmaster1 new.cert]# mv control-192.168.120.156.key control-service.key 

Move certificates to /etc/flocker/:

[root@mmaster1 new.cert]# cp cluster.crt  /etc/flocker/
[root@mmaster1 new.cert]# mv control-service.* /etc/flocker

Display contents of /etc/flocker/:

[root@mmaster1 new.cert]# ls -al /etc/flocker
total 24
drwx------    2 root root   76 Aug 22 14:46 .
drwxr-xr-x. 114 root root 8192 Aug 18 08:03 ..
-rw-------    1 root root 1948 Aug 22 14:45 cluster.crt
-rw-------    1 root root 1870 Aug 22 14:44 control-service.crt
-rw-------    1 root root 3272 Aug 22 14:44 control-service.key
[root@mmaster1 new.cert]# 

Set permissions (again):

[root@mmaster1 new.cert]# chmod 0700 /etc/flocker
[root@mmaster1 new.cert]# chmod 0600 /etc/flocker/control-service.key

Generate certs for five agent nodes:

[root@mmaster1 new.cert]# for i in 1 2 3 4 5; do /opt/flocker/bin/flocker-ca create-node-certificate; done
Created fb2b98cd-fe18-4ab3-9692-70f3609f1671.crt. Copy it over to /etc/flocker/node.crt on your node machine and make sure to chmod 0600 it.
Created 1b8bd056-2841-4e5a-8bc8-a11f888e0b8f.crt. Copy it over to /etc/flocker/node.crt on your node machine and make sure to chmod 0600 it.
Created 06978551-c7ee-41f6-8412-42f119777d04.crt. Copy it over to /etc/flocker/node.crt on your node machine and make sure to chmod 0600 it.
Created 701fdaad-01ec-478e-b2fb-7ec15b14cc9d.crt. Copy it over to /etc/flocker/node.crt on your node machine and make sure to chmod 0600 it.
Created 4d35f998-ea9d-4f64-af20-5edbce48aa25.crt. Copy it over to /etc/flocker/node.crt on your node machine and make sure to chmod 0600 it.
[root@mmaster1 new.cert]# 

Display certificates:

[root@mmaster1 new.cert]# ls -al
total 56
drwxr-xr-x   2 root root 4096 Aug 22 14:48 .
dr-xr-x---. 14 root root 4096 Aug 22 14:43 ..
-rw-------   1 root root 1854 Aug 22 14:48 06978551-c7ee-41f6-8412-42f119777d04.crt
-rw-------   1 root root 3276 Aug 22 14:48 06978551-c7ee-41f6-8412-42f119777d04.key
-rw-------   1 root root 1854 Aug 22 14:48 1b8bd056-2841-4e5a-8bc8-a11f888e0b8f.crt
-rw-------   1 root root 3272 Aug 22 14:48 1b8bd056-2841-4e5a-8bc8-a11f888e0b8f.key
-rw-------   1 root root 1854 Aug 22 14:48 4d35f998-ea9d-4f64-af20-5edbce48aa25.crt
-rw-------   1 root root 3272 Aug 22 14:48 4d35f998-ea9d-4f64-af20-5edbce48aa25.key
-rw-------   1 root root 1854 Aug 22 14:48 701fdaad-01ec-478e-b2fb-7ec15b14cc9d.crt
-rw-------   1 root root 3272 Aug 22 14:48 701fdaad-01ec-478e-b2fb-7ec15b14cc9d.key
-rw-------   1 root root 1948 Aug 22 14:43 cluster.crt
-rw-------   1 root root 3272 Aug 22 14:43 cluster.key
-rw-------   1 root root 1854 Aug 22 14:48 fb2b98cd-fe18-4ab3-9692-70f3609f1671.crt
-rw-------   1 root root 3268 Aug 22 14:48 fb2b98cd-fe18-4ab3-9692-70f3609f1671.key

Rename certs after hostnames:

[root@mmaster1 new.cert]# mv 06978551-c7ee-41f6-8412-42f119777d04.crt mslave1.crt
[root@mmaster1 new.cert]# mv 06978551-c7ee-41f6-8412-42f119777d04.key mslave1.key

[root@mmaster1 new.cert]# mv 1b8bd056-2841-4e5a-8bc8-a11f888e0b8f.crt mslave2.crt
[root@mmaster1 new.cert]# mv 1b8bd056-2841-4e5a-8bc8-a11f888e0b8f.key mslave2.key

[root@mmaster1 new.cert]# mv 4d35f998-ea9d-4f64-af20-5edbce48aa25.crt mslave3.crt
[root@mmaster1 new.cert]# mv 4d35f998-ea9d-4f64-af20-5edbce48aa25.key mslave3.key

[root@mmaster1 new.cert]# mv 701fdaad-01ec-478e-b2fb-7ec15b14cc9d.crt mslave4.crt
[root@mmaster1 new.cert]# mv 701fdaad-01ec-478e-b2fb-7ec15b14cc9d.key mslave4.key

[root@mmaster1 new.cert]# mv fb2b98cd-fe18-4ab3-9692-70f3609f1671.crt mslave5.crt
[root@mmaster1 new.cert]# mv fb2b98cd-fe18-4ab3-9692-70f3609f1671.key mslave5.key

Copy certs to each agent, including cluster.crt file:

[root@mmaster1 new.cert]# for i in 1 2 3 4 5; do scp cluster.crt root@mslave$i:/etc/flocker; done
[root@mmaster1 new.cert]# scp mslave1.crt root@mslave1:/etc/flocker/node.crt
[root@mmaster1 new.cert]# scp mslave2.crt root@mslave2:/etc/flocker/node.crt
[root@mmaster1 new.cert]# scp mslave3.crt root@mslave3:/etc/flocker/node.crt
[root@mmaster1 new.cert]# scp mslave4.crt root@mslave4:/etc/flocker/node.crt
[root@mmaster1 new.cert]# scp mslave5.crt root@mslave5:/etc/flocker/node.crt
[root@mmaster1 new.cert]# scp mslave1.key root@mslave1:/etc/flocker/node.key
[root@mmaster1 new.cert]# scp mslave2.key root@mslave2:/etc/flocker/node.key
[root@mmaster1 new.cert]# scp mslave3.key root@mslave3:/etc/flocker/node.key
[root@mmaster1 new.cert]# scp mslave4.key root@mslave4:/etc/flocker/node.key
[root@mmaster1 new.cert]# scp mslave5.key root@mslave5:/etc/flocker/node.key

Display agent.yml file:

"version": 1
"control-service": 
  "hostname": 192.168.120.156 
  "port": 4523
"dataset": 
    "backend": "purestorage_flasharray_flocker_driver"
    "pure_ip": 172.16.128.157 
    "pure_api_token": "50cac451-fefe-b635-9692-5870aada9c49"
    "pure_chap_host_user": "XXXXX"
    "pure_chap_host_password": "XXXXX"

Copy agent.yml file on each of the Agent nodes:

[root@mmaster1 new.cert]# for i in 1 2 3 4 5; do scp agent.yml root@mslave$i:/etc/flocker/; done

Generate plugin cert:

[root@mmaster1 new.cert]# /opt/flocker/bin/flocker-ca create-api-certificate plugin
Created plugin.crt. You can now give it to your API enduser so they can access the control service API.
[root@mmaster1 new.cert]# 

Copy Plugin cert to the target agent nodes:

[root@mmaster1 new.cert]# for i in 1 2 3 4 5; do scp plugin.* root@mslave$i:/etc/flocker/; done

Generate an API client certificate and rename it:

[root@mmaster1 new.cert]#  /opt/flocker/bin/flocker-ca create-api-certificate flocker-akamalov
Created flocker-akamalov.crt. You can now give it to your API enduser so they can access the control service API.
[root@mmaster1 new.cert]# mv flocker-akamalov.crt user.crt
[root@mmaster1 new.cert]# mv flocker-akamalov.key user.key
[root@mmaster1 new.cert]# 

Restart FCS services:

[root@mmaster1 new.cert]# systemctl restart flocker-control
[root@mmaster1 new.cert]# systemctl -l status flocker-control
● flocker-control.service - Flocker Control Service
   Loaded: loaded (/usr/lib/systemd/system/flocker-control.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-08-22 14:56:00 EDT; 6s ago
 Main PID: 22394 (flocker-control)
   Memory: 38.4M
   CGroup: /system.slice/flocker-control.service
           └─22394 /opt/flocker/bin/python /usr/sbin/flocker-control --port tcp:4523 --agent-port tcp:4524 --journald

Aug 22 14:56:00 mmaster1 systemd[1]: Started Flocker Control Service.
Aug 22 14:56:00 mmaster1 systemd[1]: Starting Flocker Control Service...
Aug 22 14:56:01 mmaster1 flocker-control[22394]: {"task_uuid": "806bacd5-0b91-483e-9a86-d2e5e18c3350", "error": false, "timestamp": 1471892161.268254, "message": "Log opened.", "message_type": "twisted:log", "task_level": [1]}
Aug 22 14:56:01 mmaster1 flocker-control[22394]: {"timestamp": 1471892161.310025, "task_uuid": "f2d840d7-14a9-40c1-a7f0-bc5b79bb864b", "configuration": {"persistent_state": {"blockdevice_ownership": {"values": [], "$__class__$": "PMap"}, "$__class__$": "PersistentState"}, "nodes": {"values": [], "$__class__$": "PMap"}, "leases": {"values": [], "$__class__$": "PMap"}, "$__class__$": "Deployment"}, "message_type": "flocker-control:persistence:startup", "task_level": [1]}
Aug 22 14:56:01 mmaster1 flocker-control[22394]: {"task_uuid": "adca8668-18b4-4d5f-8ac2-89afbe3cf026", "error": false, "timestamp": 1471892161.3139, "message": "Site (TLS) starting on 4523", "message_type": "twisted:log", "task_level": [1]}
Aug 22 14:56:01 mmaster1 flocker-control[22394]: {"task_uuid": "d51b7e48-943f-4c0d-8c8e-17583de6bc74", "error": false, "timestamp": 1471892161.314424, "message": "Starting factory <twisted.web.server.Site instance at 0x27db998>", "message_type": "twisted:log", "task_level": [1]}
Aug 22 14:56:01 mmaster1 flocker-control[22394]: {"task_uuid": "cf732101-f7c8-4305-94ad-ae9afef7441e", "error": false, "timestamp": 1471892161.315766, "message": "ServerFactory (TLS) starting on 4524", "message_type": "twisted:log", "task_level": [1]}
Aug 22 14:56:01 mmaster1 flocker-control[22394]: {"task_uuid": "b8130a8a-d2f1-411c-aeea-18dea1323f5e", "error": false, "timestamp": 1471892161.316458, "message": "Starting factory <twisted.internet.protocol.ServerFactory instance at 0x27dba70>", "message_type": "twisted:log", "task_level": [1]}
[root@mmaster1 new.cert]# 

Restart Agent node services:

systemctl restart flocker-dataset-agent
systemctl restart flocker-container-agent
systemctl restart flocker-docker-plugin

Sample output of status of running services in Agent node (just node mslave1, as an example):

[root@mslave1 ~]# systemctl -l status flocker-container-agent 
● flocker-container-agent.service - Flocker Container Agent
   Loaded: loaded (/usr/lib/systemd/system/flocker-container-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-08-22 15:00:59 EDT; 15min ago
 Main PID: 26448 (flocker-contain)
   Memory: 56.5M
   CGroup: /system.slice/flocker-container-agent.service
           └─26448 /opt/flocker/bin/python /usr/sbin/flocker-container-agent --journald

Aug 22 15:16:03 mslave1 flocker-container-agent[26448]: {"task_uuid": "f6493366-db73-4932-b1db-958be0828c98", "error": false, "timestamp": 1471893363.944895, "message": "Starting factory <twisted.internet.protocol.ReconnectingClientFactory instance at 0x3fcb290>", "message_type": "twisted:log", "task_level": [1]}
Aug 22 15:16:03 mslave1 flocker-container-agent[26448]: {"task_uuid": "db7fbca6-50ac-4978-b04a-ac513de95c8c", "error": false, "timestamp": 1471893363.948004, "message": "AgentAMP connection established (HOST:IPv4Address(TCP, '192.168.120.161', 45827) PEER:IPv4Address(TCP, '192.168.120.156', 4523))", "message_type": "twisted:log", "task_level": [1]}
Aug 22 15:16:03 mslave1 flocker-container-agent[26448]: {"fsm_identifier": "<flocker.node._loop.ClusterStatus object at 0x3fc85d0>", "fsm_input": "<ClusterStatusInputs=CONNECTED_TO_CONTROL_SERVICE>", "timestamp": 1471893363.948953, "fsm_rich_input": "<_ConnectedToControlService>", "action_status": "started", "task_uuid": "88144288-f74f-4704-9f14-ab025abcf454", "action_type": "fsm:transition", "fsm_state": "<ClusterStatusStates=DISCONNECTED>", "task_level": [1]}
Aug 22 15:16:03 mslave1 flocker-container-agent[26448]: {"fsm_next_state": "<ClusterStatusStates=IGNORANT>", "task_level": [2], "action_type": "fsm:transition", "timestamp": 1471893363.949327, "fsm_output": ["<ClusterStatusOutputs=STORE_CLIENT>"], "task_uuid": "88144288-f74f-4704-9f14-ab025abcf454", "action_status": "succeeded"}
Aug 22 15:16:04 mslave1 flocker-container-agent[26448]: {"task_uuid": "4bc69053-2d7b-45d6-ae22-81244df99ad2", "error": false, "timestamp": 1471893364.110368, "message": "AgentAMP connection lost (HOST:IPv4Address(TCP, '192.168.120.161', 45827) PEER:IPv4Address(TCP, '192.168.120.156', 4523))", "message_type": "twisted:log", "task_level": [1]}
Aug 22 15:16:04 mslave1 flocker-container-agent[26448]: {"exception": "OpenSSL.SSL.Error", "reason": "[('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]", "timestamp": 1471893364.111283, "traceback": "Traceback: <class 'OpenSSL.SSL.Error'>: [('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/posixbase.py:597:_doReadOrWrite\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:209:doRead\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:215:_dataReceived\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:421:dataReceived\n--- <exception caught here> ---\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:569:_write\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1271:send\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1191:_raise_ssl_error\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/_util.py:48:exception_from_error_queue\n", "message_type": "eliot:traceback", "task_uuid": "859ae49a-403b-469c-b47d-2e242b042be6", "task_level": [1]}
Aug 22 15:16:04 mslave1 flocker-container-agent[26448]: {"fsm_identifier": "<flocker.node._loop.ClusterStatus object at 0x3fc85d0>", "fsm_input": "<ClusterStatusInputs=DISCONNECTED_FROM_CONTROL_SERVICE>", "timestamp": 1471893364.112468, "fsm_rich_input": null, "action_status": "started", "task_uuid": "3cd10e97-6772-41f1-bb8e-b23332397a9d", "action_type": "fsm:transition", "fsm_state": "<ClusterStatusStates=IGNORANT>", "task_level": [1]}
Aug 22 15:16:04 mslave1 flocker-container-agent[26448]: {"fsm_next_state": "<ClusterStatusStates=DISCONNECTED>", "task_level": [2], "action_type": "fsm:transition", "timestamp": 1471893364.113232, "fsm_output": [], "task_uuid": "3cd10e97-6772-41f1-bb8e-b23332397a9d", "action_status": "succeeded"}
Aug 22 15:16:04 mslave1 flocker-container-agent[26448]: {"task_uuid": "28b25663-efbc-4588-a314-9a3a9b2b1f42", "error": false, "timestamp": 1471893364.114035, "message": "<twisted.internet.tcp.Connector instance at 0x3fc5d40> will retry in 2 seconds", "message_type": "twisted:log", "task_level": [1]}
Aug 22 15:16:04 mslave1 flocker-container-agent[26448]: {"task_uuid": "c3d93755-222f-4efb-b635-4e0f877fca56", "error": false, "timestamp": 1471893364.114793, "message": "Stopping factory <twisted.internet.protocol.ReconnectingClientFactory instance at 0x3fcb290>", "message_type": "twisted:log", "task_level": [1]}
[root@mslave1 ~]# systemctl -l status flocker-dataset-agent
● flocker-dataset-agent.service - Flocker Dataset Agent
   Loaded: loaded (/usr/lib/systemd/system/flocker-dataset-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-08-22 15:00:59 EDT; 15min ago
 Main PID: 26418 (flocker-dataset)
   Memory: 58.6M
   CGroup: /system.slice/flocker-dataset-agent.service
           └─26418 /opt/flocker/bin/python /usr/sbin/flocker-dataset-agent --journald

Aug 22 15:16:06 mslave1 flocker-dataset-agent[26418]: {"task_uuid": "5e7a601f-2164-4259-ba43-022db314cf3a", "error": false, "timestamp": 1471893366.784998, "message": "AgentAMP connection lost (HOST:IPv4Address(TCP, '192.168.120.161', 45839) PEER:IPv4Address(TCP, '192.168.120.156', 4523))", "message_type": "twisted:log", "task_level": [1]}
Aug 22 15:16:06 mslave1 flocker-dataset-agent[26418]: {"exception": "OpenSSL.SSL.Error", "reason": "[('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]", "timestamp": 1471893366.787609, "traceback": "Traceback: <class 'OpenSSL.SSL.Error'>: [('SSL routines', 'SSL3_READ_BYTES', 'sslv3 alert certificate unknown'), ('SSL routines', 'SSL3_WRITE_BYTES', 'ssl handshake failure')]\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/posixbase.py:597:_doReadOrWrite\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:209:doRead\n/opt/flocker/lib/python2.7/site-packages/twisted/internet/tcp.py:215:_dataReceived\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:421:dataReceived\n--- <exception caught here> ---\n/opt/flocker/lib/python2.7/site-packages/twisted/protocols/tls.py:569:_write\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1271:send\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/SSL.py:1191:_raise_ssl_error\n/opt/flocker/lib/python2.7/site-packages/OpenSSL/_util.py:48:exception_from_error_queue\n", "message_type": "eliot:traceback", "task_uuid": "82a1bc07-467b-465f-bb95-8a9e11e7e778", "task_level": [1]}
Aug 22 15:16:06 mslave1 flocker-dataset-agent[26418]: {"fsm_identifier": "<flocker.node._loop.ClusterStatus object at 0x47fc390>", "fsm_input": "<ClusterStatusInputs=DISCONNECTED_FROM_CONTROL_SERVICE>", "timestamp": 1471893366.788813, "fsm_rich_input": null, "action_status": "started", "task_uuid": "2ab03085-735d-4965-ae30-0fb25dabf305", "action_type": "fsm:transition", "fsm_state": "<ClusterStatusStates=IGNORANT>", "task_level": [1]}
Aug 22 15:16:06 mslave1 flocker-dataset-agent[26418]: {"fsm_next_state": "<ClusterStatusStates=DISCONNECTED>", "task_level": [2], "action_type": "fsm:transition", "timestamp": 1471893366.789513, "fsm_output": [], "task_uuid": "2ab03085-735d-4965-ae30-0fb25dabf305", "action_status": "succeeded"}
Aug 22 15:16:06 mslave1 flocker-dataset-agent[26418]: {"task_uuid": "91f3d151-80a5-4c39-a5fe-adef810950d7", "error": false, "timestamp": 1471893366.790307, "message": "<twisted.internet.tcp.Connector instance at 0x4672488> will retry in 2 seconds", "message_type": "twisted:log", "task_level": [1]}
Aug 22 15:16:06 mslave1 flocker-dataset-agent[26418]: {"task_uuid": "ece57e60-28f6-4569-9d55-823cf0c653b4", "error": false, "timestamp": 1471893366.791074, "message": "Stopping factory <twisted.internet.protocol.ReconnectingClientFactory instance at 0x7ff30c03cdd0>", "message_type": "twisted:log", "task_level": [1]}
Aug 22 15:16:09 mslave1 flocker-dataset-agent[26418]: {"task_uuid": "d27c3d2a-3ebe-4b90-95f1-6309d6f7e09e", "error": false, "timestamp": 1471893369.615869, "message": "Starting factory <twisted.internet.protocol.ReconnectingClientFactory instance at 0x7ff30c03cdd0>", "message_type": "twisted:log", "task_level": [1]}
Aug 22 15:16:09 mslave1 flocker-dataset-agent[26418]: {"task_uuid": "2213582b-7966-4bf8-aa3b-e0ed8e820e60", "error": false, "timestamp": 1471893369.618962, "message": "AgentAMP connection established (HOST:IPv4Address(TCP, '192.168.120.161', 45853) PEER:IPv4Address(TCP, '192.168.120.156', 4523))", "message_type": "twisted:log", "task_level": [1]}
Aug 22 15:16:09 mslave1 flocker-dataset-agent[26418]: {"fsm_identifier": "<flocker.node._loop.ClusterStatus object at 0x47fc390>", "fsm_input": "<ClusterStatusInputs=CONNECTED_TO_CONTROL_SERVICE>", "timestamp": 1471893369.620741, "fsm_rich_input": "<_ConnectedToControlService>", "action_status": "started", "task_uuid": "4c9a9de5-22fc-4e81-bf02-c59231913282", "action_type": "fsm:transition", "fsm_state": "<ClusterStatusStates=DISCONNECTED>", "task_level": [1]}
Aug 22 15:16:09 mslave1 flocker-dataset-agent[26418]: {"fsm_next_state": "<ClusterStatusStates=IGNORANT>", "task_level": [2], "action_type": "fsm:transition", "timestamp": 1471893369.621449, "fsm_output": ["<ClusterStatusOutputs=STORE_CLIENT>"], "task_uuid": "4c9a9de5-22fc-4e81-bf02-c59231913282", "action_status": "succeeded"}
[root@mslave1 ~]# 
[root@mslave1 ~]# systemctl -l status flocker-docker-plugin
● flocker-docker-plugin.service - Flocker Docker Plugin
   Loaded: loaded (/usr/lib/systemd/system/flocker-docker-plugin.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2016-08-22 15:00:59 EDT; 15min ago
 Main PID: 26458 (flocker-docker-)
   Memory: 66.6M
   CGroup: /system.slice/flocker-docker-plugin.service
           └─26458 /opt/flocker/bin/python /usr/sbin/flocker-docker-plugin --journald

Aug 22 15:16:17 mslave1 flocker-docker-plugin[26458]: {"request_body": null, "url": "https://192.168.120.156:4523/v1/state/nodes/by_era/de9ace0f-2be8-4c09-ba41-fa20d61fa5f6", "timestamp": 1471893377.86543, "action_status": "started", "task_uuid": "39d0d1af-e70d-4b9a-9701-fd69b0058c6c", "action_type": "flocker:apiclient:http_request", "method": "GET", "task_level": [2503, 1]}
Aug 22 15:16:17 mslave1 flocker-docker-plugin[26458]: {"task_uuid": "39d0d1af-e70d-4b9a-9701-fd69b0058c6c", "error": false, "timestamp": 1471893377.878294, "message": "Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x53799e0>", "message_type": "twisted:log", "task_level": [2503, 3]}
Aug 22 15:16:18 mslave1 flocker-docker-plugin[26458]: {"exception": "flocker.apiclient._client.NotFound", "task_level": [2503, 4], "action_type": "flocker:apiclient:http_request", "reason": "{\"description\": \"No node found with given era.\"}", "timestamp": 1471893378.083315, "task_uuid": "39d0d1af-e70d-4b9a-9701-fd69b0058c6c", "action_status": "failed"}
Aug 22 15:16:18 mslave1 flocker-docker-plugin[26458]: {"task_uuid": "7cc8bfcb-44af-4662-84a9-342340e40a44", "error": false, "timestamp": 1471893378.085253, "message": "Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x53799e0>", "message_type": "twisted:log", "task_level": [1]}
Aug 22 15:16:18 mslave1 flocker-docker-plugin[26458]: {"request_body": null, "url": "https://192.168.120.156:4523/v1/state/nodes/by_era/de9ace0f-2be8-4c09-ba41-fa20d61fa5f6", "timestamp": 1471893378.186386, "action_status": "started", "task_uuid": "39d0d1af-e70d-4b9a-9701-fd69b0058c6c", "action_type": "flocker:apiclient:http_request", "method": "GET", "task_level": [2504, 1]}
Aug 22 15:16:18 mslave1 flocker-docker-plugin[26458]: {"task_uuid": "39d0d1af-e70d-4b9a-9701-fd69b0058c6c", "error": false, "timestamp": 1471893378.202367, "message": "Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x5378878>", "message_type": "twisted:log", "task_level": [2504, 3]}
Aug 22 15:16:18 mslave1 flocker-docker-plugin[26458]: {"exception": "flocker.apiclient._client.NotFound", "task_level": [2504, 4], "action_type": "flocker:apiclient:http_request", "reason": "{\"description\": \"No node found with given era.\"}", "timestamp": 1471893378.513994, "task_uuid": "39d0d1af-e70d-4b9a-9701-fd69b0058c6c", "action_status": "failed"}
Aug 22 15:16:18 mslave1 flocker-docker-plugin[26458]: {"task_uuid": "9d7c57a4-acc7-4a9d-abf6-3e8ab840a73e", "error": false, "timestamp": 1471893378.516585, "message": "Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x5378878>", "message_type": "twisted:log", "task_level": [1]}
Aug 22 15:16:18 mslave1 flocker-docker-plugin[26458]: {"request_body": null, "url": "https://192.168.120.156:4523/v1/state/nodes/by_era/de9ace0f-2be8-4c09-ba41-fa20d61fa5f6", "timestamp": 1471893378.615513, "action_status": "started", "task_uuid": "39d0d1af-e70d-4b9a-9701-fd69b0058c6c", "action_type": "flocker:apiclient:http_request", "method": "GET", "task_level": [2505, 1]}
Aug 22 15:16:18 mslave1 flocker-docker-plugin[26458]: {"task_uuid": "39d0d1af-e70d-4b9a-9701-fd69b0058c6c", "error": false, "timestamp": 1471893378.630004, "message": "Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x5378638>", "message_type": "twisted:log", "task_level": [2505, 3]}
[root@mslave1 ~]# 

Get list of running nodes:

[root@mmaster1 new.cert]# flockerctl --control-service=192.168.120.156 list-nodes
SERVER   ADDRESS 

[root@mmaster1 new.cert]# 

Ta-da....no nodes... :(