gufranmmu opened 8 years ago
@gufranmmu can you run flocker-diagnostics
and send them to support@clusterhq.com and CC ryan.wallner@clusterhq.com?
Thanks
Done! waiting for your reply now :)
Hi, what I am actually seeing is that Flocker is trying to attach a volume at /dev/sdf (or vda/xvda) and cannot. This is related to https://github.com/ClusterHQ/flocker/issues/2772
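To see the discrepancy, it helps to compare the devices the kernel actually exposes on the node against the name Flocker is trying to attach to. A minimal sketch, assuming a Linux host, that parses `/proc/partitions`-style text (on a live node you would read the file itself; the function name is mine, not Flocker's):

```python
def partition_names(proc_partitions_text):
    """Return block device names from /proc/partitions-style text.

    Skips the two header lines and keeps the fourth column of each
    well-formed row (the device name).
    """
    names = []
    for line in proc_partitions_text.splitlines()[2:]:
        fields = line.split()
        if len(fields) == 4:
            names.append(fields[3])
    return names

# Example: check whether the kernel sees any sdf/xvdf device at all.
# devices = partition_names(open("/proc/partitions").read())
# print(any(name.endswith("df") for name in devices))
```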
```
flocker:node:agents:blockdevice:aws@1/started
|-- count: 15303
|-- operation: [u'_attach_ebs_volume', [u'vol-4171e8c8', u'i-0cbab78c', u'/dev/sdf'], {}]
`-- timestamp: 2016-06-06 14:28:33.654007
+-- flocker:node:agents:blockdevice:aws:boto_logs@2
|-- message: Calling ec2:attach_volume with {'InstanceId': u'i-0cbab78c', 'Device': u'/dev/sdf', u'VolumeId': u'vol-4171e8c8'}
`-- timestamp: 2016-06-06 14:28:33.656907
+-- flocker:node:agents:blockdevice:aws@3/failed
|-- aws_code: InvalidParameterValue
|-- aws_message: Already have a device attached to: /dev/sdf
|-- aws_request_id: 4392e7a2-d47a-4776-9385-ccb660a2002f
|-- exception: botocore.exceptions.ClientError
|-- reason: An error occurred (InvalidParameterValue) when calling the AttachVolume operation: Already have a device attached to: /dev/sdf
`-- timestamp: 2016-06-06 14:28:33.687193
b38b8eb0-98f7-4162-8f23-51ce840554a7
+-- eliot:traceback@1
|-- exception: flocker.node.agents.ebs.AttachFailed
|-- reason: (u'vol-4171e8c8', u'i-0cbab78c', u'/dev/sdf')
|-- timestamp: 2016-06-06 14:28:33.687890
`-- traceback: Traceback: <class 'flocker.node.agents.ebs.AttachFailed'>: (u'vol-4171e8c8', u'i-0cbab78c', u'/dev/sdf')
/usr/lib64/python2.7/threading.py:811:__bootstrap_inner
/usr/lib64/python2.7/threading.py:764:run
/opt/flocker/lib/python2.7/site-packages/twisted/_threads/_threadworker.py:46:work
/opt/flocker/lib/python2.7/site-packages/twisted/_threads/_team.py:190:doWork
--- <exception caught here> ---
/opt/flocker/lib/python2.7/site-packages/twisted/python/threadpool.py:246:inContext
/opt/flocker/lib/python2.7/site-packages/twisted/python/threadpool.py:262:<lambda>
/opt/flocker/lib/python2.7/site-packages/twisted/python/context.py:118:callWithContext
/opt/flocker/lib/python2.7/site-packages/twisted/python/context.py:81:callWithContext
/opt/flocker/lib/python2.7/site-packages/flocker/node/agents/ebs.py:1296:attach_volume
```
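The bottom frame of that traceback (`ebs.py:1296:attach_volume`) is where the boto `ClientError` gets translated into Flocker's `AttachFailed`. A hedged stand-in for that translation, not Flocker's actual code (the `AttachFailed` class and `ec2_attach` parameter here are my own stand-ins):

```python
class AttachFailed(Exception):
    """Stand-in for flocker.node.agents.ebs.AttachFailed (hypothetical)."""


def attach_volume(ec2_attach, volume_id, instance_id, device):
    """Attempt an EBS attach and surface 'device already taken' as AttachFailed.

    ec2_attach is assumed to behave like boto's EC2 attach_volume call,
    raising an exception whose message carries the AWS error text.
    """
    try:
        return ec2_attach(VolumeId=volume_id, InstanceId=instance_id,
                          Device=device)
    except Exception as error:
        # AWS rejects the call with InvalidParameterValue when the
        # device name is already in use on the instance.
        if "Already have a device attached" in str(error):
            raise AttachFailed((volume_id, instance_id, device))
        raise
```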
This attachment error in the dataset agent causes the Flocker Docker plugin to time out while waiting for the Path.
```
21bd1d9b-00d0-4f89-91bc-1de4e2654c3c
+-- api:request@1/started
|-- method: POST
|-- request_path: /VolumeDriver.Path
`-- timestamp: 2016-06-06 11:32:14.043872
+-- flocker:apiclient:http_request@3,1/started
|-- method: GET
|-- request_body: None
|-- timestamp: 2016-06-06 11:32:14.044293
`-- url: https://mesos-master-1:4523/v1/configuration/datasets
+-- twisted:log@3,3
|-- error: False
|-- message: Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x5e33e18>
`-- timestamp: 2016-06-06 11:32:14.047729
+-- flocker:apiclient:http_request@3,4/failed
|-- exception: twisted.internet.error.TimeoutError
|-- reason: User timeout caused connection failure.
`-- timestamp: 2016-06-06 11:32:44.048563
+-- api:request@4/succeeded
`-- timestamp: 2016-06-06 11:32:44.049087
22c7e3cd-012f-498a-b9d6-138339a6066b
+-- eliot:traceback@1
|-- exception: twisted.internet.error.TimeoutError
|-- reason: User timeout caused connection failure.
|-- timestamp: 2016-06-06 11:32:44.048840
`-- traceback: Traceback (failure with no frames): <class 'twisted.internet.error.TimeoutError'>: User timeout caused connection failure.
```
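The 30-second gap in those timestamps (11:32:14 to 11:32:44) is the user timeout expiring: the control service at mesos-master-1:4523 never answered the plugin's GET. A quick, hedged reachability check you could run from a node, using only the Python standard library (the function name is mine):

```python
import socket


def control_service_reachable(host, port=4523, timeout=5.0):
    """Return True if a TCP connection to the Flocker control service
    succeeds within the timeout, False on refusal or timeout."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False

# Example: control_service_reachable("mesos-master-1") from an agent node.
```

If this returns False from the agent node, the timeout in the log is a plain connectivity problem (firewall, security group, or the control service being down) rather than anything volume-specific.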
So, once the root cause of the error, the pre-existing sdf block device, is fixed, you should be all set.
@wallnerryan Thank you very much for investigating the issue! I cannot see the bug on the ClusterHQ Jira board. Has it been created?
Could you also tell us what the fix will be, and when we can expect it in a Flocker release? There is no /dev/sdf disk on my instances.
@gufranmmu one solution is to manually attach an EBS volume to /dev/sdf so that Flocker does not consider sdf a valid option. The other is to contact AWS and ask why sdf is hidden. Even though there is no sdf device visible on your host, AWS reserves it on some nodes and hides it from its users; see https://github.com/ClusterHQ/flocker/issues/2772. Using a different instance type sometimes resolves this. We may look at fixing this by not allowing Flocker to choose sdf. @wallrj may be able to comment on a suggestion.
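The suggested fix (never letting Flocker pick sdf) amounts to excluding it from the candidate device names. A minimal sketch of that idea; the function name, the candidate range sdf..sdp, and the reserved set are my assumptions, not Flocker's actual implementation:

```python
import string

# Assumption: device names AWS may silently reserve on some instance types.
RESERVED = {"/dev/sdf"}


def next_free_device(attached, reserved=RESERVED):
    """Pick the next EBS device name in /dev/sdf../dev/sdp, skipping
    names already attached on this instance and names in the reserved
    set.  Raises if the namespace is exhausted."""
    for letter in string.ascii_lowercase[5:16]:  # f through p
        candidate = "/dev/sd" + letter
        if candidate not in attached and candidate not in reserved:
            return candidate
    raise RuntimeError("no free EBS device names")
```

With `/dev/sdf` reserved up front, an instance where AWS has a hidden attachment on sdf never hits the `InvalidParameterValue` rejection shown in the log above.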
@wallnerryan the issue FLOC-4244 no longer seems to be available in the Flocker issue tracker?
It is; it is just private now.
When I execute

```
docker run -ti -v test-vol:/data --volume-driver flocker busybox sh
```

I can see from the cloud GUI that a new Flocker volume has been created and attached to the correct instance, but the docker command ends with the following error on the first run:

```
Error response from daemon: Cannot start container 255544d3cf494a33038713d26ef55d5ffc45ddfd38fe8f66e72bcf76a4ae4a23: ConnectingCancelledError: IPv4Address(TCP, 'mesos-master-1', 4523)
```

and on the second try I see this error:

```
docker run -ti -v test-vol:/data --volume-driver flocker busybox sh
Error response from daemon: Cannot start container 795a0aa3734d9d566593a923660d8c1334b7b6be0cdab70aacb6ad4d5a03aa0c: TimeoutError: User timeout caused connection failure.
```
Here are logs related to mounting: