docker-archive / for-aws


Cloudstor volume plugin does not support NVMe block devices on Nitro-based instances #184

Open kinghuang opened 5 years ago

kinghuang commented 5 years ago

Summary

On current generation (Nitro-based) EC2 instances, EBS volumes are exposed as NVMe block devices. Devices are named /dev/nvme0n1, /dev/nvme1n1, and so on.
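For anyone checking which NVMe device corresponds to which EBS volume, here is a minimal sketch. It assumes that on Nitro-based instances the NVMe serial number exposed in sysfs is the EBS volume ID without its hyphen. The function names and the `SYSFS_ROOT` variable are made up for illustration, and this is not how Cloudstor itself resolves devices.

```shell
#!/bin/sh
# Sketch: print each NVMe block device with the EBS volume ID derived from
# its serial number.
# Assumption: the serial of an EBS-backed NVMe device is the volume ID
# without the hyphen (e.g. vol0123456789abcdef0).

# Hypothetical knob: lets the lookup run against a fake sysfs tree.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/block}"

# Turn "vol0123..." into "vol-0123..."; any other serial is passed through.
serial_to_volume_id() {
    case "$1" in
        vol-*) printf '%s\n' "$1" ;;
        vol*)  printf 'vol-%s\n' "${1#vol}" ;;
        *)     printf '%s\n' "$1" ;;
    esac
}

# Walk the NVMe namespaces under SYSFS_ROOT and print "<device> <volume id>".
list_nvme_volumes() {
    for dev in "$SYSFS_ROOT"/nvme*n*; do
        # Skip entries without a readable serial (also covers an unmatched glob).
        [ -r "$dev/device/serial" ] || continue
        serial=$(tr -d '[:space:]' < "$dev/device/serial")
        printf '/dev/%s %s\n' "$(basename "$dev")" "$(serial_to_volume_id "$serial")"
    done
}
```

Sourcing the snippet and calling `list_nvme_volumes` on an affected node should show whether the volume that Cloudstor attached is visible under an NVMe name, which would be consistent with the attach succeeding and only the mount failing.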

The Cloudstor volume plugin doesn't appear to work correctly with EBS volumes exposed as NVMe block devices.

Expected behaviour

Cloudstor should be able to handle EBS volumes exposed as NVMe block devices.

Actual behaviour

An error occurs mounting Docker volumes backed by EBS volumes exposed as NVMe block devices.

Information

OK hostname=ip-172-31-20-204-us-west-2-compute-internal session=1542824424-wnwHekROgUvnZcZ4yFEjSf6gCYX7tSbq
Done requesting diagnostics.
Your diagnostics session ID is 1542824424-wnwHekROgUvnZcZ4yFEjSf6gCYX7tSbq
Please provide this session ID to the maintainer debugging your issue.

Cloudstor correctly creates and attaches EBS volumes, but then cannot mount them in containers.

Steps to reproduce the behavior

  1. Create a Docker for AWS 18.06.1 cluster using CloudFormation. Use current generation instance types such as m5.large.
  2. Make sure the cloudstor:aws plugin is installed and active.
     docker plugin inspect --format '{{ .Enabled }}' cloudstor:aws
  3. Create a relocatable volume backed by EBS.
     docker volume create \
       --driver cloudstor:aws \
       --opt backing=relocatable \
       --opt size=1 \
       ebs-volume
  4. Run a container, mounting the volume created in the previous step. An error will occur mounting the volume.
     docker container run -it --rm \
       -v ebs-volume:/ebs \
       alpine:3.8 ash

Unable to find image 'alpine:3.8' locally
3.8: Pulling from library/alpine
4fe2ade4980c: Pull complete 
Digest: sha256:621c2f39f8133acb8e64023a94dbdf0d5ca81896102b9e57c0dc184cadaf5528
Status: Downloaded newer image for alpine:3.8
docker: Error response from daemon: error while mounting volume '': Post http://%2Frun%2Fdocker%2Fplugins%2F642ecd8348cf8fd93b18fad4fbd2be66ce9720bedaeac70f26d13181f38a35e3%2Fcloudstor.sock/VolumeDriver.Mount: context deadline exceeded.

FrenchBen commented 5 years ago

/cc @ddebroy

dodgemich commented 5 years ago

This should follow the work done in RexRay - any ideas when this might get implemented? Without this change, it holds up use of the new instance types, which puts this plugin on a dead-end path in terms of usage...

stevehumer commented 5 years ago

@kinghuang appears to have documented this very well, and we have been experiencing the same issue for the past month. We would like to get on the latest Nitro-based instances (C5/M5), as it's not sustainable to stay on 4-series instances much further into 2019.

This hurts us on performance, which we've proven out on a few clusters that do not have storage requirements, and has caused us to delay reserving 5-series instance types for the year ahead for much of our workload. Hopeful to get some traction here.

dodgemich commented 5 years ago

Bump to top...any news on this front?

akumadare commented 5 years ago

Hi, experienced the very same issue when attempting to move a workload onto the new r5 instance family yesterday. Using the previous generation is a workaround for now, but an update on this would help to decide whether to look for alternatives.

kinghuang commented 5 years ago

@joeabbey Any chance we can get a comment from Docker on this? Will Cloudstor be updated to handle NVMe block devices on current generation EC2 instances?

iget-master commented 5 years ago

Just reserved a few M5 instances, but noticed this issue. Any workarounds for this bug?

akomlik commented 5 years ago

We had to downgrade our m5 to m4 and t3 to t2 to make this work ;-(

kinghuang commented 5 years ago

I've moved to REX-Ray EBS, but it doesn't handle copying volumes across zones the way Cloudstor does.

dodgemich commented 5 years ago

Same - have to choose between using older instances (m4/t2) and getting cross-AZ replication with Cloudstor, or using newer instances (m5/t3) and losing cross-AZ replication with RexRay.

Would be good to hear if Docker is planning to support Cloudstor here, otherwise it's on a dead-end path...

iget-master commented 5 years ago

I'll migrate to REX-Ray instead, since we can't downgrade to M4 after just reserving a few M5 instances for 3 years. 👎 Fortunately, the lack of cross-zone volume copying doesn't affect us.

porshkevich commented 5 years ago

Hi, I found a temporary solution: https://github.com/oogali/ebs-automatic-nvme-mapping
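As a rough illustration of the general idea behind this kind of workaround (not code taken from the linked repository), the sketch below recreates the device names requested at attach time as symlinks to the NVMe devices. It assumes nvme-cli is installed and that the EBS controller reports the requested name in the first 32 bytes of the vendor-specific area of the Identify Controller data (byte offset 3072). The function names and the `--apply` flag are hypothetical, and whether Cloudstor then mounts successfully depends on how it looks up devices, which this thread does not establish.

```shell
#!/bin/sh
# Sketch only: expose the device names requested at attach time (for example
# /dev/xvdf) as symlinks to the NVMe devices the kernel actually created.
# Assumptions: nvme-cli is installed, and the requested name sits in the first
# 32 bytes of the vendor-specific Identify Controller area (offset 3072).

# Read raw Identify Controller data on stdin and print the requested name,
# with space padding and NUL bytes stripped.
requested_name() {
    dd bs=1 skip=3072 count=32 2>/dev/null | tr -d ' \000'
}

# Normalise a name such as "xvdf" or "/dev/xvdf" to an absolute /dev path.
link_path() {
    case "$1" in
        /dev/*) printf '%s\n' "$1" ;;
        *)      printf '/dev/%s\n' "$1" ;;
    esac
}

# Create the symlinks. Needs root; existing paths are left untouched.
map_nvme_devices() {
    for dev in /dev/nvme*n1; do
        [ -b "$dev" ] || continue
        name=$(nvme id-ctrl --raw-binary "$dev" | requested_name)
        # Only accept names that look like EBS attachment names.
        case "$name" in
            /dev/sd*|/dev/xvd*|sd*|xvd*) ;;
            *) continue ;;
        esac
        target=$(link_path "$name")
        [ -e "$target" ] || [ -L "$target" ] || ln -s "$dev" "$target"
    done
}

# Only touch /dev when explicitly asked to (hypothetical flag).
if [ "${1:-}" = "--apply" ]; then
    map_nvme_devices
fi
```

Run as root with `--apply`, this would need to execute again whenever a new volume is attached, which is why a udev rule is the usual way to trigger such a script.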

dodwmd commented 4 years ago

Is cloudstor no longer being developed?

dodgemich commented 4 years ago

We gave up, migrated to Rexray and accepted the lack of multi-az support.

Very unfortunate that Docker Inc. didn't at least open-source the plugin if it wasn't going to carry it forward, as it had some nice features.

respectTheCode commented 4 years ago

We gave up and moved to Rexray as well. It has been a much better experience even though it took longer to get up and running.

enbohm commented 4 years ago

@dodwmd @dodgemich @respectTheCode too bad that no one from Docker can assist. I've also tried getting in touch with @joeabbey et al. for help updating the Docker version, but got no reply whatsoever. It doesn't feel very professional on Docker's side, IMO.

scottbuckel commented 4 years ago

@enbohm did you ever get in contact with anyone? We've looked into using Rexray, but it also looks like it's not actively maintained anymore.

We just upgraded from t2 to t3 instances and prepaid for the t3s for the next year, but have now run into this issue and I'm running out of options.

scottbuckel commented 4 years ago

@porshkevich Care to explain how you implemented this workaround? Thanks!
