docker-archive / for-azure


cloudstor share breaks after a week on one node in swarm #58

Closed by djeeg 6 years ago

djeeg commented 6 years ago

Expected behavior

Access to cloudstor volumes on all nodes.

Actual behavior

VolumeDriver.Create: error creating azure file share: storage: service returned error: StatusCode=404, ErrorCode=404 The specified share does not exist., ErrorMessage=no response body was available for error status code
Running modprobe nf_nat failed with message: `modprobe: module nf_nat not found in modules.dep
Running modprobe xt_conntrack failed with message: `modprobe: module xt_conntrack not found in modules.dep
Running modprobe nf_nat failed with message: `modprobe: module nf_nat not found in modules.dep
Failed to deserialize netlink ndmsg: Link not found
Failed to receive from netlink: no buffer space available
could not fetch metadata: cannot read metadata: open /mnt/cloudstor/cloudstor-metadata/GUID: no such file or directory
request accepted\" name=\"core_volsharedlog\" operation=get
error while unmounting volume core_volsharedlog: VolumeDriver.Unmount: unmount failed: exit status 1\noutput=\"umount: can't unmount /mnt/cloudstor/core_volsharedlog: Resource busy
request accepted\" name=\"core_volsharedlog\" operation=mount "
mount cmd:&{/bin/mount [mount -t cifs //.file.core.windows.net/volsharedlog /mnt/cloudstor/core_volsharedlog -o vers=2.1,username=,password=,file_mode=0777,dir_mode=0777,uid=0,gid=0,mfsymlinks] []  <nil> <nil> <nil> [] %!s(*syscall.SysProcAttr=<nil>) %!s(*os.Process=<nil>) <nil> <nil> <nil> %!s(bool=false) [] [] [] [] %!s(chan error=<nil>) %!s(chan struct {}=<nil>)}\" operation=mount
mount output=/dev/sdb1 on / type ext4 (rw,relatime,data=ordered)
/dev/sdb1 on /mnt type ext4 (rw,relatime,data=ordered)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev type tmpfs (rw,nosuid,size=65536k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666)

This also seems to have crashed other cloudstor volumes on MANAGER0:

level=error msg="//.file.core.windows.net/db0d5a45b5d on /mnt/cloudstor/monitor_volawstats_1 type cifs (rw,relatime,vers=2.1,sec=ntlmssp,cache=strict,username=,domain=X,uid=0,forceuid,gid=0,forcegid,addr=52.XXX.XXX.XXX,file_mode=0777,dir_mode=0777,nounix,serverino,mapposix,mfsymlinks,rsize=1048576,wsize=1048576,echo_interval=60,actimeo=1

Information

The volume is deployed to all nodes with this stack:

version: '3.4'
services:
  volumecontrol:
    image: busybox
    command: sleep 2073600
    volumes:
      - volsharedlog:/logs
    deploy:
      mode: global
volumes:
  volsharedlog:
    name: '{{index .Service.Labels "com.docker.stack.namespace"}}_volsharedlog'
    driver: cloudstor:azure

I'm not sure why unmount commands appear for this volume, because this stack/volume was never undeployed:

VolumeDriver.Unmount: unmount failed:

I have been running other containers and deploying other stacks that reference volsharedlog as an "external" volume.

djeeg commented 6 years ago

Could there be file count or size limits?


FrenchBen commented 6 years ago

One of the errors that keeps showing up is this one: Failed to receive from netlink: no buffer space available. The limits on Azure Storage should be quite high: https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits#storage-limits

Could it be that your service labels changed, which caused the volume not to be found?
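To make the label hypothesis concrete: the stack file names the volume with a Go template, `{{index .Service.Labels "com.docker.stack.namespace"}}_volsharedlog`, so the resolved name depends on the service's labels when the template is rendered. The Go sketch below renders that template with the standard `text/template` package. `serviceContext` and `resolveVolumeName` are hypothetical stand-ins that model only `.Service.Labels`, not Docker's actual template context or rendering code.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// serviceContext is a hypothetical stand-in for the data Docker exposes to
// service templates; only .Service.Labels is modelled here.
type serviceContext struct {
	Service struct {
		Labels map[string]string
	}
}

// volumeNameTmpl is the name template copied from the stack file above.
const volumeNameTmpl = `{{index .Service.Labels "com.docker.stack.namespace"}}_volsharedlog`

// resolveVolumeName renders the template for a given label set.
func resolveVolumeName(labels map[string]string) (string, error) {
	tmpl, err := template.New("volume").Parse(volumeNameTmpl)
	if err != nil {
		return "", err
	}
	var ctx serviceContext
	ctx.Service.Labels = labels
	var sb strings.Builder
	if err := tmpl.Execute(&sb, ctx); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, labels := range []map[string]string{
		{"com.docker.stack.namespace": "core"},  // original stack
		{"com.docker.stack.namespace": "core2"}, // stack deployed under another name
		{},                                      // namespace label missing
	} {
		name, err := resolveVolumeName(labels)
		if err != nil {
			panic(err)
		}
		fmt.Println(name)
	}
}
```

This prints `core_volsharedlog`, `core2_volsharedlog` and `_volsharedlog`. Because `index` on a missing map key yields an empty string, a missing label does not fail loudly; it silently produces a different volume name, which the driver would then look up or try to create. That could be consistent with a "specified share does not exist" error, though the thread does not confirm this was the cause.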

djeeg commented 6 years ago

Seems more stable after starting with a fresh 18.03 swarm. Closing.