danilop / yas3fs

YAS3FS (Yet Another S3-backed File System) is a Filesystem in Userspace (FUSE) interface to Amazon S3. It was inspired by s3fs but rewritten from scratch to implement a distributed cache synchronized by Amazon SNS notifications. A web console is provided to easily monitor the nodes of a cluster.
http://danilop.github.io/yas3fs
MIT License
643 stars 98 forks source link

EFAULT bad address in system call argument #174

Open dgagnon opened 6 years ago

dgagnon commented 6 years ago

We are having some issue with long running yas3fs mounts. Our node app sometime cannot read a file and the error is :

Error: EFAULT: bad address in system call argument, open 'filename'

It also happened today with grep:

grep: /var/app/current/resources/user/00/26/00265f862009e0b380630fe697dfb443.1516131586.jpg: Bad address

Here is the log entries around for when I ran the recursive grep:

/var/log/yas3fs-user.log-Dummy-29 2018-06-01T02:39:33.958 DEBUG getattr -> '/00/26' 'None' /var/log/yas3fs-user.log-Dummy-20 2018-06-01T02:39:33.958 DEBUG release '/00/25/0025445e1b9017eabd162a121a6c9ed8.1519741804.jpg' '0' 'None' /var/log/yas3fs-user.log-Dummy-29 2018-06-01T02:39:33.958 DEBUG get_metadata -> '/00/26' 'attr' 'None' /var/log/yas3fs-user.log-Dummy-29 2018-06-01T02:39:33.959 DEBUG get_metadata <- '/00/26' 'attr' 'None' '{'st_ctime': 1527619399.0, 'st_mtime': 1527619399.0, 'st_nlink': 1, 'st_mode': 16877, 'st_size': 4096, 'st_gid': 0, 'st_uid': 0, 'st_atime': 1527619399.0}' /var/log/yas3fs-user.log-Dummy-29 2018-06-01T02:39:33.959 DEBUG getattr <- '/00/26' 'None' '{'st_ctime': 1527619399.0, 'st_mtime': 1527619399.0, 'st_nlink': 1, 'st_mode': 16877, 'st_size': 4096, 'st_gid': 0, 'st_uid': 0, 'st_atime': 1527619399.0}' /var/log/yas3fs-user.log-Dummy-22 2018-06-01T02:39:33.959 DEBUG readdir '/00/26' '0' /var/log/yas3fs-user.log-Dummy-22 2018-06-01T02:39:33.959 DEBUG readdir '/00/26' '0' no cache /var/log/yas3fs-user.log-Dummy-22 2018-06-01T02:39:33.959 DEBUG readdir '/00/26' '0' S3 list '00/26/' /var/log/yas3fs-user.log:Dummy-22 2018-06-01T02:39:33.973 DEBUG readdir '/00/26' '0' S3 list key '00/26/00265f862009e0b380630fe697dfb443.1516131586.jpg' /var/log/yas3fs-user.log:Dummy-22 2018-06-01T02:39:33.974 DEBUG readdir '/00/26' '0' '[u'.', u'..', u'00265f862009e0b380630fe697dfb443.1516131586.jpg']' /var/log/yas3fs-user.log:Dummy-11 2018-06-01T02:39:33.974 DEBUG getattr -> '/00/26/00265f862009e0b380630fe697dfb443.1516131586.jpg' 'None' /var/log/yas3fs-user.log:Dummy-11 2018-06-01T02:39:33.974 DEBUG get_metadata -> '/00/26/00265f862009e0b380630fe697dfb443.1516131586.jpg' 'attr' 'None' /var/log/yas3fs-user.log:Dummy-11 2018-06-01T02:39:33.974 DEBUG get_metadata <- '/00/26/00265f862009e0b380630fe697dfb443.1516131586.jpg' 'attr' 'None' '{'st_ctime': 1519509762.0, 'st_mtime': 1519509762.0, 'st_nlink': 1, 'st_mode': 33261, 'st_size': 22564, 'st_gid': 0, 'st_uid': 0, 'st_atime': 1519509762.0}' /var/log/yas3fs-user.log:Dummy-11 2018-06-01T02:39:33.974 DEBUG getattr <- '/00/26/00265f862009e0b380630fe697dfb443.1516131586.jpg' 'None' '{'st_ctime': 1519509762.0, 'st_mtime': 1519509762.0, 'st_nlink': 1, 'st_mode': 33261, 'st_size': 22564, 'st_gid': 0, 'st_uid': 0, 'st_atime': 1519509762.0}' /var/log/yas3fs-user.log:Dummy-12 2018-06-01T02:39:33.974 DEBUG open '/00/26/00265f862009e0b380630fe697dfb443.1516131586.jpg' '32768' /var/log/yas3fs-user.log:Dummy-12 2018-06-01T02:39:33.975 DEBUG check_data '/00/26/00265f862009e0b380630fe697dfb443.1516131586.jpg' /var/log/yas3fs-user.log:Dummy-12 2018-06-01T02:39:33.975 DEBUG check_data '/00/26/00265f862009e0b380630fe697dfb443.1516131586.jpg' data already in place /var/log/yas3fs-user.log-Dummy-15 2018-06-01T02:39:33.975 DEBUG getattr -> '/00/27' 'None' /var/log/yas3fs-user.log-Dummy-15 2018-06-01T02:39:33.975 DEBUG get_metadata -> '/00/27' 'attr' 'None' /var/log/yas3fs-user.log-Dummy-15 2018-06-01T02:39:33.975 DEBUG get_key /00/27 /var/log/yas3fs-user.log-Dummy-15 2018-06-01T02:39:33.975 DEBUG get_key from S3 #1 '/00/27' /var/log/yas3fs-user.log-Dummy-15 2018-06-01T02:39:33.981 DEBUG get_key from S3 #2 '/00/27' '/00/27/' /var/log/yas3fs-user.log-Dummy-15 2018-06-01T02:39:33.989 DEBUG get_key no '/00/27' /var/log/yas3fs-user.log-Dummy-15 2018-06-01T02:39:33.989 DEBUG folder_has_contents '/00/27' 1 /var/log/yas3fs-user.log-Dummy-15 2018-06-01T02:39:33.989 DEBUG has_element '<boto.s3.bucketlistresultset.BucketListResultSet object at 0x7fc1e7b67d50>' 1 /var/log/yas3fs-user.log-Dummy-15 2018-06-01T02:39:34.001 DEBUG has_element '<boto.s3.bucketlistresultset.BucketListResultSet object at 0x7fc1e7b67d50>' -> '00/27/0027ae6a72c14a88e490b642c73e53df.1459989254.jpg' /var/log/yas3fs-user.log-Dummy-15 2018-06-01T02:39:34.001 DEBUG has_element '<boto.s3.bucketlistresultset.BucketListResultSet object at 0x7fc1e7b67d50>' OK

This happens with mostly default options:

/usr/local/bin/yas3fs s3://resources-user-${ENV}/ $LIVE_DIR/resources/user --topic $(/opt/elasticbeanstalk/bin/get-config environment -k sns-resources-user) --queue resources-user-${ENV} --region ca-central-1 --aws-managed-encryption --nonempty -d -l /var/log/yas3fs-user.log --log-mb-size 1024

As well as heavily tuned:

/usr/local/bin/yas3fs s3://resources-user-${ENV}/ $LIVE_DIR/resources/user --topic $(/opt/elasticbeanstalk/bin/get-config environment -k sns-resources-user) --queue resources-user-${ENV} --region ca-central-1 --aws-managed-encryption --prefetch --nonempty --cache-disk-size 20480 --recheck-s3 --prefetch-num 4 --s3-num 64 --download-num 8 --mp-size 5 --mp-num 8

The problem seems to happen more often on files that are often being read and happens more often over time. umounting/remounting fixes it.

Linux ip 4.9.85-38.58.amzn1.x86_64 #1 SMP Wed Mar 14 01:17:26 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux Python 2.7 yas3fs 2.3.5 fuse-2.9.4-1.17.amzn1.x86_64 fuse-libs-2.9.4-1.17.amzn1.x86_64 python27-boto-2.48.0-1.2.amzn1.noarch python27-botocore-1.8.13-1.66.amzn1.noarch

Any input would be appreciated.

jazzl0ver commented 6 years ago

I have the same issue. Please, help! CentOS 6 running in AWS Linux use1-qa-2 2.6.32-696.23.1.el6.x86_64 #1 SMP Tue Mar 13 22:44:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux Python 2.7.13 yas3fs 2.4.2 (https://github.com/jazzl0ver/yas3fs; pull requests are awaiting for your confirmation) Fuse 2.9.3

/opt/rh/python27/root/usr/bin/python /opt/rh/python27/root/usr/bin/yas3fs --download-retries-num 10 --recheck-s3 -d --log /tmp/.yas3fs-s3volqa --log-mb-size 10 --log-backup-count 10 --log-backup-gzip s3://s3volqa /mnt/s3vol --topic arn:aws:sns:us-east-1:0000000:s3volqa --new-queue-with-hostname
jazzl0ver commented 6 years ago

Here is the strace output:

16237 open("/tmp/yas3fs-s3vol-GDz3nW/files/etc-sudoers.d/secure_path", O_RDWR) = -1 ENOENT (No such file or directory)
16237 write(2, "Traceback (most recent call last):\n", 35) = 35
16237 write(2, "  File \"build/bdist.linux-x86_64/egg/yas3fs/fuse.py\", line 425, in _wrapper\n", 76) = 76
16237 write(2, "    return func(*args, **kwargs) or 0\n", 38) = 38
16237 write(2, "  File \"build/bdist.linux-x86_64/egg/yas3fs/fuse.py\", line 496, in open\n", 72) = 72
16237 write(2, "    fi.flags)\n", 14)   = 14
16237 write(2, "  File \"build/bdist.linux-x86_64/egg/yas3fs/fuse.py\", line 898, in __call__\n", 76) = 76
16237 write(2, "    ret = getattr(self, op)(path, *args)\n", 41) = 41
16237 write(2, "  File \"build/bdist.linux-x86_64/egg/yas3fs/__init__.py\", line 2547, in open\n", 77) = 77
16237 write(2, "    self.cache.get(path, 'data').open()\n", 40) = 40
16237 write(2, "  File \"build/bdist.linux-x86_64/egg/yas3fs/__init__.py\", line 259, in open\n", 76) = 76
16237 write(2, "    self.content = open(filename, mode='rb+')\n", 46) = 46
16237 write(2, "IOError: [Errno 2] No such file or directory: u'/tmp/yas3fs-s3vol-GDz3nW/files/etc-sudoers.d/secure_path'\n", 118) = 118

This is the part of the code which fails (__init__.py):

    def open(self):
        with self.get_lock():
            if not self.has('open'):
                if self.store == 'disk':
                    filename = self.cache.get_cache_filename(self.path)
                    self.content = open(filename, mode='rb+')
            self.inc('open')

Any ideas how to fix that?

jazzl0ver commented 6 years ago

@dgagnon I've made a quick workaround. please, test it if you can

dgagnon commented 6 years ago

@jazzl0ver I am not convinced we have the exact same issue. The error code is different. I get EFAULT (bad address) and not ENOENT (No such file or directory). I really appreciate you looking into this though :p

I am reasonably certain that the issue has to do with fuse-libs more than yas3fs, but I have been unable to pin point the root cause yet. Since I am unable to replicate the issue reliably, I have been unable to get an strace yet, but I will keep on trying.