gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0

[bug:1548517] write failed with EINVAL due to O_DIRECT write buffer with unaligned size #946

Status: Closed (gluster-ant closed this issue 3 years ago)

gluster-ant commented 4 years ago

URL: https://bugzilla.redhat.com/1548517 Creator: lav at etersoft.ru Time: 20180223T17:56:54

Created attachment 1399948: test for aligned and unaligned writes with O_DIRECT

Description of problem:

I caught billions of "Invalid argument" errors during writes in a brick log:

[2018-02-23 14:57:37.624075] E [MSGID: 113072] [posix.c:3631:posix_writev] 0-ftp-pub-posix: write failed: offset 131072, [Invalid argument]
[2018-02-23 14:57:37.624260] E [MSGID: 115067] [server-rpc-fops.c:1407:server_writev_cbk] 0-ftp-pub-server: 18548605: WRITEV 2 (cda02ff8-011e-4ecc-9e22-86741aa9fee5), client: multi.office.etersoft.ru-31148-2018/02/22-14:44:24:479443-ftp-pub-client-2-0-0, error-xlator: ftp-pub-posix [Invalid argument]

In strace -y -f -p output on the glusterfsd process it looks like this:

[pid 31198] pwrite64(28</var/local/eterglust/pub/.glusterfs/c1/a6/c1a6f57f-2082-466a-8f25-5430e281da58>, "libgl1-mesa-glx\nlibwine-vanilla\n", 32, 0) = -1 EINVAL (Invalid argument)

The line in xlators/storage/posix/src/posix.c where we get the error carries this comment:

/* not sure whether writev works on O_DIRECT'd fd */
retval = sys_pwrite (fd, buf, vector[idx].iov_len, internal_off);
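For reference, per open(2) on Linux, an O_DIRECT write fails with EINVAL unless the buffer address, the file offset, and the transfer length are all aligned to the logical block size of the underlying filesystem. A hypothetical guard illustrating the rule (not glusterfs code; the 512-byte block size is an assumption):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

#define LOGICAL_BLOCK 512   /* assumed alignment requirement */

/* Returns true when an O_DIRECT pwrite() with these arguments
 * satisfies the alignment rules from open(2). */
static bool odirect_aligned(const void *buf, off_t off, size_t len)
{
    return (uintptr_t)buf % LOGICAL_BLOCK == 0 &&
           off % LOGICAL_BLOCK == 0 &&
           len % LOGICAL_BLOCK == 0;
}

int main(void)
{
    _Alignas(512) static char buf[512];

    /* Mirrors the strace above: aligned buffer, offset 0, length 32. */
    printf("length 32:  %s\n", odirect_aligned(buf, 0, 32) ? "ok" : "EINVAL expected");
    printf("length 512: %s\n", odirect_aligned(buf, 0, 512) ? "ok" : "EINVAL expected");
    return 0;
}
```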

I wrote a small test program (attached) and found that the error occurs with newer kernels (4.4.*) but that there are no problems with the 2.6.32 kernel.

As far as I can see, both the buffer address and the buffer size must be aligned to 512 bytes.

Tested on both 32-bit and 64-bit systems, glusterfs 3.12.5, kernels 2.6.32 and 4.4.105.

Test result:

UNALIGNED address write: FAILED
ALIGNED address write: FAILED
UNALIGNED address with aligned size write: FAILED
ALIGNED address and size write: SUCCESSFUL

OpenVZ container result:

UNALIGNED address write: SUCCESSFUL
ALIGNED address write: SUCCESSFUL
UNALIGNED address with aligned size write: SUCCESSFUL
ALIGNED address and size write: SUCCESSFUL
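The original attachment is not reproduced here, but a minimal sketch of an equivalent test (the file name and the 512-byte alignment are assumptions) could look like this:

```c
#define _GNU_SOURCE         /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define ALIGN 512

/* Attempt a pwrite() at offset 0 and report whether it succeeded. */
static void try_write(int fd, const char *label, const void *buf, size_t len)
{
    ssize_t r = pwrite(fd, buf, len, 0);
    printf("%s: %s\n", label, r == (ssize_t)len ? "SUCCESSFUL" : "FAILED");
}

int main(void)
{
    int fd = open("odirect-test.bin", O_CREAT | O_WRONLY | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open(O_DIRECT)");
        return 1;
    }

    void *aligned = NULL;
    if (posix_memalign(&aligned, ALIGN, 2 * ALIGN) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }
    memset(aligned, 'x', 2 * ALIGN);
    char *unaligned = (char *)aligned + 1;   /* deliberately misaligned address */

    try_write(fd, "UNALIGNED address write", unaligned, 32);
    try_write(fd, "ALIGNED address write", aligned, 32);   /* aligned address, unaligned size */
    try_write(fd, "UNALIGNED address with aligned size write", unaligned, ALIGN);
    try_write(fd, "ALIGNED address and size write", aligned, ALIGN);

    free(aligned);
    close(fd);
    return 0;
}
```

On a kernel that enforces full alignment, only the last case should print SUCCESSFUL, matching the bare-host result above.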

gluster-ant commented 4 years ago

Time: 20181023T14:53:59 srangana at redhat commented: Release 3.12 has reached EOL and this bug was still in the NEW state, so the version is being moved to mainline to triage it and take appropriate action.

gluster-ant commented 4 years ago

Time: 20190614T09:42:27 atumball at redhat commented: We have not noticed the problem with later kernels (Fedora 29/30, etc.). This needs to be tested again.

gluster-ant commented 4 years ago

Time: 20191119T15:04:19 olaf.buitelaar at gmail commented: I'm seeing a similar issue on Gluster 6.6 with CentOS 7 (kernel 3.10.0-1062.4.3.el7.x86_64):

[2019-11-19 14:56:04.017381] E [MSGID: 113072] [posix-inode-fd-ops.c:1886:posix_writev] 0-ovirt-data-posix: write failed: offset 0, [Invalid argument]
[2019-11-19 14:56:04.017462] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-ovirt-data-server: 221969: WRITEV 0 (309c077f-8882-43f7-a95b-ca2c4d27d2b5), client: CTX_ID:b3c80b69-0651-4e87-96d1-ee767cb7e425-GRAPH_ID:10-PID:19184-HOST:lease-16.dc01.adsolutions-PC_NAME:ovirt-data-client-1-RECON_NO:-0, error-xlator: ovirt-data-posix [Invalid argument]
[2019-11-19 14:56:12.430962] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-ovirt-data-server: 219748: WRITEV 0 (921dfa09-b252-4087-9c7c-47eda2a6266d), client: CTX_ID:05f7b92c-8dd6-434b-b835-7254dae1d1bc-GRAPH_ID:4-PID:93937-HOST:lease-23.dc01.adsolutions-PC_NAME:ovirt-data-client-1-RECON_NO:-0, error-xlator: ovirt-data-posix [Invalid argument]
[2019-11-19 14:56:27.345631] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-ovirt-data-server: 203815: WRITEV 4 (981676ff-6dbe-4a4c-8478-6e4f991a04f4), client: CTX_ID:366e668d-91ba-4373-960e-82e56f1ed7af-GRAPH_ID:0-PID:22624-HOST:lease-08.dc01.adsolutions-PC_NAME:ovirt-data-client-1-RECON_NO:-0, error-xlator: ovirt-data-posix [Invalid argument]
[2019-11-19 14:56:45.491788] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-ovirt-data-server: 210249: WRITEV 2 (a27a81c0-de78-40ee-9855-a62b6be01ffe), client: CTX_ID:4472864a-0fec-4e2c-ad3f-b9684b0808f6-GRAPH_ID:0-PID:30323-HOST:lease-21.dc01.adsolutions-PC_NAME:ovirt-data-client-1-RECON_NO:-0, error-xlator: ovirt-data-posix [Invalid argument]

I also notice that CPU usage is very high when this error occurs.

The volume is configured with O_DIRECT:

Volume Name: ovirt-data
Type: Distributed-Replicate
Volume ID: 2775dc10-c197-446e-a73f-275853d38666
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: 10.201.0.5:/data5/gfs/bricks/brick1/ovirt-data
Brick2: 10.201.0.1:/data5/gfs/bricks/brick1/ovirt-data
Brick3: 10.201.0.9:/data0/gfs/bricks/bricka/ovirt-data (arbiter)
Brick4: 10.201.0.7:/data5/gfs/bricks/brick1/ovirt-data
Brick5: 10.201.0.9:/data5/gfs/bricks/brick1/ovirt-data
Brick6: 10.201.0.11:/data0/gfs/bricks/bricka/ovirt-data (arbiter)
Brick7: 10.201.0.6:/data5/gfs/bricks/brick1/ovirt-data
Brick8: 10.201.0.8:/data5/gfs/bricks/brick1/ovirt-data
Brick9: 10.201.0.12:/data0/gfs/bricks/bricka/ovirt-data (arbiter)
Brick10: 10.201.0.12:/data5/gfs/bricks/brick1/ovirt-data
Brick11: 10.201.0.11:/data5/gfs/bricks/brick1/ovirt-data
Brick12: 10.201.0.10:/data0/gfs/bricks/bricka/ovirt-data (arbiter)
Options Reconfigured:
performance.strict-o-direct: on
server.event-threads: 6
performance.cache-size: 384MB
performance.write-behind-window-size: 512MB
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
storage.owner-uid: 36
storage.owner-gid: 36
server.outstanding-rpc-limit: 1024
cluster.choose-local: off
cluster.brick-multiplex: on
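With performance.strict-o-direct: on and network.remote-dio: off, O_DIRECT writes reach the brick's local filesystem directly, so they must respect the logical block size of the device backing each brick. A small sketch for querying that size on Linux (the device path is a placeholder):

```c
#include <fcntl.h>
#include <linux/fs.h>    /* BLKSSZGET */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    /* Placeholder: substitute the block device backing the brick. */
    int fd = open("/dev/sda", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    int lbs = 0;
    if (ioctl(fd, BLKSSZGET, &lbs) < 0) {   /* logical sector size in bytes */
        perror("ioctl(BLKSSZGET)");
        close(fd);
        return 1;
    }
    printf("logical block size: %d bytes\n", lbs);

    close(fd);
    return 0;
}
```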

stale[bot] commented 4 years ago

Thank you for your contributions. We noticed that this issue has had no activity in the last ~6 months, so we are marking it as stale. It will be closed in 2 weeks if no one responds with a comment here.

stale[bot] commented 3 years ago

Closing this issue as there has been no update since my last comment. If this issue is still valid, feel free to reopen it.