gluster / glusterfs

Gluster Filesystem : Build your distributed storage in minutes
https://www.gluster.org
GNU General Public License v2.0
4.51k stars 1.07k forks source link

prevent gnfs IO Errors on smaller files #4322

Closed erikja closed 1 month ago

erikja commented 1 month ago

In certain situations, smaller files will report I/O errors when accessed from NFS using Gluster NFS. With our settings, files up to 170M could report this in some cases. It was not a consistent failure.

Disbling the NFS performance I/O cache seemed to work around the instances of the problem observed for non-sharded volumes.

Research showed that gluster NFS is relying on an errno return value of EINVAL to detect EOF and set is_eof. However, in some paths this value was not retained or was reset to zero.

This change passes the errno along so it can be used by gluster NFS. We found the issue in the shard xlator and the io-cache xlator.

gluster-ant commented 1 month ago

Can one of the admins verify this patch?

gluster-ant commented 1 month ago

Can one of the admins verify this patch?

gluster-ant commented 1 month ago

Can one of the admins verify this patch?

erikja commented 1 month ago

I apologize for making a new PR it shows my inexperience with updating a PR with new changes on github.com. But it seemed easier to just use this new one than try to fix what I did. The old one was PR 4319, which I closed.

I have integrated feedback from 4319 into this version.

I will send to the community and attach here my test case.

erikja commented 1 month ago

notes.txt

I attach here my notes on how to duplicate the problem, isolated from a production environment.... and using fewer configuration tweaks.

aravindavk commented 1 month ago

/run regression

gluster-ant commented 1 month ago

0 test(s) failed

1 test(s) generated core ./tests/bugs/shard/bug-1605056.t

1 test(s) needed retry ./tests/bugs/shard/bug-1605056.t https://build.gluster.org/job/gh_centos7-regression/3384/

erikja commented 1 month ago

I just wanted to clarify is the test fault likely meaning I missed something (I am certainly not an expert in this stuff!)? or is this a test that sometimes fails anyway?

mohit84 commented 1 month ago

ssed something (I am certainly not an expert in this stuff!)? or is this a test that sometimes fails anyway? I don;t think test is crashing because of your patch.

mohit84 commented 1 month ago

/run regression

gluster-ant commented 1 month ago

1 test(s) failed ./tests/bugs/quota/bug-1035576.t

0 test(s) generated core

2 test(s) needed retry ./tests/00-geo-rep/georep-basic-dr-tarssh.t ./tests/bugs/quota/bug-1035576.t https://build.gluster.org/job/gh_centos7-regression/3385/

erikja commented 1 month ago

I agree I need to check on why I included xlator.h. It was a good question; I think a first attempt at a fix required that but it likely isn't required any longer with what I came up with. So let me get rid of that and re-do the test case. Sorry about that. I will report back either way.

erikja commented 1 month ago

Things seem good without that xlator.h. I am going to try to update this PR with an updated patch. May take me a bit; it's a work flow I'm not familiar with.

aravindavk commented 1 month ago

Things seem good without that xlator.h. I am going to try to update this PR with an updated patch. May take me a bit; it's a work flow I'm not familiar with.

It is very easy.

erikja commented 1 month ago

Hello friends. Is there anything else I should do here? I have integrated the feedback so far I think and I thank you for that.

mohit84 commented 1 month ago

/run regression

mohit84 commented 1 month ago

@erikja Can you please backport the same patch into the release-10/11 also?

erikja commented 1 month ago

@erikja Can you please backport the same patch into the release-10/11 also?

I am very ignorant in this stuff. Would it be possible to just share which branches I should push to and then I'll push it? If there is a conflict I'll work the issue in the given branch.

And sure I'm happy to.

erikja commented 1 month ago

@erikja Can you please backport the same patch into the release-10/11 also?

I am very ignorant in this stuff. Would it be possible to just share which branches I should push to and then I'll push it? If there is a conflict I'll work the issue in the given branch.

And sure I'm happy to.

I see those as actual branch names. Let me check it out, cherry pick, and give it a shot.

erikja commented 1 month ago

I believe that I have made the requested PR for the release branches and they appear above.