gluster / storhaug

High Availability (HA) setup utility for NFS-Ganesha
GNU General Public License v2.0

[question] Expected downtime during failover #34

Open epol opened 6 years ago

epol commented 6 years ago

Is there any expected downtime during failover?

If a client mounts NFS from a public IP address and the machine hosting that address suddenly becomes unavailable, the IP is moved to another available server. After that, new clients can mount NFS from the same IP, but existing clients (those that had already mounted the export) resume working only after some kind of timeout. In our experience that timeout is about 30 seconds for NFSv3 and about 90 seconds for NFSv4.

Are these values expected, or might there be a problem in our configuration? Is there any way to lower these timeouts?
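For anyone who wants to reproduce numbers like these, here is a minimal Python sketch of how the stall could be measured from a client. It is only an illustration under assumptions: the probe file path is hypothetical, and the write plus `fsync` is used so that each probe has to reach the server rather than being answered from the client cache.

```python
import os
import time


def longest_stall(completion_times):
    """Return the largest gap (in seconds) between consecutive probe completions."""
    gaps = [b - a for a, b in zip(completion_times, completion_times[1:])]
    return max(gaps, default=0.0)


def probe(path, duration_s=300.0, interval_s=1.0):
    """Write to `path` about once per interval and record when each attempt returns.

    `path` is a hypothetical scratch file on the NFS mount under test.
    """
    completions = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        try:
            with open(path, "w") as f:
                f.write(str(time.time()))
                f.flush()
                os.fsync(f.fileno())  # force the data out to the server
        except OSError:
            pass  # a soft mount may return an error instead of blocking
        completions.append(time.monotonic())
        time.sleep(interval_s)
    return completions
```

Run `probe()` on the client, trigger the failover while it is running, then pass the result to `longest_stall()`. The reported gap includes one probe interval, so the actual stall is roughly the gap minus `interval_s`.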

ryno83 commented 6 years ago

For nfs-ganesha, you can adjust this with grace_period. You can find configuration examples here: https://github.com/nfs-ganesha/nfs-ganesha/tree/V2.6-stable/src/config_samples. Look at config.txt for the complete list of settings. For NFSv4 there is also a Graceless setting, but I don't know if it's safe to enable it.

You should also look at the gluster volume setting network.ping-timeout (default value 42 sec). But I think it should not be lowered. Do some research before you change this setting.
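As a rough illustration of what the nfs-ganesha side of this could look like, the Python sketch below renders an `NFSv4 { ... }` block. It rests on assumptions: `Lease_Lifetime` is not mentioned in this thread and is taken here to be the companion NFSv4 setting, the check that the grace period is not shorter than the lease is only a sanity rule chosen for this sketch, and the helper name and numbers are hypothetical, not recommended values. Check config.txt for the authoritative option names and ranges.

```python
def render_nfsv4_block(grace_period_s, lease_lifetime_s):
    """Return an `NFSv4 { ... }` block as text for a ganesha config file.

    Option names are assumptions to verify against config.txt.
    """
    if grace_period_s < lease_lifetime_s:
        # Sanity rule for this sketch: clients need at least one lease
        # period to notice the restart and reclaim their state.
        raise ValueError("grace period should not be shorter than the lease lifetime")
    lines = [
        "NFSv4 {",
        f"    Lease_Lifetime = {lease_lifetime_s};",
        f"    Grace_Period = {grace_period_s};",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative numbers only.
    print(render_nfsv4_block(grace_period_s=45, lease_lifetime_s=30))
```

The output can be pasted into the ganesha configuration and tested on a non-production cluster before anything is changed for real.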

epol commented 6 years ago

So it's an NFS configuration matter, and storhaug (using the /var/lib/nfs synchronization) can't prevent it. That's good to hear, because it means there isn't anything wrong with our setup.

Regarding the Grace_Period option, I see from the config.txt file that it's an NFSv4 parameter. Will it also be applied to NFSv3? (Sorry if this is more of an nfs-ganesha question.)

ryno83 commented 6 years ago

grace_period looks like an NFSv4-specific setting. My first guess is no, it will not be applied to NFSv3. Maybe @kalebskeithley could confirm?

kalebskeithley commented 6 years ago

There is no NFS_GRACE in the NFSv3 protocol.

epol commented 6 years ago

OK, thank you for the information.

MarvinTO commented 5 years ago

Hey @epol, were you able to fix the downtime during failover?