LINBIT / linstor-gateway

Manages Highly-Available iSCSI targets, NVMe-oF targets, and NFS exports via LINSTOR
GNU General Public License v3.0

Why is the iptables drop 2049 port rule automatically added when using linstor-gateway to export nfs services? #28

Open yanest opened 5 months ago

yanest commented 5 months ago

I found that my NFS export could not be mounted on the client via the virtual IP, even though the shared directory could be discovered with showmount -e. Mounting via the IP of the physical network card worked normally. I checked all the configurations, and when I happened to look at iptables I found that a rule dropping port 2049 had been added automatically. This rule was blocking the mount; after I deleted it, mounting worked immediately. Why is this rule added?

```
root@lab-pve1:~# iptables -vnL
Chain INPUT (policy ACCEPT 172K packets, 45M bytes)
 pkts bytes target prot opt in  out source     destination
    0     0 DROP   6    --  *   *   0.0.0.0/0  192.168.128.30  multiport dports 2049
```

```
root@lab-pve3:~# linstor-gateway nfs list
+----------+-------------------+--------------------+--------------------------+---------------+
| Resource | Service IP        | Service state      | NFS export               | LINSTOR state |
+----------+-------------------+--------------------+--------------------------+---------------+
| nfs      | 192.168.128.30/32 | Started (lab-pve1) | /srv/gateway-exports/nfs | OK            |
+----------+-------------------+--------------------+--------------------------+---------------+
```
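Until the root cause is fixed, the stale rule can be removed by hand. This is a sketch assuming the rule parameters match the -vnL listing above; note that portblock will re-add the rule on the next failover, so this is a stopgap, not a fix:

```shell
# Delete the leftover DROP rule that blocks NFS traffic to the service IP.
# Run on the node currently holding the resource (lab-pve1 in this case).
iptables -D INPUT -p tcp -d 192.168.128.30 -m multiport --dports 2049 -j DROP
```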

chrboe commented 5 months ago

Just guessing, but are you using iptables version 1.8.9 by any chance? There is a bug in that iptables version that formats -L output incorrectly, which breaks the portblock resource agent so that it cannot remove the block rule.

See https://github.com/ClusterLabs/resource-agents/pull/1924 for more details.

Solution: upgrade resource-agents to a version containing that commit, or use an older version of iptables. (The bug has also been fixed in upstream iptables, but they have not yet published a release that includes the fix.)

Edit: just saw your example output above, which confirms you are affected by the iptables bug. (See the 6 in the prot column, which should normally be tcp instead)
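The failure mode can be illustrated without touching a live firewall. portblock locates its own rule by matching the `iptables -n -L` listing; with 1.8.9 the protocol column prints as the numeric `6` instead of `tcp`, so the match fails and the DROP rule is never removed. A simplified sketch (not portblock's exact matching logic):

```shell
# Sample -nL output lines: the buggy 1.8.9 format vs. the expected format.
buggy='DROP       6    --  0.0.0.0/0    192.168.128.30   multiport dports 2049'
fixed='DROP       tcp  --  0.0.0.0/0    192.168.128.30   multiport dports 2049'

# A match on the protocol name fails against the 1.8.9 output...
echo "$buggy" | grep -q 'tcp' && echo "matched" || echo "no match: rule left behind"
# ...but succeeds against the correctly formatted output.
echo "$fixed" | grep -q 'tcp' && echo "matched: rule removed" || echo "no match"
```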

yanest commented 5 months ago

> Just guessing, but are you using iptables version 1.8.9 by any chance? There is a bug in that iptables version that formats -L output incorrectly, which breaks the portblock resource agent so that it cannot remove the block rule.
>
> See ClusterLabs/resource-agents#1924 for more details.
>
> Solution: upgrade resource-agents to a version containing that commit, or use an older version of iptables. (The bug was fixed in upstream iptables, but they did not yet publish a new release with the fix included)
>
> Edit: just saw your example output above, which confirms you are affected by the iptables bug. (See the 6 in the prot column, which should normally be tcp instead)

I do use iptables 1.8.9, because that is the default version in PVE; it seems to be the only version in the apt repository. resource-agents is also at the latest packaged version (1:4.12.0-2). I don't know how to fix this bug.

```
root@lab-pve1:~# apt list iptables -a
Listing... Done
iptables/stable,now 1.8.9-2 amd64 [installed]

root@lab-pve1:~# apt list resource-agents -a
Listing... Done
resource-agents/stable,now 1:4.12.0-2 amd64 [installed]
```

chrboe commented 5 months ago

Yes, unfortunately Debian stable (and therefore PVE) is affected by this.

We do have a bug report with Debian in progress to hopefully get this patched, but there has not been much activity yet.

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1067733

Unfortunately there's not much else we can do; it's in the distro's hands. Maybe it helps if you also complain to Debian 🙂

A quick fix would of course be to just overwrite the portblock resource agent script in /usr/lib/ocf/... with the patched version from GitHub.

synq commented 4 months ago

So do this on all your nodes:

```
~# git clone https://github.com/ClusterLabs/resource-agents.git
Cloning into 'resource-agents'...
remote: Enumerating objects: 63268, done.
remote: Counting objects: 100% (345/345), done.
remote: Compressing objects: 100% (183/183), done.
remote: Total 63268 (delta 190), reused 301 (delta 162), pack-reused 62923
Receiving objects: 100% (63268/63268), 32.73 MiB | 17.55 MiB/s, done.
Resolving deltas: 100% (41918/41918), done.
~# cp resource-agents/heartbeat/portblock /usr/lib/ocf/resource.d/heartbeat/portblock
```

Just to be sure everything was reloaded, I rebooted all nodes one by one (that is probably not needed).
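If cloning the whole repository is overkill, fetching just the patched agent should work too. This is a sketch assuming the fix is on the main branch of ClusterLabs/resource-agents and the raw GitHub URL below is reachable from the node:

```shell
# Back up the packaged agent, then overwrite it with the patched version.
cp /usr/lib/ocf/resource.d/heartbeat/portblock \
   /usr/lib/ocf/resource.d/heartbeat/portblock.bak
curl -fsSL https://raw.githubusercontent.com/ClusterLabs/resource-agents/main/heartbeat/portblock \
   -o /usr/lib/ocf/resource.d/heartbeat/portblock
chmod +x /usr/lib/ocf/resource.d/heartbeat/portblock
```

Keep in mind that a later package upgrade of resource-agents will overwrite this file again.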

RumenBlack84 commented 3 months ago

> So do this on all your nodes:
>
> ```
> ~# git clone https://github.com/ClusterLabs/resource-agents.git
> Cloning into 'resource-agents'...
> remote: Enumerating objects: 63268, done.
> remote: Counting objects: 100% (345/345), done.
> remote: Compressing objects: 100% (183/183), done.
> remote: Total 63268 (delta 190), reused 301 (delta 162), pack-reused 62923
> Receiving objects: 100% (63268/63268), 32.73 MiB | 17.55 MiB/s, done.
> Resolving deltas: 100% (41918/41918), done.
> ~# cp resource-agents/heartbeat/portblock /usr/lib/ocf/resource.d/heartbeat/portblock
> ```
>
> Just to be sure everything was reloaded I rebooted all nodes one by one (that is probably not needed).

Thanks this workaround fixed my problem!