cfgamboa closed this issue 2 years ago.
Hi Carlos,
Sorry, I'm a little lost as to what is going wrong.
Could you elaborate a little more on the problem? Perhaps a "walk-through" of the problem (describing the interactions between the client and the door, or between different dCache services) would help.
Cheers, Paul.
Incidentally, there will be a new dCache release 7.2.11 "soon" that will contain a fix for the pool so that the pool will try to use an IPv6 address for HTTP-TPC. This change does not involve setting the java.net.preferIPv6Addresses system property.
So, if you set the java.net.preferIPv6Addresses property with the intention of using IPv6 for HTTP-TPC, then an alternative solution is to upgrade your pools to 7.2.11 (or newer).
That said, we should really understand what is going wrong here, as future versions of dCache (8.1 and newer) will set java.net.preferIPv6Addresses, albeit with the value "system" (rather than "true").
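To make the difference between these property values concrete, here is a simplified Python model of how the JVM orders the addresses of a dual-stack host. It assumes the semantics given in the JDK networking-properties documentation ("false", the default, puts IPv4 first; "true" puts IPv6 first; "system" keeps the order returned by the operating system's resolver). order_addresses is a hypothetical helper, not JDK or dCache code, and 192.0.2.16 is a made-up documentation address.

```python
import ipaddress

def order_addresses(resolved, prefer_ipv6="false"):
    """Hypothetical, simplified model of how the java.net.preferIPv6Addresses
    value orders a dual-stack host's addresses. Not the JDK implementation."""
    addrs = [ipaddress.ip_address(a) for a in resolved]
    if prefer_ipv6 == "system":
        # Keep exactly the order returned by the operating system's resolver.
        return addrs
    # Stable partition: relative order within each family is preserved.
    v4 = [a for a in addrs if a.version == 4]
    v6 = [a for a in addrs if a.version == 6]
    # "true" puts IPv6 first; this model treats anything else like the default "false".
    return v6 + v4 if prefer_ipv6 == "true" else v4 + v6

if __name__ == "__main__":
    # 192.0.2.16 is a hypothetical IPv4 address for the same host.
    resolved = ["2620:0:210:1::ed", "192.0.2.16"]
    for value in ("false", "true", "system"):
        print(value, [str(a) for a in order_addresses(resolved, value)])
```

The point of the model is that "system" and "true" can behave identically or differently depending on what the resolver returns, which is why the 8.1 default may not reproduce the problem seen here with "true".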
HTH, Paul.
Hello Paul,
The java.net.preferIPv6Addresses property was enabled on the DMZ pools. Internal pools are not accessible externally, so traffic to those pools needs to be forced to use the internal network interface.
See the following example of failing transfers.
The door is dcdoor16, the DMZ pool is dcdoor16_1 and the internal pool is dc220_7.
This is the external interface:
[root@dcdoor16 ~]# getent hosts $(uname -n) | awk '{print $1}'
2620:0:210:1::ed
03.02 15:14:38 [pool:dc220_7:transfer] [0000281CB5ABC03044ECB315FF5DE86112E6,10422758865] [Unknown] bnlt0d1:BNLT0D1@osm 0 17 false {Http-1.1:2620:0:210:1:0:0:0:ed:0:dcdoor16_1:dcdoor16oneDomain:/0000281CB5ABC03044ECB315FF5DE86112E6} [pool:dcdoor16_1@dcdoor16oneDomain] {666:"General problem: Unable to find address that faces dcdoor16.usatlas.bnl.gov/2620:0:210:1:0:0:0:ed"}
03.02 15:14:38 [door:RemoteTransferManager@srm-dcsrm03Domain:request] ["usatlas1":6435:31152:unknown] [0000281CB5ABC03044ECB315FF5DE86112E6,0] [/pnfs/usatlas.bnl.gov/BNLT0D1/rucio/data18_13TeV/30/ec/DAOD_SUSY15.20159457._000035.pool.root.1] bnlt0d1:BNLT0D1@osm 133 133 {666:"Failed to select pool: CacheException(rc=666;msg=General problem: Unable to find address that faces dcdoor16.usatlas.bnl.gov/2620:0:210:1:0:0:0:ed)"}
03.02 15:19:38 [pool:dc220_7:transfer] [0000281CB5ABC03044ECB315FF5DE86112E6,10422758865] [Unknown] bnlt0d1:BNLT0D1@osm 0 17 false {Http-1.1:2620:0:210:1:0:0:0:ed:0:dcdoor16_1:dcdoor16oneDomain:/0000281CB5ABC03044ECB315FF5DE86112E6} [pool:dcdoor16_1@dcdoor16oneDomain] {666:"General problem: Unable to find address that faces dcdoor16.usatlas.bnl.gov/2620:0:210:1:0:0:0:ed"}
Yesterday I enabled webdav.net.internal to force the door to use the internal IPv4 network. It appears that this did not work, as the error persisted.
All the best, Carlos
Hello,
A related question is whether the pools support both protocols (IPv4 and IPv6). The internal pools are configured with
dcache.java.options.extra="-Djava.net.preferIPv4Stack=true"
All the best, Carlos
Hi,
Unfortunately, as you discovered, the webdav.net.internal property has no effect here.
This configuration property controls the interface the door uses when it is proxying the transfer. The problem you're facing is that the pool-to-pool transfer (from a non-DMZ pool to a DMZ pool) is not working because the DMZ pool is selecting an IPv6 address, to which the non-DMZ pool cannot connect.
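A rough sketch of why the internal pool logs "Unable to find address that faces ...": assuming the pool looks for a local address of the same IP family as the remote address (a simplification of whatever dCache actually does), an internal pool that has only IPv4 addresses has nothing to offer once the DMZ pool hands out 2620:0:210:1::ed. find_facing_address and the 10.0.0.x addresses are hypothetical.

```python
import ipaddress

def find_facing_address(local_addrs, remote):
    """Hypothetical helper (not dCache code): return a usable local address of
    the same IP family as `remote`, or None if this host has none."""
    target = ipaddress.ip_address(remote)
    for text in local_addrs:
        local = ipaddress.ip_address(text)
        # Loopback and link-local addresses cannot reach another host.
        if local.version == target.version and not (local.is_loopback or local.is_link_local):
            return local
    return None

if __name__ == "__main__":
    # Assumption: the internal pool (running with -Djava.net.preferIPv4Stack=true)
    # only has IPv4 addresses; 10.0.0.7 is a hypothetical internal address.
    internal_pool_addrs = ["127.0.0.1", "10.0.0.7"]
    dmz_pool_addr = "2620:0:210:1::ed"
    if find_facing_address(internal_pool_addrs, dmz_pool_addr) is None:
        print(f"Unable to find address that faces {dmz_pool_addr}")
```

In this model, nothing on the door side (such as webdav.net.internal) can help, because the failure happens when the internal pool tries to reach the address chosen by the DMZ pool.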
My suggestion would be to update your dCache configuration so you no longer set these system properties (java.net.preferIPv6Addresses and java.net.preferIPv4Stack) on the DMZ pools. This should return you to a working dCache instance.
Like this, HTTP-TPC will use an IPv4 address if the remote party supports IPv4. (This will work, but it's not what WLCG would like to see.)
Next, you should upgrade your DMZ nodes to dCache v7.2.11 once this is released. This should be released today, but there's currently a fault with our build infrastructure.
Upgrading to 7.2.11 will result in DMZ pools using IPv6 if the remote party supports IPv6. This is without requiring you to set any system properties.
Note: this doesn't solve the problem you described. You will face this problem when upgrading to dCache v8.1 or newer. However, it buys us some time to fix it.
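The two behaviours described here (without the system properties, HTTP-TPC uses IPv4 whenever the remote party supports it; from 7.2.11, DMZ pools use IPv6 whenever the remote party supports it) can be modelled roughly as follows. pick_remote_address, its prefer_ipv6 flag and the 192.0.2.80 / 2001:db8::80 addresses are hypothetical illustrations, not dCache's implementation.

```python
import ipaddress

def pick_remote_address(remote_addrs, local_families, prefer_ipv6):
    """Hypothetical model (not dCache code) of choosing which of the remote
    party's addresses an HTTP-TPC connection should use.

    remote_addrs:   addresses the remote host resolves to
    local_families: IP versions this pool can use, e.g. {4, 6}
    prefer_ipv6:    False models "IPv4 if the remote supports it",
                    True models the 7.2.11 "IPv6 if the remote supports it"
    """
    addrs = [ipaddress.ip_address(a) for a in remote_addrs]
    for family in ((6, 4) if prefer_ipv6 else (4, 6)):
        if family not in local_families:
            continue  # this pool cannot use that family at all
        for addr in addrs:
            if addr.version == family:
                return addr
    return None  # no address family in common

if __name__ == "__main__":
    dual_stack = ["192.0.2.80", "2001:db8::80"]  # hypothetical remote server
    print(pick_remote_address(dual_stack, {4, 6}, prefer_ipv6=False))
    print(pick_remote_address(dual_stack, {4, 6}, prefer_ipv6=True))
    print(pick_remote_address(["192.0.2.80"], {4, 6}, prefer_ipv6=True))
```

The last call shows the fallback: even when IPv6 is preferred, an IPv4-only remote party is still reached over IPv4.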
Cheers, Paul.
Hello @paulmillar
FYI, last week we rolled back the java.net.preferIPv6Addresses change on the DMZ pools.
All the best, Carlos
Great. I guess that "resolves" the problem, at least for now.
Upgrading the DMZ pool nodes to dCache v7.2.11 (or newer) should result in HTTP-TPC transfers preferring IPv6 if the remote server supports it.
Note that, as is, this problem will reappear when you upgrade to dCache v8.1.0 (or newer). We will need to better understand the problem and fix it more generally.
Hi @cfgamboa ,
One thing that you should keep in mind is that IPv6 always has priority when a connection is established. Thus, an IPv6-enabled interface with only a link-local address (no public IPv6 address) will cause a lot of trouble. To fix that, you need to completely disable the IPv6 stack on the host:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Can you give it a try?
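To check whether a Linux host is in the situation described above (IPv6 enabled, but only loopback and link-local IPv6 addresses), one option is to inspect /proc/net/if_inet6, where each line starts with the address as 32 hex digits and ends with the interface name. The sketch below assumes that file layout; ipv6_addresses and only_link_local_ipv6 are hypothetical helper names.

```python
import ipaddress

def ipv6_addresses(if_inet6_text):
    """Parse the text of Linux /proc/net/if_inet6 into (interface, address) pairs.

    Each line holds: address (32 hex digits), interface index, prefix length,
    scope, flags and interface name, separated by whitespace.
    """
    pairs = []
    for line in if_inet6_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        address = ipaddress.IPv6Address(int(fields[0], 16))
        pairs.append((fields[5], address))
    return pairs

def only_link_local_ipv6(if_inet6_text):
    """True when IPv6 is enabled but no interface has a routable IPv6 address,
    i.e. every non-loopback IPv6 address is link-local (fe80::/10)."""
    non_loopback = [a for _, a in ipv6_addresses(if_inet6_text) if not a.is_loopback]
    return bool(non_loopback) and all(a.is_link_local for a in non_loopback)

if __name__ == "__main__":
    # Only meaningful on a Linux host with IPv6 support.
    with open("/proc/net/if_inet6") as f:
        print(only_link_local_ipv6(f.read()))
```

A result of True suggests the link-local-only case, where disabling IPv6 as shown above is worth trying.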
Thanks, Tigran.
Hello @kofemann
Thank you for your feedback. I am trying to understand: are you proposing to disable IPv6 for all transfers on the host, or only on specific pools/hosts?
All the best, Carlos
Problem understood and rectified.
Hello,
Doors: dcache-7.2.7-1.noarch
Pools: dcache-7.2.3
The DMZ pool is mainly used for DAV/HTTP traffic. It was configured with
dcache.java.options.extra = -Djava.net.preferIPv6Addresses=true
It seems that this change forces requests involving the DMZ pool to use IPv6. Some internal pools do not support IPv6 and are only accessible internally.
It appears that setting webdav.net.internal to the IPv4 interface does not help: there is still traffic from the DMZ pool requesting the transfer via the external IPv6 interface.
Is there any way to configure the DMZ pool to use a specific interface to contact the internal pools, similar to what webdav.net.internal does for the DAV doors?
All the best, Carlos