dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

Xootd IPV6 transfer fails with (General problem: Unable to find address that faces lxplus730.cern.ch/2001:1458:d00:12:0:0:0:270 [666]) (source) #6605

Closed cfgamboa closed 2 years ago

cfgamboa commented 2 years ago

Hello,

IPV6 transfers to the BNL DUNE instance are failing when using the Xrootd protocol. Other protocols like DAV work fine with both IPV6 and IPV4. The error reported here is different from the one reported in https://github.com/dCache/dcache/issues/6602

The following is the transfer test:

[cgamboa@lxplus730 ~]$ xrdcp -f -DSNetworkStack IPv6 root://dcdndoor01.sdcc.bnl.gov:1096/pnfs/sdcc.bnl.gov/data/dune/1gbtestfile.20210413_6b /tmp/IPV6DUNE
[0B/0B][100%][==================================================][0B/s]  
Run: [ERROR] Server responded with an error: [3012] Failed to open file (General problem: Unable to find address that faces lxplus730.cern.ch/2001:1458:d00:12:0:0:0:270 [666]) (source)

The XRoot door log:

22 Apr 2022 17:46:28 [pool-7-thread-8] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXdRShqgkA] Redirecting to dcdn001.sdcc.bnl.gov, dcdn001.sdcc.bnl.gov/10.38.5.10:33207
22 Apr 2022 17:46:28 [pool-7-thread-8] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXdRShqgkA] Sending the following host information to the client: dcdn001.sdcc.bnl.gov
22 Apr 2022 17:46:58 [xrootd-net-4] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXdRSp6ZuA] Time since last server cert refresh 34577
22 Apr 2022 17:46:59 [xrootd-net-4] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXdRSp6ZuA] Client protocol version was 10600, using GSIPost49ServerRequestHandler.
22 Apr 2022 17:46:59 [xrootd-net-4] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXdRSp6ZuA] Time since last server cert refresh 35659
22 Apr 2022 17:47:00 [xrootd-net-4] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXdRSp6ZuA] Using padded DH secret generation.
22 Apr 2022 17:47:00 [pool-7-thread-8] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXdRSp6ZuA] Opening pnfs/sdcc.bnl.gov/data/dune/1gbtestfile.20210413_6b for read
22 Apr 2022 17:47:00 [pool-7-thread-8] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXdRSp6ZuA] OPAQUE : {}
22 Apr 2022 17:47:00 [Xrootd-dcdndoor01-0] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXdRSp6ZuA dcdn001_1 DoorTransferFinished 0000DB68763F43B54FA2B6AAEE9ED2287B53] Transfer 0000DB68763F43B54FA2B6AAEE9ED2287B53@PoolName=dcdn001_1 PoolAddress=dcdn001_1@dcdn001oneDomain failed: General problem: Unable to find address that faces lxplus730.cern.ch/2001:1458:d00:12:0:0:0:270 (error code=666)
22 Apr 2022 17:47:00 [pool-7-thread-8] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXdRSp6ZuA] Xrootd-Error-Response: ErrorNr=3012 ErrorMsg=Failed to open file (General problem: Unable to find address that faces lxplus730.cern.ch/2001:1458:d00:12:0:0:0:270 [666])

I have attached the entire log for the xrootd transfer run with -d 3.

The DUNE door resolves as:


[cgamboa@lxplus730 ~]$ nslookup dcdndoor01.sdcc.bnl.gov
Server:     127.0.0.1
Address:    127.0.0.1#53

Non-authoritative answer:
Name:   dcdndoor01.sdcc.bnl.gov
Address: 192.33.128.30
Name:   dcdndoor01.sdcc.bnl.gov
Address: 2620:0:210:2::1e

dCache on the doors is 7.2.15; other components are 7.2.1.

Currently the pools are local to BNL, so the transfer is expected to fail because the pools are not externally accessible. However, when the transfer is requested over IPV4, the error is different from the one reported above.

Here the error is related to the redirection.

[cgamboa@lxplus730 ~]$ xrdcp -f -DSNetworkStack IPv4 root://dcdndoor01.sdcc.bnl.gov:1096/pnfs/sdcc.bnl.gov/data/dune/1gbtestfile.20210413_6b /tmp/IPV4DUNE
[0B/0B][100%][==================================================][0B/s]  
Run: [FATAL] Redirect limit has been reached:  (source)

The door log shows that there is a redirection to the pool (which is not externally accessible):

22 Apr 2022 18:04:07 [pool-7-thread-14] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXdRWeDdoA] Redirecting to dcdn001.sdcc.bnl.gov, dcdn001.sdcc.bnl.gov/10.38.5.10:33217

So the question is why the IPV6 attempt does not behave the same way as the IPV4 one. I was expecting both IP protocols to fail with the same error: Redirect limit has been reached: (source)

Thank you,

Carlos

alrossi commented 2 years ago

@cfgamboa @DmitryLitvintsev

Carlos,

I do believe this is related to both #6568 and #6602.

I suggest we tackle one issue at a time. I would suggest starting with getting #6568 to work as we suggested.

If you need us to zoom again with you, please let us know.

Thanks, Al

cfgamboa commented 2 years ago

Hello Al,

I think a zoom meeting would be good to touch base again. Tomorrow I will be able to meet in the afternoon, 2-4pm.

Yes, I think it is a good idea to tackle one at a time. Please note that I created the records on GitHub as I ran into these issues, so that we can trace them.

All the best, Carlos


alrossi commented 2 years ago

(I'm not sure this reply got through, so I'm posting it again here.)

Hello Carlos,

I would like Dmitry to be present as well, since he has contributed to these discussions and may have some further insight from production here at FNAL.

Today we have an operations meeting at 13:00 CDT (14:00 EDT). It might be feasible to meet after that (14:00 CDT / 15:00 EDT).

I don't recall what the zoom connection was ... did you provide it, or did Dmitry?

Cheers,

cfgamboa commented 2 years ago

Hello Al,

3PM (EDT) is fine with me. I will send you all the zoom link via RT ticket www.dcache.org #10308 once Dmitry confirms.

All the best, Carlos


DmitryLitvintsev commented 2 years ago

Hi Carlos,

Yeah, I confirm. Go ahead and send the link. But just to make sure: we will be discussing your pool manager DMZ issue, not the issue on this ticket. My goal is that BNL has a poolmanager setup that allows a hop from internal to DMZ pools for transfers, because this seems to be the proper solution for DMZ/internal separation.

Dmitry



alrossi commented 2 years ago

OK, so in the case of IPv6, the pool, which is trying to send its address to the door, cannot find its own local address that is accessible by the client. It is the same problem but in reverse. The IPv6 issue simply masks the redirect issue because it happens when the mover is started (before the reply to the client to redirect to the pool).

I'm wondering whether defining link local is what you need on the pools (this is a wild guess).
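The failure described above can be pictured with a small Java sketch. It is only an illustration under stated assumptions, not dCache's actual selection code: the class `FacingAddress` and its methods are hypothetical, and "facing the client" is reduced here to "same address family as the client". The addresses are the ones that appear in the logs in this thread.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical sketch; dCache's real address selection is more involved. */
final class FacingAddress {

    /** Parses an IP literal. No DNS lookup is performed for literals. */
    static InetAddress literal(String ip) {
        try {
            return InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + ip, e);
        }
    }

    /** Returns the first local address of the same family as the client. */
    static Optional<InetAddress> facing(InetAddress client, InetAddress... local) {
        return Arrays.stream(local)
                .filter(a -> a.getClass() == client.getClass())
                .findFirst();
    }

    /** Returns the chosen local address, or an error text modelled on the log. */
    static String describe(String clientIp, String... localIps) {
        InetAddress[] local = Arrays.stream(localIps)
                .map(FacingAddress::literal)
                .toArray(InetAddress[]::new);
        return facing(literal(clientIp), local)
                .map(InetAddress::getHostAddress)
                .orElse("Unable to find address that faces " + clientIp);
    }

    public static void main(String[] args) {
        String client = "2001:1458:d00:12:0:0:0:270";
        // Pool with an IPv4 interface only: nothing faces the IPv6 client.
        System.out.println(describe(client, "10.38.5.10"));
        // Pool with an additional IPv6 interface: a facing address exists.
        System.out.println(describe(client, "10.38.5.10", "2620:0:210:880a:0:0:0:10"));
    }
}
```

Under this simplification, an IPv4-only pool can never answer an IPv6 client, which would match the observation that the error appears before any redirect is sent.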

DmitryLitvintsev commented 2 years ago

I suggested adding IP/host resolution to "/etc/hosts" on the pool nodes, like we do at Fermilab. Has this been tried?



cfgamboa commented 2 years ago

Yes


DmitryLitvintsev commented 2 years ago

I think, though, that this needs to be tried after files can be delivered from DMZ pools. Regarding "Unable to find address that faces": in this scenario the replica of the file you are trying to access is on an "internal" pool. IMO this is compounding the issue.

So first access to the DMZ needs to be resolved. Then IPv6 can be looked at.



cfgamboa commented 2 years ago

@DmitryLitvintsev @alrossi

Besides the issue of not being able to contact the internal pool, that (internal) pool currently does not support IPV6, so it is likely that this error:

Run: [ERROR] Server responded with an error: [3012] Failed to open file (General problem: Unable to find address that faces lxplus730.cern.ch/2001:1458:d00:12:0:0:0:270 [666]) (source)

is due to this. By contrast, I enabled on this DUNE instance the DMZ READ/WRITE to internal pool configuration explored for QOS in dCache RT (www.dcache.org #10308), and I was able to READ a file from the DUNE instance:

[cgamboa@lxplus721 ~]$ xrdcp -f root://dcdndoor01.sdcc.bnl.gov:1096/pnfs/sdcc.bnl.gov/data/dune/1gbtestfile.20210413_6b /dev/null
[976.6MB/976.6MB][100%][==================================================][25.04MB/s]  

For IPV6

[cgamboa@lxplus721 ~]$ xrdcp -f -DSNetworkStack IPv6 root://dcdndoor01.sdcc.bnl.gov:1096/pnfs/sdcc.bnl.gov/data/dune/1gbtestfile.20210413_6b /dev/null
[976.6MB/976.6MB][100%][==================================================][27.9MB/s]  

This works: here the DMZ pool handles the read transfer, and the DMZ pool has both IPV6 and IPV4 capability. So the error reported here is no longer observed.

All the best, Carlos

cfgamboa commented 2 years ago

@DmitryLitvintsev @alrossi

The DUNE pool host now supports IPV6. I used a standard endpoint (NO DMZ redirection) and I could confirm that this issue is no longer observed:

Failed to open file (General problem: Unable to find address that faces lxplus730.cern.ch/2001:1458:d00:12:0:0:0:270 [666]) (source)

Here is an example of a transfer attempting to use the internal pool for redirection.

[cgamboa@lxplus716 ~]$ xrdcp -f -DSNetworkStack IPv6 root://dcdndoor01.sdcc.bnl.gov:1096/pnfs/sdcc.bnl.gov/data/test/test_root.txt  /tmp/IPV6DUNE
[0B/0B][100%][==================================================][0B/s]  
Run: [ERROR] Invalid redirect URL:  (source)

Here the transfer fails, as expected, because the pool does not have external connectivity.

All the best, Carlos

alrossi commented 2 years ago

Carlos,

I'm just a bit confused. Could you perhaps clarify if the last entry is reporting a problem or whether the behavior is as you expect?

thanks, Al

cfgamboa commented 2 years ago

Hi Al,

The issue that initially was reported here was only affecting XROOT.

If the transfer was initiated as an IPV6 request and the pool did not support IPV6, the transfer failed. After enabling IPV6 on the pool, the reported issue is no longer observed.

To me the XROOT client should not attempt to use the IPV6 if the protocol is not supported on the pool.

For this specific error the issue seems to be understood.

All the best, Carlos


alrossi commented 2 years ago

Hi Carlos,

So the problem is with the client? That is, before you enabled IPv6 on the pool, the door was indicating to the client to use IPv4 with the pool to which it was redirected, and it was still using IPv6?

If so, then this is out of dCache's hands ...

Thanks, Al

cfgamboa commented 2 years ago

Hi Al,

Before IPV6 was enabled on the internal pool, this error was triggered:

  1. General problem: Unable to find address that faces lxplus730.cern.ch/2001:1458:d00:12:0:0:0:270 [666]) (source)

After enabling IPV6 in the internal pool the issue disappeared.

Note that issue 1 was not observed when using the DMZ READ/WRITE pools, as these pools have IPV6 enabled.

All the best, Carlos


alrossi commented 2 years ago

So what we need to see again is the door logging from before the enabling of IPv6, because the client is simply using the address the door gives it.

It seems to me that the question is: why is the door reporting an IPv6 address for the pool when the pool is not enabled with IPv6?

alrossi commented 2 years ago

I think the pool itself reports its address to the door. And the question then again is: why does the pool give the wrong address?

(I seem to be repeating myself ... see my comment above :-) )

cfgamboa commented 2 years ago

This is the log for the case when IPV6 is enabled. The IPV4-only case was already reported above.

26 May 2022 14:34:53 [xrootd-net-35] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXf7nIlAwA] Time since last server cert refresh 11627579
26 May 2022 14:34:53 [xrootd-net-35] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXf7nIlAwA] Loading server certificates. Current refresh interval: 60000 ms
26 May 2022 14:34:53 [xrootd-net-35] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXf7nIlAwA] Verifying host certificate
26 May 2022 14:34:54 [xrootd-net-35] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXf7nIlAwA] Client protocol version was 10600, using GSIPost49ServerRequestHandler.
26 May 2022 14:34:54 [xrootd-net-35] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXf7nIlAwA] Time since last server cert refresh 157
26 May 2022 14:34:54 [xrootd-net-35] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXf7nIlAwA] Using padded DH secret generation.
26 May 2022 14:34:54 [pool-7-thread-76] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXf7nIlAwA] Opening pnfs/sdcc.bnl.gov/data/test/test_root.txt for read
26 May 2022 14:34:54 [pool-7-thread-76] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXf7nIlAwA] OPAQUE : {}
26 May 2022 14:34:54 [pool-7-thread-76] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXf7nIlAwA] Unable to resolve IP address 2620:0:210:880a:0:0:0:10 to a canonical name
26 May 2022 14:34:54 [pool-7-thread-76] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXf7nIlAwA] Redirecting to 2620:0:210:880a:0:0:0:10, 2620:0:210:880a:0:0:0:10/2620:0:210:880a:0:0:0:10:33315
26 May 2022 14:34:54 [pool-7-thread-76] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXf7nIlAwA] Sending the following host information to the client: 2620:0:210:880a:0:0:0:10

All the best, Carlos

cfgamboa commented 2 years ago

Since the pool only supported IPV4 the transfer should have used IPV4.

DmitryLitvintsev commented 2 years ago

Carlos,

in this case adding IPV4 -> name resolution to "/etc/hosts" would have helped, I think.

Dmitry

cfgamboa commented 2 years ago

It was enabled.

alrossi commented 2 years ago

"Since the pool only supported IPV4 the transfer should have used IPV4."

I agree. The problem I'm trying to determine is whether this is an xrootd (SLAC) issue or a dCache one.

alrossi commented 2 years ago

Carlos,

It is my opinion that the problem still lies with your system's network configuration.

In the above case of IPv4, you see:

22 Apr 2022 17:46:28 [pool-7-thread-8] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXdRShqgkA] Redirecting to dcdn001.sdcc.bnl.gov, dcdn001.sdcc.bnl.gov/10.38.5.10:33207
22 Apr 2022 17:46:28 [pool-7-thread-8] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXdRShqgkA] Sending the following host information to the client: dcdn001.sdcc.bnl.gov

but in the case of IPv6, you see:

26 May 2022 14:34:54 [pool-7-thread-76] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXf7nIlAwA] Unable to resolve IP address 2620:0:210:880a:0:0:0:10 to a canonical name
26 May 2022 14:34:54 [pool-7-thread-76] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXf7nIlAwA] Redirecting to 2620:0:210:880a:0:0:0:10, 2620:0:210:880a:0:0:0:10/2620:0:210:880a:0:0:0:10:33315
26 May 2022 14:34:54 [pool-7-thread-76] [door:Xrootd-dcdndoor01@xrootd-dcdndoor01Domain:AAXf7nIlAwA] Sending the following host information to the client: 2620:0:210:880a:0:0:0:10

The door is telling you there is something wrong with the DNS lookup.

Now compare this to one of the FNAL doors:

26 May 2022 14:40:11 [pool-8-thread-402262] [door:Xrootd00-fndca4b@xrootd-fndca4bDomain:AAXf7UGiClg] Redirecting to stkendca1812.fnal.gov, stkendca1812.fnal.gov/131.225.69.91:22092
26 May 2022 14:40:11 [pool-8-thread-402262] [door:Xrootd00-fndca4b@xrootd-fndca4bDomain:AAXf7UGiClg] Sending the following host information to the client: stkendca1812.fnal.gov
26 May 2022 14:40:11 [pool-8-thread-402283] [door:Xrootd00-fndca4b@xrootd-fndca4bDomain:AAXf7FUgATA] Redirecting to stkendca1909.fnal.gov, stkendca1909.fnal.gov/2620:6a:0:4812:f0:0:69:112:24915
26 May 2022 14:40:11 [pool-8-thread-402283] [door:Xrootd00-fndca4b@xrootd-fndca4bDomain:AAXf7FUgATA] Sending the following host information to the client: stkendca1909.fnal.gov
26 May 2022 14:40:11 [pool-8-thread-402265] [door:Xrootd00-fndca4b@xrootd-fndca4bDomain:AAXf7yQApKg] Redirecting to stkendca1913.fnal.gov, stkendca1913.fnal.gov/2620:6a:0:4812:f0:0:69:116:20094
26 May 2022 14:40:11 [pool-8-thread-402265] [door:Xrootd00-fndca4b@xrootd-fndca4bDomain:AAXf7yQApKg] Sending the following host information to the client: stkendca1913.fnal.gov
26 May 2022 14:40:11 [pool-8-thread-402092] [door:Xrootd00-fndca4b@xrootd-fndca4bDomain:AAXf70Fx57g] Redirecting to stkendca1910.fnal.gov, stkendca1910.fnal.gov/131.225.69.113:20648
26 May 2022 14:40:11 [pool-8-thread-402092] [door:Xrootd00-fndca4b@xrootd-fndca4bDomain:AAXf70Fx57g] Sending the following host information to the client: stkendca1910.fnal.gov
26 May 2022 14:40:11 [pool-8-thread-402272] [door:Xrootd00-fndca4b@xrootd-fndca4bDomain:AAXf7jal/mA] Redirecting to stkendca1802.fnal.gov, stkendca1802.fnal.gov/131.225.69.81:20670
26 May 2022 14:40:11 [pool-8-thread-402272] [door:Xrootd00-fndca4b@xrootd-fndca4bDomain:AAXf7jal/mA] Sending the following host information to the client: stkendca1802.fnal.gov
26 May 2022 14:40:11 [pool-8-thread-402250] [door:Xrootd00-fndca4b@xrootd-fndca4bDomain:AAXf7aYDOtA] Redirecting to stkendca1909.fnal.gov, stkendca1909.fnal.gov/2620:6a:0:4812:f0:0:69:112:24915
26 May 2022 14:40:11 [pool-8-thread-402250] [door:Xrootd00-fndca4b@xrootd-fndca4bDomain:AAXf7aYDOtA] Sending the following host information to the client: stkendca1909.fnal.gov

From the above, you will see that IPv6 addresses are equally well resolved to the hostname as are IPv4.

The client should, in any case, be receiving a hostname, not an address. This is because of the requirement that IP addresses be in the alternate names of the host certificate in order for TLS to work. Rather than obligating sites to do this for all their certs, it was decided that xroot always attempt to return the host name. Here is the code:

            /*
             * ok, open was successful
             */
            InetSocketAddress address = transfer.getRedirect();

            /*
             *  Do not use the IP address as host name, as this will block
             *  TLS from working.
             *
             *  According to https://tools.ietf.org/html/rfc5280#section-4.2.1.6
             *  an IP is required to be in the list of Subject Alternative Names
             *  in the host certificate, but these are rarely added in practice.
             *  TLS enforces the RFC and this is a workaround.
             */
            String host = address.getHostName();
            if (InetAddresses.isInetAddress(host)) {
                _log.warn("Unable to resolve IP address {} "
                      + "to a canonical name", host);
            }

            _log.info("Redirecting to {}, {}", host, address);

            return new RedirectResponse<>(req, host, address.getPort(),

The irony here is that, peculiar to your setup, the address works (IPv6), but the hostname doesn't.

But this seems to be symptomatic of a larger configuration issue. The Java lookup is confused for the IPv6 address on your door and can't find a corresponding hostname for it.
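The fallback in the door code quoted above can be imitated with the JDK alone. This is a minimal sketch under assumptions, not the dCache implementation: `RedirectHost`, `wouldWarn` and `looksLikeIpLiteral` are hypothetical names, the literal check is a crude stand-in for Guava's `InetAddresses.isInetAddress`, and a failed reverse lookup is simulated with `InetSocketAddress.createUnresolved`, which keeps the given string and performs no DNS query.

```java
import java.net.InetSocketAddress;

/** Hypothetical sketch of the "hostname or bare IP" check in the door. */
final class RedirectHost {

    /** Crude literal test: any ':' means IPv6, otherwise a dotted quad means IPv4. */
    static boolean looksLikeIpLiteral(String host) {
        return host.indexOf(':') >= 0
                || host.matches("\\d{1,3}(\\.\\d{1,3}){3}");
    }

    /**
     * Returns true when the host that would be sent to the client is still
     * an IP literal, i.e. when the door would log the "Unable to resolve
     * IP address ... to a canonical name" warning.
     */
    static boolean wouldWarn(String host, int port) {
        // For an unresolved address, getHostName() returns the string as
        // given. This simulates a reverse lookup that found no name.
        InetSocketAddress address = InetSocketAddress.createUnresolved(host, port);
        return looksLikeIpLiteral(address.getHostName());
    }

    public static void main(String[] args) {
        // BNL IPv4 case from the log: a hostname is available.
        System.out.println(wouldWarn("dcdn001.sdcc.bnl.gov", 33207));
        // BNL IPv6 case from the log: only the literal is available.
        System.out.println(wouldWarn("2620:0:210:880a:0:0:0:10", 33315));
    }
}
```

In the real door the address is resolved, so `getHostName()` attempts a reverse lookup first; the literal only survives when that lookup yields no name, which is the situation the BNL log shows for the IPv6 address.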

So, in any case, I really do not think this is a specifically dCache issue/bug.

Cheers, Al

cfgamboa commented 2 years ago

Hi Al,

I think we already discussed this in https://github.com/dCache/dcache/issues/6602: the client does manage to use an IPV4 or an IPV6 address if given one. There I found out that you could provide the client with the IPV6 address of the pool by setting it in the /etc/hosts file. The issue here is different: when the pool only supports an IPV4 interface, the client should not attempt to redirect to an IPV6 address. It should either fail or redirect using the IPV4 interface.

All the best, Carlos

alrossi commented 2 years ago

Carlos,

You seem to be implying, then, that this is ultimately a client problem (i.e., not dCache). Have I misunderstood?

My only question then would be how the client could know the pool only supports IPv4 if it is given a hostname as endpoint? This is what is happening, no? If it gets an IPv4 address, it works, if I understand your cases/examples in #6602 correctly. But if it gets a hostname, it fails.

Are you saying that the client itself maps the hostname it gets to an IPv6 address? Do you actually see that happening in the client logs? (I apologize if you included this information earlier ... things have become a bit muddled over the past month for me on this issue.)

If so, that is strange, because I was under the impression that the client would not do DNS lookup of its own initiative, because it is insecure (at least that is the SLAC development philosophy); but I may be wrong here ... in any case, how would it know not to use the IPv6 address in such a case?
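As a diagnostic aid for this question, a small sketch that lists which address families a name resolves to on a given machine might look like the following. It uses only the JDK resolver (`InetAddress.getAllByName`) and says nothing about how the xrootd client itself resolves or orders addresses; the class `AddressFamilySketch` and the method `describe` are hypothetical names.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

final class AddressFamilySketch {

    // Returns one line per resolved address, prefixed with its family.
    static List<String> describe(String host) {
        List<String> out = new ArrayList<>();
        try {
            for (InetAddress a : InetAddress.getAllByName(host)) {
                String family = (a instanceof Inet6Address) ? "IPv6" : "IPv4";
                out.add(family + " " + a.getHostAddress());
            }
        } catch (UnknownHostException e) {
            out.add("unresolvable: " + host);
        }
        return out;
    }

    public static void main(String[] args) {
        // Pass the pool hostname as the first argument; defaults to localhost.
        String host = args.length > 0 ? args[0] : "localhost";
        describe(host).forEach(System.out::println);
    }
}
```

If a pool name shows an IPv6 entry here while the pool only listens on IPv4, a client that prefers IPv6 would be pointed at an address nothing is listening on, which would be consistent with the failure discussed in this thread.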

In any event, can this issue be closed?

Thanks, Al

cfgamboa commented 2 years ago

Hi Al,

The xrdcp client has a flag that forces the transfer to use IPv4, and this issue is not observed when it is set. I think the client should be able to do this automatically, selecting only the protocol (IPv6 or IPv4) that is supported by the pool the transfer is redirected to.

If this is not a dCache issue itself, then this issue can be closed.

All the best, Carlos

alrossi commented 2 years ago

OK, Carlos.

I will close this issue then. But before I do, just for my own curiosity: would you be able to repeat the situation where the transfer fails because the pool is IPv4 but the client uses IPv6, this time with xrdcp debug on? i.e.,

xrdcp -d 3 ...

And then send me the full log for that failed transfer?

In particular, I'm looking for when the client establishes the connection to the pool.

e.g.,

[2022-05-27 09:11:49.910631 -0500][Debug  ][PostMaster        ] [stkendca1911.fnal.gov:23912] Found 1 address(es): [::ffff:131.225.69.114]:23912

But seeing the full log would be useful to me to understand what the client thinks it is doing.
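When reading such a log, a short sketch like the one below can pull the endpoint out of a "Found N address(es)" line and report its address family. The regular expression is an assumption based only on the single sample line above, not on any documented xrootd log format, and the class `FoundAddressSketch` and method `classify` are hypothetical names. Note that Java converts an IPv4-mapped literal such as `::ffff:131.225.69.114` into a plain IPv4 address, so the sample line is reported as IPv4.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FoundAddressSketch {

    // Matches "[addr]:port" after "Found N address(es): "; the address must
    // contain ':' or '.', so it is always an IP literal and never a name.
    private static final Pattern ENDPOINT = Pattern.compile(
            "Found \\d+ address\\(es\\): \\[([0-9A-Fa-f]*[:.][0-9A-Fa-f:.]*)\\]:(\\d+)");

    static String classify(String logLine) {
        Matcher m = ENDPOINT.matcher(logLine);
        if (!m.find()) {
            return "no endpoint found";
        }
        String literal = m.group(1);
        try {
            // For an IP literal this only parses the text; no DNS lookup.
            InetAddress a = InetAddress.getByName(literal);
            String family = (a instanceof Inet4Address) ? "IPv4" : "IPv6";
            return family + " " + a.getHostAddress() + " port " + m.group(2);
        } catch (UnknownHostException e) {
            return "unparsable address: " + literal;
        }
    }

    public static void main(String[] args) {
        String sample = "[2022-05-27 09:11:49.910631 -0500][Debug  ][PostMaster        ] "
                + "[stkendca1911.fnal.gov:23912] Found 1 address(es): [::ffff:131.225.69.114]:23912";
        System.out.println(classify(sample));
    }
}
```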

You can attach that log to the old RT ticket (which we should probably also close).

Thank you so much!

Al

cfgamboa commented 2 years ago

Hi Al,

Since the pool now supports both protocols, I cannot reproduce it with this instance.

All the best, Carlos

alrossi commented 1 year ago

Hello Carlos,

I would like Dmitry to be present as well, since he has contributed to these discussions and may have some further insight from production here at FNAL.

Today we have an operations meeting at 13:00 CDT (14:00 EDT). It might be feasible to meet after that (14:00 CDT / 15:00 EDT).

I don't recall what the zoom connection was ... did you provide it, or did Dmitry?

Cheers,

Al


From: gamboa @.> Sent: Wednesday, May 4, 2022 5:01 PM To: dCache/dcache @.> Cc: Albert Rossi @.>; Comment @.> Subject: Re: [dCache/dcache] Xootd IPV6 transfer fails with (General problem: Unable to find address that faces lxplus730.cern.ch/2001:1458:d00:12:0:0:0:270 [666]) (source) (Issue #6605)

Hello Al,

I think a zoom meeting will be good to touch base again. Tomorrow I will be able to meet in the afternoon, 2-4pm.

Yes, I think it is a good idea to tackle one at a time. Please note that I have created the records on GitHub while stepping through those so we can trace them.

All the best, Carlos

On May 4, 2022, at 10:10 AM, Albert Rossi @.***> wrote:

@cfgamboa @DmitryLitvintsev Carlos,

I do believe this is related to both #6658 and #6602 (https://github.com/dCache/dcache/issues/6602).

I suggest we tackle one issue at a time. I would suggest starting with getting #6658 to work as we suggested.

If you need us to zoom again with you, please let us know.

Thanks, Al
