Open andrearendina opened 2 years ago
Hi Andrea,
Thanks for the report! I tried to replicate your issue by using two different EOS instances one running 4.8.82 version (used as source) and one running 5.0.23 (used as destination), but everything worked as expected. I used the following test script for this, basically relying on curl to trigger the TPC transfer:
export SRC=https://esdss000.cern.ch:9000/eos/dev/replica/xfile1.dat
export DST=https://elvin-dev01.cern.ch:9000/eos/dev/replica/copy_xfile.dat
export TSRC=$(curl -4 --silent --cert /tmp/x509up_u$(id -u) --key /tmp/x509up_u$(id -u) --cacert /tmp/x509up_u$(id -u) --capath /etc/grid-security/certificates -X POST -H 'Content-Type: application/macaroon-request' -d '{"caveats": ["activity:DOWNLOAD"], "validity": "PT3000M"}' "$SRC" | jq -r '.macaroon')
export TDST=$(curl -4 --silent --cert /tmp/x509up_u$(id -u) --key /tmp/x509up_u$(id -u) --cacert /tmp/x509up_u$(id -u) --capath /etc/grid-security/certificates -X POST -H 'Content-Type: application/macaroon-request' -d '{"caveats": ["activity:UPLOAD,DELETE,LIST"], "validity": "PT3000M"}' "$DST" | jq -r '.macaroon')
curl -4 -v --capath /etc/grid-security/certificates -L -X COPY -H 'Secure-Redirection: 1' -H 'X-No-Delegate: 1' -H 'Credentials: none' -H "Authorization: Bearer $TSRC" -H "TransferHeaderAuthorization: Bearer $TDST" -H "TransferHeaderTest: Test" -H "Destination: $DST" "$SRC"
In general EOS relies on the Content-Lenght
HTTP header to know how much data it should expect to receive, so it might be that this piece of information is not sent when the PUT is done by Storm?! This is just a wild guess and should be checked. I don't have a Storm instance to easily check this. Trying this between different EOS instances seems to yield the expected result.
[esindril@elvin-dev01]$ eos version
EOS_INSTANCE=eosdev
EOS_SERVER_VERSION=5.0.23 EOS_SERVER_RELEASE=1
EOS_CLIENT_VERSION=5.0.23 EOS_CLIENT_RELEASE=1
[esindril@elvin-dev01]$ eos ls -lrt /eos/dev/replica/
-rw-r--r-- 2 esindril c3 29210264 May 17 17:33 copy_xfile.dat
Cheers, Elvin
Hi Elvin,
thank you very much for the quick feedback! I have replicated your script to perform the push-copy in both cases: from WebDAV to EOS and from EOS to EOS in order to compare the logs inside the namespace server.
First of all, I noticed that the Content-Length parameter is in the header in both cases, respectively:
220519 08:25:45 467 http_Protocol: rc:51 got hdr line: PUT /eos/dockertest/andrea/test2M_webdav HTTP/1.1
220519 08:25:45 467 http_Protocol: Parsing first line: PUT /eos/dockertest/andrea/test2M_webdav HTTP/1.1
220519 08:25:45 467 http_Protocol: rc:479 got hdr line: Authorization: Bearer
[...]
220519 08:25:45 467 http_Protocol: rc:12 got hdr line: Test: Test
220519 08:25:45 467 http_Protocol: rc:25 got hdr line: Content-Length: 2097152
220519 08:25:45 467 http_Protocol: rc:40 got hdr line: Content-Type: application/octet-stream
220519 08:25:45 467 http_Protocol: rc:36 got hdr line: Host: eos-mgm.cr.cnaf.infn.it:9000
220519 08:25:45 467 http_Protocol: rc:24 got hdr line: Connection: Keep-Alive
220519 08:25:45 467 http_Protocol: rc:52 got hdr line: User-Agent: Apache-HttpClient/4.5.12 (Java/11.0.4)
220519 08:25:45 467 http_Protocol: rc:31 got hdr line: Accept-Encoding: gzip,deflate
220519 08:25:45 467 http_Protocol: rc:2 got hdr line:
220519 08:25:45 467 http_Protocol: rc:2 detected header end.
220519 08:12:41 446 http_Protocol: rc:44 got hdr line: PUT /eos/dockertest/andrea/test2M HTTP/1.1
220519 08:12:41 446 http_Protocol: Parsing first line: PUT /eos/dockertest/andrea/test2M HTTP/1.1
220519 08:12:41 446 http_Protocol: rc:35 got hdr line: User-Agent: xrootd-tpc/v5.4.3-rc1
220519 08:12:41 446 http_Protocol: rc:40 got hdr line: Host: eos-ingress.cr.cnaf.infn.it:9000
220519 08:12:41 446 http_Protocol: rc:13 got hdr line: Accept: */*
220519 08:12:41 446 http_Protocol: rc:470 got hdr line: Authorization: Bearer
[...]
220519 08:12:41 446 http_Protocol: rc:12 got hdr line: Test: Test
220519 08:12:41 446 http_Protocol: rc:25 got hdr line: Content-Length: 2097152
220519 08:12:41 446 http_Protocol: rc:22 got hdr line: Expect: 100-continue
220519 08:12:41 446 http_Protocol: rc:2 got hdr line:
220519 08:12:41 446 http_Protocol: rc:2 detected header end.
In the failure case the file has been created, but it is not "filled".
So it seems that the miscommunication between WebDAV and EOS involves another part of the transfer. I am also in contact with the StoRM developers to understand the reasons of this.
Please, keep open this issue. I will update it as soon as I have something new.
As further information, I attach the logs that I mentioned above to this comment. Obviously, I don't ask you to check them.
Thank you again for the great support, cheers, Andrea.
========================================================================
Hi Andrea,
Looking at the logs that you provided it seems that the WebDav client might not properly follow the redirection that it receives from the MGM daemon. Most likely you can easily replicate the behavior with a simple PUT operation as the TPC in the end boils down to this.
It would be interesting to know what the HTTP client from the other endpoint does/sees.
Cheers, Elvin
Dear Elvin,
I am waiting for a feedback of the StoRM developers. In the meanwhile, I increased the debug level of the WebDAV TPCs.
In the following lines I have found the PUT request that you mentioned in the last comment:
2022-05-19T14:34:09.075Z [] DEBUG 30403 --- [thread-pool-72] o.i.s.w.authz.VOMSAuthenticationFilter : Checking secure context token: null
2022-05-19T14:34:09.076Z [] DEBUG 30403 --- [thread-pool-72] o.i.s.w.authz.VOMSAuthenticationFilter : No client certificate found in request.
2022-05-19T14:34:09.076Z [] DEBUG 30403 --- [thread-pool-72] o.i.s.w.authz.VOMSAuthenticationFilter : No pre-authenticated principal found in request
2022-05-19T14:34:09.087Z [] DEBUG 30403 --- [thread-pool-72] o.i.s.webdav.server.DefaultPathResolver : /disk/test2M matches with access point /disk. Resolved storage area name: dteam
2022-05-19T14:34:09.088Z [] DEBUG 30403 --- [thread-pool-72] o.i.s.w.a.v.UnanimousDelegatedVoter : Voter: org.italiangrid.storm.webdav.authz.voters.FineGrainedAuthzVoter@5258a68, returned: 0
2022-05-19T14:34:09.088Z [] DEBUG 30403 --- [thread-pool-72] o.i.s.w.a.v.UnanimousDelegatedVoter : Voter: org.italiangrid.storm.webdav.authz.voters.FineGrainedCopyMoveAuthzVoter@777dfc3e, returned: 0
2022-05-19T14:34:09.097Z [] DEBUG 30403 --- [thread-pool-72] o.i.s.webdav.server.DefaultPathResolver : /disk/test2M matches with access point /disk. Resolved storage area name: dteam
2022-05-19T14:34:09.098Z [] DEBUG 30403 --- [thread-pool-72] o.i.s.w.a.v.UnanimousDelegatedVoter : Voter: org.italiangrid.storm.webdav.authz.voters.WlcgScopeAuthzVoter@74b6964, returned: 0
2022-05-19T14:34:09.099Z [] DEBUG 30403 --- [thread-pool-72] o.i.s.w.a.v.UnanimousDelegatedVoter : Voter: org.italiangrid.storm.webdav.authz.voters.WlcgScopeAuthzCopyMoveVoter@3813593b, returned: 0
2022-05-19T14:34:09.101Z [dd1be1d1-d35e-44b1-8c2d-8551ae515733] DEBUG 30403 --- [thread-pool-72] o.i.s.webdav.server.DefaultPathResolver : /disk/test2M matches with access point /disk. Resolved path: /storage/gemss_test1/dteam/disk/test2M
2022-05-19T14:34:09.101Z [dd1be1d1-d35e-44b1-8c2d-8551ae515733] INFO 30403 --- [thread-pool-72] o.i.storm.webdav.tpc.TransferFilter : Push third-party transfer requested: Source: /disk/test2M, Destination: https://eos-mgm.cr.cnaf.infn.it:9000/eos/dockertest/andrea/test2M, hasAuthorizationHeader: true, id: dd1be1d1-d35e-44b1-8c2d-8551ae515733
2022-05-19T14:34:09.101Z [dd1be1d1-d35e-44b1-8c2d-8551ae515733] DEBUG 30403 --- [thread-pool-72] o.i.storm.webdav.tpc.TransferFilter : PutTransferRequest [uuid=dd1be1d1-d35e-44b1-8c2d-8551ae515733, path=/disk/test2M, uri=https://eos-mgm.cr.cnaf.infn.it:9000/eos/dockertest/andrea/test2M, xferHeaders={Authorization=[Bearer [...]], Test=[Test]}, verifyChecksum=false, overwrite=true]
2022-05-19T14:34:09.102Z [dd1be1d1-d35e-44b1-8c2d-8551ae515733] DEBUG 30403 --- [thread-pool-72] o.i.s.webdav.server.DefaultPathResolver : /disk/test2M matches with access point /disk. Resolved path: /storage/gemss_test1/dteam/disk/test2M
2022-05-19T14:34:09.102Z [dd1be1d1-d35e-44b1-8c2d-8551ae515733] DEBUG 30403 --- [thread-pool-72] o.i.s.w.t.t.impl.TransferRequestImpl : TPC id: dd1be1d1-d35e-44b1-8c2d-8551ae515733, status: STARTED
2022-05-19T14:34:09.253Z [dd1be1d1-d35e-44b1-8c2d-8551ae515733] ERROR 30403 --- [thread-pool-72] o.i.s.w.tpc.http.HttpTransferClient : Connection reset by peer (Write failed)
I have also tried to increase the debug level of all the libraries and I noticed that WebDAV tried to pass all the file content (maybe through the header?) until the connection is closed and the pipe is broken. I attached this part below. I hope that it could help.
Thank you very much for your time and patiance, cheers, Andrea.
================================================================================
Hi Andrea,
I believe the problem is that the Apache HttpClient used by Storm to perform the PUT request does not send the Expect: 100-continue
header, therefore sending both the headers and the body in one go. According to the HttpClient documentation this needs to be set explicitly:
Sending everything in one go will not work with any EOS endpoint, since the initial point of contact is the namespace metadata server and only after one redirection does the client connect to the actual (data) server that will store/provide the data.
Cheers, Elvin
Hi Elvin,
sorry for the late reply. I have open also a ticket about this issue to the StoRM webDAV developers: https://issues.infn.it/jira/browse/STOR-1560.
As you can see from that, in the next release the request of sending the Expect: 100-continue
header will be configurable.
Then we can perform other tests and verify if this solves problem.
Until that moment, please leave open the issue.
Thank you again, cheers, Andrea
Hi all,
I have noticed a costant error in the third-party push-copies from StoRM WebDAV to EOS with file size bigger than 1MB.
In my tests, the version of the StoRM WebDAV server is
storm-webdav-1.4.1-1.el7.noarch
, while the EOS version is the5.0.22
and the XrootD version is the5.4.3
. Below, you can find two easy examples with the gfal tool of a successful transfer and a failed one.Checking the EOS logs, it seems there is a miscommunication between the file transfer services, in fact EOS always attempts to parse all the file as a header. So in the first example above all the file content has been written into the logs. Indeed, from the following configuration of EOS I suppose that this kind of behavoiur can only work until the size is lower than 1MB:
and this is the reason for which the transfers fail for the files with size bigger than 1MB. The failed transfer logs of EOS show the following error:
in which EOS tries to parse the first content line of the file, but then it fails.
As further information, by looking at the storm webdav logs, the third-party copy it is successfully submitted in both cases. In summary, it seems that in this specific case of the file transfer EOS is not able to distinguish the header from the content and it parses all the content as the header. As a consequence, the file transfer fails whenever the size is greater than 1MB.
Sorry if I made any mistakes or I was not clear and, please, feel free to move this issue if needed.