HTTP-TPC transfer monitoring - stripe source and destination perf marker #7058

Open vokac opened 1 year ago

vokac commented 1 year ago

To improve HTTP-TPC transfer monitoring and allow better diagnosis of transfer issues, the WLCG BDT group proposed an HTTP-TPC update which adds new Perf Marker entries that provide the real transfer connection source (Stripe Source) and destination (Stripe Destination) addresses. The address format is the same as that defined for RemoteConnections: transport protocol, address and port separated by :, e.g. tcp:192.0.2.100:1234 or tcp:[2001:db8::100]:1234.

Example of a whole performance marker with the proposed extension:

Perf Marker\n
Timestamp: 1537788010\n
Stripe Index: 0\n
Stripe Bytes Transferred: 238745\n
Total Stripe Count: 1\n
RemoteConnections: tcp:147.231.25.166:21234,tcp:[2001:718:401:6017:2::28]:24081\n
Stripe Source: tcp:[2001:718:401:6017:2::28]:24081\n
Stripe Destination: tcp:[2001:1458:301:105::100:5]:8443\n
End\n
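For illustration, a minimal Python sketch of how a TPC client might read such a performance-marker block, keeping only "Key: value" lines and silently skipping anything it does not recognise (the marker names come from the example above; the helper itself is hypothetical):

```python
# Minimal sketch: parse an HTTP-TPC performance-marker block into a dict.
# Unknown or malformed lines are ignored, matching the tolerant behaviour
# expected of TPC clients such as gfal/davix.
def parse_perf_marker(block: str) -> dict:
    marker = {}
    for line in block.splitlines():
        line = line.strip()
        if not line or line in ("Perf Marker", "End"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line; skip it
        marker[key.strip()] = value.strip()
    return marker

example = """Perf Marker
Timestamp: 1537788010
Stripe Index: 0
Stripe Bytes Transferred: 238745
Total Stripe Count: 1
RemoteConnections: tcp:147.231.25.166:21234,tcp:[2001:718:401:6017:2::28]:24081
Stripe Source: tcp:[2001:718:401:6017:2::28]:24081
Stripe Destination: tcp:[2001:1458:301:105::100:5]:8443
End"""

print(parse_perf_marker(example)["Stripe Source"])
# tcp:[2001:718:401:6017:2::28]:24081
```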
paulmillar commented 1 year ago

Looks reasonable to me.

Have you checked that this doesn't break existing FTS deployments? Will they simply ignore these unexpected, new fields?

vokac commented 1 year ago

gfal/davix ignore unknown lines in performance markers.

In the XRootD ticket https://github.com/xrootd/xrootd/issues/1963, @bbockelm mentioned that the Stripe prefix doesn't make much sense (it is GridFTP specific) and should be dropped from the HTTP transfer connection source/destination markers.

paulmillar commented 1 year ago

Thanks for checking gfal/davix's behaviour.

I've no strong opinion on the suggestion from @bbockelm to drop the Stripe prefix, other than:

vokac commented 1 year ago

I would like to avoid a huge project of replacing the existing perf markers and focus just on the new Source and Destination. It would still be possible to update the value format in the future (there are not many consumers of this information - we could state that an unknown format of these two new markers should be ignored by the TPC client) in case we have to cover non-HTTP transport protocols.

Do we agree on this update in my original proposal?

bbockelm commented 1 year ago

Do we agree on this update in my original proposal?

90% there. How about:

?

At least for XRootD, the current implementation can have multiple sources, hence the plural. I can't think of a reason why you'd have multiple destinations ... but it feels more future-proof if we make that plural as well.

vokac commented 1 year ago

TCP connection with several different sources? I thought that TCP has just one source and one destination address by design. Are you talking about fancier protocols like SCTP? As I mentioned, we could extend the value format in the future for these exotic protocols, e.g. sctp:[list of addresses].

If we ever decide to implement multistream for HTTP-TPC, there would still be one source and one destination address for each individual TCP connection. Aggregating all addresses in one Sources perf marker would hide details when we try to find & troubleshoot problems with an individual TCP connection (source/destination), and that would make this whole proposal a bit less useful.

bbockelm commented 1 year ago

TCP connection with several different sources?

No - several different TCP connections for a single transfer. This already exists for XRootD…

paulmillar commented 1 year ago

A couple of comments:

First, source(s) and destination(s) might be the wrong abstraction.

If we're imagining the possibility of multiple TCP connections (a feature which xrootd already supports) then separating the sources and destinations might lead to ambiguity. Recall that a TCP connection is defined as the quad-tuple: source-IP-address, source-port, destination-IP-address, destination-port. If there are multiple entries in Sources and Destinations, what are the corresponding connections?

An alternative might be to define a list of connections (e.g., a Connections attribute), where each entry is the TCP quad-tuple for that connection. This removes the ambiguity. For the sake of completeness, here's an example:

Connections: tcp:[2001:718:401:6017:2::28]:24081:[2001:1458:301:105::100:5]:8443\n

Alternatively, we could just have an ordering convention: the first entry in the Sources attribute corresponds to the first entry in the Destinations attribute. This would work. However, (to me) it's less appealing as it's somewhat more fragile; for example, what should the client do if the active party returns Sources and Destinations attributes with lists of different lengths?

(as an aside, you may notice how using a Connections attribute is sort of re-inventing the stripes concept that already exists in the progress report format. Just sayin' ;-) )
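To make the two options concrete, here is a small Python sketch (the Connections quad-tuple format is the one shown in the example above; everything else is assumed): parsing a single Connections entry is unambiguous, whereas pairing separate Sources and Destinations lists only works when both lists have the same length.

```python
import re

# Option 1: a single Connections entry carrying the full TCP quad-tuple,
# in the format tcp:<src-addr>:<src-port>:<dst-addr>:<dst-port>,
# with IPv6 literals kept in square brackets.
QUAD = re.compile(r"tcp:(\[[^\]]+\]|[^:]+):(\d+):(\[[^\]]+\]|[^:]+):(\d+)$")

def parse_connection(entry: str):
    m = QUAD.match(entry)
    if m is None:
        raise ValueError(f"unrecognised connection entry: {entry}")
    src_addr, src_port, dst_addr, dst_port = m.groups()
    return (src_addr, int(src_port), dst_addr, int(dst_port))

# Option 2: separate Sources/Destinations lists paired by position.
# The pairing is only well defined if both lists have the same length.
def pair_endpoints(sources, destinations):
    if len(sources) != len(destinations):
        raise ValueError("Sources and Destinations lists have different lengths")
    return list(zip(sources, destinations))

print(parse_connection(
    "tcp:[2001:718:401:6017:2::28]:24081:[2001:1458:301:105::100:5]:8443"))
# ('[2001:718:401:6017:2::28]', 24081, '[2001:1458:301:105::100:5]', 8443)
```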

Second, this whole approach is (perhaps) "broken".

This approach and the existing RemoteConnections attribute suffer from the same assumption (and corresponding limitation): that there's at least one progress report within which the connection is active. To give a counter-example, a transfer might attempt an IPv6 connection that quickly fails, and the active party then tries an IPv4 address that works. The failed IPv6 connection attempt is significant but might not be included in any progress report. From the progress report information alone, it would not be possible to learn, for an IPv4-based transfer, whether or not the active party attempted an IPv6 connection and against which IPv6 address.

An alternative approach might be to consider the progress report and the final response being opportunities for the active party to report "connection events". Just as an illustrative example, we could define three classes of connection event: the success in establishing a TCP connection, the failure in establishing a TCP connection, the closing of an established TCP connection. Each event includes the TCP quad-tuple and the time the event happened. Note that other choices of event classes are possible, this is only a quick example.

Like this, connection events are reported to the client (FTS) "at some point". Ideally, these would be in the next progress report, with the final message providing an opportunity to report all currently unreported events. This would avoid losing information because activity happened "too quickly".
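Purely as an illustration of this "connection events" idea (none of these names exist in any specification), a Python sketch of what such event records might look like:

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical event classes, following the three suggested above:
# a connection was established, an attempt failed, or an established
# connection was closed.
class ConnectionEvent(Enum):
    ESTABLISHED = "established"
    FAILED = "failed"
    CLOSED = "closed"

@dataclass
class ConnectionEventRecord:
    event: ConnectionEvent
    timestamp: int      # Unix time at which the event happened
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int

# Example: a failed IPv6 attempt followed by a working IPv4 connection.
# Reporting both events would expose the failed IPv6 attempt even though
# it never shows up in any progress report.
events = [
    ConnectionEventRecord(ConnectionEvent.FAILED, 1537788001,
                          "[2001:db8::1]", 40000, "[2001:db8::2]", 443),
    ConnectionEventRecord(ConnectionEvent.ESTABLISHED, 1537788002,
                          "192.0.2.10", 40001, "192.0.2.20", 443),
]
```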

That all said, at the moment, I don't see a problem in reporting the information as requested, but perhaps some more consultation is needed to make sure all implementations provide the information in the same format.

bbockelm commented 1 year ago

I think you’re proposing something quite useful but also a bit beyond the current project.

I think we could keep the existing structure (with the tweaks proposed for the naming) and then maybe open up a separate ticket for per-connection monitoring?

vokac commented 1 year ago

btw: for crappy transfer throughput on an individual connection, I would like to see details such as:

Timestamp: 1537788010\n
Stripe Bytes Transferred: 238745\n
Stripe Source: tcp:[2001:718:401:6017:2::28]:24081\n
Stripe Destination: tcp:[2001:1458:301:105::100:5]:8443\n

This was the reason why I proposed the Stripe prefix also for the source and destination addresses.
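As a hedged illustration of how the per-stripe fields help with this kind of troubleshooting (the field names come from the example above; the helper itself is hypothetical), the throughput of one connection can be derived from two successive markers of the same stripe:

```python
# Throughput of one stripe (i.e. one TCP connection) between two successive
# performance markers: delta of "Stripe Bytes Transferred" over delta of "Timestamp".
def stripe_throughput(earlier: dict, later: dict) -> float:
    dt = int(later["Timestamp"]) - int(earlier["Timestamp"])
    db = int(later["Stripe Bytes Transferred"]) - int(earlier["Stripe Bytes Transferred"])
    if dt <= 0:
        raise ValueError("markers are not in chronological order")
    return db / dt  # bytes per second

earlier = {"Timestamp": "1537788010", "Stripe Bytes Transferred": "238745"}
later = {"Timestamp": "1537788040", "Stripe Bytes Transferred": "4238745"}
print(f"{stripe_throughput(earlier, later):.0f} B/s")  # ~133333 B/s
```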

paulmillar commented 1 year ago

@bbockelm, Indeed, the "connection events" idea is broader and something that would take more time. My comment was more to get the idea written down; it isn't something I would advocate as a short-term goal.

@vokac, I think Stripe Source and Stripe Destination make sense, while we have the concept of stripes in the progress reports.

@bbockelm, do you happen to know if xrootd reports multiple stripes (in the progress reports) if the transfer involves multiple connections?

bbockelm commented 1 year ago

xrootd reports multiple stripes (in the progress reports) if the transfer involves multiple connections?

It does not. There is no concept of a stripe here, so it's reported as multiple connections in a single stripe.

paulmillar commented 1 year ago

However, I guess each TCP connection transfers a different subset of the file's bytes, right? There are no duplicates.

So, isn't this the same as having different stripes?

bbockelm commented 1 year ago

So, isn't this the same as having different stripes?

Right - I think of stripes as in https://en.wikipedia.org/wiki/Data_striping; that is, there's some fixed scheme mapping fixed-sized blocks between the channels. The GridFTP performance marker terminology was inherited from this approach (see https://www.globus.org/sites/default/files/gridftp_final.pdf) and, IIRC, strongly mimics RAID.

In the XRootD implementation, there's effectively a FIFO of data chunks (fixed-size currently -- but no guarantee of that in the future) to move. The different TCP connections pull from the FIFO as individual chunks complete.

paulmillar commented 1 year ago

Is the XRootD approach really so different from the GridFTP striping?

The main difference seems to be that GridFTP assigns chunks to TCP connections ahead of the transfer, while XRootD assigns chunks to TCP connections at the point where a connection can accept more data. This seems to be a kind of "early binding" vs "late binding" distinction, with the same underlying idea.

(Not that this really matters; just sharing some ideas.)
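A toy Python sketch of that distinction (purely illustrative; neither implementation literally works this way): "early binding" fixes the chunk-to-connection mapping before the transfer starts, while "late binding" lets whichever connection is free next pull the next chunk from a shared queue.

```python
from collections import deque

chunks = list(range(10))          # chunk indices of the file
connections = ["conn-A", "conn-B"]

# Early binding (GridFTP-style striping): chunk i always goes to
# connection i % N, decided before the transfer starts.
early = {c: [i for i in chunks if i % len(connections) == connections.index(c)]
         for c in connections}

# Late binding (XRootD-style FIFO): whichever connection becomes free next
# pulls the next chunk from a shared queue.
queue = deque(chunks)
late = {c: [] for c in connections}
readiness = ["conn-A", "conn-B", "conn-B", "conn-A", "conn-A",
             "conn-B", "conn-A", "conn-B", "conn-A", "conn-B"]  # hypothetical order
for c in readiness:
    late[c].append(queue.popleft())

print(early)  # {'conn-A': [0, 2, 4, 6, 8], 'conn-B': [1, 3, 5, 7, 9]}
print(late)   # depends on which connection was free when
```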

vokac commented 1 year ago

An updated proposal came out of today's WLCG BDT meeting, with a suggestion to deprecate the old "Stripe" markers.