PelicanPlatform / pelican

The Pelican Platform for creating data federations
https://pelicanplatform.org/
Apache License 2.0

`writes` not shown in origin records #1630

Open CannonLock opened 1 week ago

CannonLock commented 1 week ago

None of the currently running origins are reporting `write` values for `xrootd_transfer_bytes`. This shows up both in the director metrics reported through Prometheus and in the Elasticsearch logs linked below.

In the reports below, none of the origins that I am familiar with are reporting writes, and many are missing from the reports entirely.

https://github.com/CannonLock/ES_queries/blob/master/osdf.ipynb

I have not ruled out the aggregation layer as the source of the problem. I also don't have access to the origins that I know are handling writes, so I can't check whether they record those writes locally.
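For anyone who does have access to an origin, one way to check whether it records writes locally is to sum `xrootd_transfer_bytes` by label in the origin's Prometheus text output. The sketch below is a minimal, unofficial helper that parses the standard Prometheus text exposition format; the label names `type` and `path` and the sample data are assumptions made for illustration, not confirmed Pelican output.

```python
import re
from collections import defaultdict

# One sample line of the Prometheus text exposition format, e.g.
#   xrootd_transfer_bytes{path="/foo",type="write"} 1234
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def sum_by_label(metrics_text, metric, label):
    """Sum all samples of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m is None or m.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        totals[labels.get(label, "<missing>")] += float(m.group("value"))
    return dict(totals)


# Made-up sample data; the label names "path" and "type" are assumptions.
SAMPLE = """\
# TYPE xrootd_transfer_bytes counter
xrootd_transfer_bytes{path="/foo",type="read"} 1000
xrootd_transfer_bytes{path="/bar",type="read"} 500
xrootd_transfer_bytes{path="/foo",type="write"} 0
some_other_metric 3
"""

totals = sum_by_label(SAMPLE, "xrootd_transfer_bytes", "type")
print(totals)
```

If the `write` total stays at zero (or the key is absent) on an origin that is known to accept uploads, the data is being lost before aggregation.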

Assigning to Brian as it is not clear to me where this problem lies.

bbockelm commented 1 week ago

Adding this to Justin's plate for 7.12.

A few notes:

  1. XRootD creates a packet for each transfer; the format of the packet is here.
  2. Sometime after the transfer completes (we usually have this set to O(10s)), the packet is sent over UDP to the Pelican process.
  3. Pelican parses the packet and adds it to a Prometheus counter (see the metrics subdirectory).

So, if we are missing write data, it's likely breaking in either step (1) or step (3). I'd suggest trying to bisect the issue - is it an XRootD problem or a Pelican problem? - by adding a few judicious logging statements to the packet processor and uploading data to a development origin.
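To illustrate where such logging statements could go, here is a rough Python sketch of the receive, parse, and count stages. It is not Pelican's actual packet processor: the JSON payload is a stand-in for the real XRootD packet format, and the field names `read_bytes` and `write_bytes` and the function `process_packet` are hypothetical.

```python
import json
import logging
from collections import Counter

log = logging.getLogger("packet_debug")


def process_packet(payload: bytes, counters: Counter) -> None:
    """Parse one (hypothetical, JSON-encoded) transfer packet and count it."""
    # Stage A: the packet arrived. If uploads never produce a line here,
    # the problem is upstream (XRootD or the UDP hop), not in the parser.
    log.debug("received %d bytes: %r", len(payload), payload[:80])

    try:
        record = json.loads(payload)
    except ValueError:
        log.warning("could not parse packet: %r", payload[:80])
        return
    if not isinstance(record, dict):
        log.warning("unexpected packet shape: %r", record)
        return

    # Stage B: the packet parsed. A write_bytes of 0 for a known upload
    # would point at the sender or at the field being read.
    read_bytes = int(record.get("read_bytes", 0))
    write_bytes = int(record.get("write_bytes", 0))
    log.debug("parsed read_bytes=%d write_bytes=%d", read_bytes, write_bytes)

    # Stage C: the counters were updated. If Stage B shows writes but the
    # exported metric does not, the bug is in the counting step.
    counters["read"] += read_bytes
    counters["write"] += write_bytes
    log.debug("counters now %s", dict(counters))


logging.basicConfig(level=logging.DEBUG)
counters = Counter()
process_packet(b'{"read_bytes": 10, "write_bytes": 25}', counters)
process_packet(b"not json", counters)  # logged and skipped
```

Uploading a file to a development origin and seeing which of the three log lines appears (or is missing) narrows the fault to one stage.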