Open CannonLock opened 1 month ago
Adding this to Justin's plate for 7.12.
A few notes:
`metrics` subdirectory).
So, if we are missing write data, it's likely breaking in either step (1) or step (3). I'd suggest trying to bisect the issue - is it an xrootd problem or a Pelican problem? - by adding a few judicious logging statements to the packet processor and uploading data to a development origin.
My findings are the following:
- With `pelican object get`, I can see that the `read` label updates properly.
- I used `pelican object put` to write some objects to the origin. The `write` label did not update and remained at zero.
- `xrootd_transfer_operations_count` updates reads properly but not writes.
So now I am going to investigate why exactly we are missing the writes but not the reads.
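For anyone repeating this check, here is a minimal sketch of how the read/write comparison could be scripted against a saved metrics scrape in the Prometheus text format. The label name `type` and the sample scrape are assumptions made for illustration, not output copied from a real origin; adjust them to whatever the origin actually exposes.

```python
import re

# One sample line of the Prometheus text exposition format, e.g.
#   metric_name{label="value",other="x"} 12
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def totals_by_label(text, metric, label="type"):
    """Sum the samples of `metric`, grouped by the value of `label`.

    The default label name "type" is an assumption.
    """
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None or m.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        key = labels.get(label, "")
        totals[key] = totals.get(key, 0.0) + float(m.group("value"))
    return totals


# Made-up scrape that mimics the symptom described above.
SAMPLE = """\
# TYPE xrootd_transfer_operations_count counter
xrootd_transfer_operations_count{path="/demo",type="read"} 4
xrootd_transfer_operations_count{path="/demo",type="write"} 0
xrootd_transfer_bytes{path="/demo",type="read"} 4096
"""

counts = totals_by_label(SAMPLE, "xrootd_transfer_operations_count")
print(counts)
if counts.get("write", 0.0) == 0.0:
    print("no write operations recorded")
```

Running it before and after a `pelican object put` and comparing the two results shows whether the write counter moved at all.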
I think I may have found the bug. I used `xrdcp` to copy a file to the pelican xrootd server. This correctly triggered a write, which was reflected in both `xrootd_transfer_operations_count` and `xrootd_transfer_bytes`. I also tested reads and those worked correctly as well. This leads me to believe that there may be an issue in the `XrdHttp` protocol implementation.
None of the currently running origins are reporting `write` for `xrootd_transfer_bytes`. This can be seen from the director metrics that are being reported with Prometheus, and it can be seen in the Elasticsearch logs shown below.
Note that in the reports below, none of the origins that I am familiar with are reporting write, and many are missing entirely from the reports themselves.
https://github.com/CannonLock/ES_queries/blob/master/osdf.ipynb
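As a rough sketch of how the "no origin reports write" observation could be checked programmatically, the function below walks a result list in the shape returned by a Prometheus instant query (`data.result`) and lists the origins with no non-zero `write` sample. The label names `instance` and `type`, the hostnames, and the values are all assumptions made up for illustration; they are not taken from the director or from the notebook above.

```python
def origins_without_writes(result, origin_label="instance", type_label="type"):
    """Return the sorted origins that have no write sample greater than zero.

    `result` is a list of {"metric": {...labels...}, "value": [ts, "value"]}
    entries. Both label names are assumptions.
    """
    seen = set()
    writers = set()
    for sample in result:
        labels = sample["metric"]
        origin = labels.get(origin_label, "<unknown>")
        seen.add(origin)
        # An instant-vector value is [timestamp, "value as a string"].
        if labels.get(type_label) == "write" and float(sample["value"][1]) > 0:
            writers.add(origin)
    return sorted(seen - writers)


# Hypothetical query result for xrootd_transfer_bytes.
RESULT = [
    {"metric": {"instance": "origin-a.example:8443", "type": "read"},
     "value": [1700000000, "4096"]},
    {"metric": {"instance": "origin-a.example:8443", "type": "write"},
     "value": [1700000000, "0"]},
    {"metric": {"instance": "origin-b.example:8443", "type": "read"},
     "value": [1700000000, "1024"]},
    {"metric": {"instance": "origin-c.example:8443", "type": "write"},
     "value": [1700000000, "2048"]},
]

missing = origins_without_writes(RESULT)
print(missing)
```

This only covers origins that appear in the result at all. Origins that are missing from the reports entirely would have to be compared against a separately maintained list of known origins.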
I have not confirmed that the issue is not in the aggregation layer. I don't have access to the origins that I know are writing, so I cannot check whether they themselves register those writes.
Assigning to Brian as it is not clear to me where this problem lies.