cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.07k stars 4.28k forks source link

Timeout Xrootd requests at the CMSSW-level #18440

Open bbockelm opened 7 years ago

bbockelm commented 7 years ago

We currently rely on Xrootd's timeout mechanism -- issues with the current release (xrootd 4.5.0) have shown that this isn't 100% reliable.

We don't have our own mechanism because the Xrootd callbacks aren't cancellable -- they might fire at any time after we've given up, even after we've destroyed the corresponding object.

There were two ideas bounced around:

cmsbuild commented 7 years ago

A new Issue was created by @bbockelm Brian Bockelman.

@davidlange6, @Dr15Jones, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

davidlange6 commented 7 years ago

assign core

cmsbuild commented 7 years ago

New categories assigned: core

@Dr15Jones,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

Dr15Jones commented 7 years ago

@bbockelm I'm confused, if the RequestManager can be leaked then it is on the heap. If it is on the heap then it could be held by a std::shared_ptr<RequestManager>. If it is held by a std::shared_ptr<> then one can create as many std::weak_ptr<> as we want and hand those to the callback specific object.

What am I missing?

bbockelm commented 7 years ago

@Dr15Jones - yup, that's almost exactly the second option outlined. Currently, there is no callback-specific object passed around (instead, the callback is given a pointer to the long-lived RequestManager), meaning we'd have to create a callback-specific object on the heap for each IO request to hold the std::weak_ptr<>.

I think this is acceptable since there are other places that allocate per-request, so we're already paying the cost of per-IO heap allocations.

Dr15Jones commented 7 years ago

@bbockelm I'm still confused. If the callback is given a pointer to the 'long-lived' RequestManager, I don't see why it couldn't instead just be given a std::weak_ptr<> to the same RequestManager.

smuzaffar commented 10 months ago

is this still an open issue?