ResourceStatus caching - Githubissues

ubeda commented 12 years ago

Moved here from pull 634

The ResourceStatus helper is used (indirectly) by the StorageElement, which makes it a heavily used component. Most of the time it connects to the DB and writes some logs. That was the symptom that started this discussion.

Here are some points discussed via different channels:

a) How the StorageElement is used must be reviewed. If only the status is required, there may be no need to instantiate it, as has been done in the InputDataAgent.
b) Caching ResourceStatusClient / ResourceStatus.
c) Making ResourceStatusClient / ResourceStatus global.
d) In the case of RSS, this abnormal behavior was spotted immediately because of the amount of logs. I'm curious: we can also get information through services, which are rather silent in this respect, making this kind of issue very hard to spot. What would happen if we printed a few lines every time an RPCClient is created?
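
To make point d) concrete, here is a minimal sketch of a client that reports every instantiation. `CountedClient` and the logger name are hypothetical and are not part of DIRAC; the real RPCClient is not modified or reproduced here.

```python
import logging

# Hypothetical logger name, chosen only for this illustration.
log = logging.getLogger("client-creation")


class CountedClient:
    """Stand-in for a client class; it only counts and logs its own creation."""

    created = 0  # number of instances built so far in this process

    def __init__(self, service_name):
        CountedClient.created += 1
        self.service_name = service_name
        log.info("client #%d created for %s", CountedClient.created, service_name)
```

With the logger set to INFO, each construction emits one line, so a component that builds clients in a loop would stand out in the logs quickly.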

acasajus commented 12 years ago

Just answering to point d)

Turn on debug level and you will see how many RPCClients are created and how much work they do. But they were designed from the beginning to be as invisible as possible. The problem is not creating objects but what you do with them. The RSS objects seem way too heavy. As has been said many times before, RSS objects should be just a frontend to a singleton that holds a cached state of the system. It has to behave like the CS: get the state (or at least some state) and keep it valid for at least 5-10 minutes.
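
A rough sketch of that idea, under stated assumptions: `CachedStatus`, `shared_status` and the `fetch` callable are hypothetical names, not the RSS API. `fetch` stands for whatever call returns the current state as a dictionary, and the 300-second lifetime is simply the lower end of the 5-10 minutes mentioned above.

```python
import threading
import time


class CachedStatus:
    """Holds one snapshot of the system state and refreshes it when it expires."""

    def __init__(self, fetch, lifetime=300.0, clock=time.monotonic):
        self._fetch = fetch          # callable returning {resource_name: status}
        self._lifetime = lifetime    # seconds a snapshot stays valid (5 minutes here)
        self._clock = clock          # injectable so the expiry can be tested
        self._lock = threading.Lock()
        self._snapshot = None
        self._loaded_at = 0.0

    def get(self, name):
        """Return the cached status of `name`, reloading the snapshot if it is stale."""
        with self._lock:
            now = self._clock()
            if self._snapshot is None or now - self._loaded_at >= self._lifetime:
                self._snapshot = dict(self._fetch())
                self._loaded_at = now
            return self._snapshot.get(name)


_shared = None
_shared_lock = threading.Lock()


def shared_status(fetch):
    """Return the single process-wide CachedStatus, creating it on first use.

    The `fetch` argument is only used by the first call; later calls get the
    already existing object.
    """
    global _shared
    with _shared_lock:
        if _shared is None:
            _shared = CachedStatus(fetch)
        return _shared
```

Lightweight frontend objects would then call `shared_status(...).get(name)` instead of each opening their own connection, so the backend is hit at most once per lifetime and per process.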

graciani commented 12 years ago

I'm surprised that you only now realize that checking the status is heavy. You have only included SEs so far; CEs and Sites will be a similar case.

With respect to a): up to now the only way to check the status of an SE was to instantiate it. One could have queried the CS directly, but there is a bit more logic there. If this is now provided directly by RSS, fine, it should be updated, but the same logic should be kept. I.e. if Read is enabled, Check is enabled too, no matter what Check says (this should be the case whether using RSS or the CS).
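
The rule in a) can be sketched as a small function. This is an illustration only: `effective_access` is a hypothetical helper, and representing the flags as booleans keyed by "Read" and "Check" is an assumption, not the actual StorageElement or RSS data model.

```python
def effective_access(flags):
    """Return a copy of `flags` where an enabled Read forces Check to be enabled."""
    result = dict(flags)
    if result.get("Read", False):
        # Read being enabled overrides whatever was stored for Check.
        result["Check"] = True
    return result
```

Whichever backend supplies the raw flags (RSS or the CS), passing them through the same function keeps the behavior identical in both cases.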

d) Not sure what you want the servers to report. You have: http://lhcbweb.pic.es/DIRAC/LHCb-Production/visitor/systems/activitiesMonitoring/systemPlots?componentName=ResourceStatus/ResourceStatus

where in the last-week plot you can see that RSS was getting a few Hz of queries, but not really a lot.

ubeda commented 12 years ago

Regarding a)

I've grep'ed 'ResourceStatusDB' on agents and services at LHCbDIRAC certification, and I have the following matches:

DMS
- RAWIntegrityManager
- DataUsage
- FileCatalog
- RAWIntegrity
- RAWIntegrityAgent
WMS
- MightyOptimizer
- InputDataAgent ( now is much better )
Transformation
- TransformationAgent
- TransformationCleaningAgent
Storage
- StageMonitorAgent

graciani commented 12 years ago

Resources are used basically everywhere. I would expect whatever component reports the status of the components to be used almost everywhere in the code. That is why you need:

- a global object
- a proper cache on the client
- a proper cache on the server

atsareg commented 12 years ago

A general remark: please start such discussions in the developer forum. Issues are instructions for developers to work on; this is not a discussion forum.

As for the discussion: if you grep for gConfig you will get way more hits. But this object does not create problems, because: a. it is a singleton; b. it caches the info.

This is what Adri is suggesting you do. This is what we have also discussed briefly. In addition, as Ricardo says, in many cases the status of a resource can be checked without instantiating the resource object, as was the case with the StorageElement instantiation in the InputDataAgent.

The ResourceStatusClient indeed will be a rather heavily used object, so it should be properly done and used.

Finally, if you want to continue this discussion, please move it to the developer forum: http://groups.google.com/group/diracgrid-develop?hl=en . It is not that I want to cut this discussion short; this is simply not the appropriate place.

DIRACGrid / DIRAC

ResourceStatus caching #638