dmwm / CRABServer


make sure to use bulk queries whenever possible #7002

Open belforte opened 2 years ago

belforte commented 2 years ago

The following screenshot illustrates the number of HTTP queries to crabserver during peak times. While there are spikes from Publisher as well, the bulk is from schedd scripts (PostJobs and/or FTS_transfers). That's too much to feel comfortable. We can't reduce the number of SQL transactions unless we somehow redesign, but we should at least make sure that bulk APIs are used as much as possible. (Screenshot from 2022-01-24 14-01-56)

URL for that dashboard: https://monit-grafana.cern.ch/d/qUVV6S0Gk/crab-timber-pods?orgId=11&from=now-3d&to=now&var-system=crabserver&var-method=All&var-api=All&var-code=All&var-metadataType=All&var-Filters=data.code%7C%3E%7C1&var-Filters=data.api%7C!%3D%7Cinfo&var-copy_of_system=All&var-cluster=cmsweb&var-env=k8s-prod&var-client=CRABSchedd%2Fv3.220107&var-client=CRABPublisher%2Fv3.211111&var-bin=10m&var-dnFilter=.*

filtering a bit I find that:

belforte commented 2 years ago

related to https://github.com/dmwm/CRABServer/issues/6097 ?

belforte commented 2 years ago

I suspect that this piece of code is not good. Both mark_transferred and mark_failed support lists as arguments, as does the call to REST, but here they are called once for every job. These contribute to the POST calls to the filetransfers API (that's the only REST API used by FTS_Transfers.py): https://github.com/dmwm/CRABServer/blob/b642c5c65ba0c5596f4303edca46118627ed6e16/scripts/task_process/FTS_Transfers.py#L531-L548
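To make the difference concrete, here is a minimal sketch of per-job versus bulk updates. It is not the actual FTS_Transfers.py code: `FakeRest`, the payload keys, the `(file_id, ok, reason)` tuples and the helper signatures are all assumptions made up for illustration. Only the names mark_transferred and mark_failed, and the fact that they accept lists, come from the comment above.

```python
class FakeRest:
    """Hypothetical stand-in for the REST client; it only counts POST calls."""

    def __init__(self):
        self.posts = 0

    def post(self, api, data):
        self.posts += 1


def mark_transferred(ids, server):
    """Assumed bulk helper: one POST for the whole list of ids."""
    if ids:
        server.post("filetransfers", {"ids": ids, "state": "DONE"})


def mark_failed(ids, reasons, server):
    """Assumed bulk helper: one POST for the whole list of ids."""
    if ids:
        server.post("filetransfers", {"ids": ids, "state": "FAILED", "reasons": reasons})


def update_per_job(results, server):
    """Pattern to avoid: one REST call for every single transfer."""
    for file_id, ok, reason in results:
        if ok:
            mark_transferred([file_id], server)
        else:
            mark_failed([file_id], [reason], server)


def update_in_bulk(results, server):
    """Collect ids first, then make at most one call per final state."""
    done = [file_id for file_id, ok, _ in results if ok]
    failed = [(file_id, reason) for file_id, ok, reason in results if not ok]
    mark_transferred(done, server)
    mark_failed([f for f, _ in failed], [r for _, r in failed], server)


# Made-up sample: 3 successful and 2 failed transfers.
results = [("a", True, None), ("b", True, None), ("c", False, "timeout"),
           ("d", True, None), ("e", False, "checksum")]

per_job, bulk = FakeRest(), FakeRest()
update_per_job(results, per_job)
update_in_bulk(results, bulk)
print(per_job.posts, bulk.posts)
```

With this sample the per-job loop issues one POST per transfer, while the bulk version issues at most two regardless of how many transfers there are.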

belforte commented 2 years ago

The fileusertransfers and filemetadata APIs, which are the other large source of calls and are called only from schedd (see above comments), are used by PostJob.

The GET to filemetadata is from Publisher/PublisherMaster, but it is not much, and anyhow it is only one call per task. In a way that is even too much bulk, as the output JSON can be hundreds of MB!

belforte commented 2 years ago

looking at calls to fileusertransfers (of course calls from PostJobs are always for one job at a time, can't make bulk).

I think that PUT is one per file when a job completes and POST is when transfer status is updated. The GETs are less clear, maybe a relic of CouchDB code where a document had to be read/modified/written back. We should be able to reduce the number of GETs; it is quite odd that we have more GET than POST! Then there are about 5x more PUT than POST, odd!
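As an illustration of why a read/modify/write-back habit inflates the GET count, here is a small sketch. `CountingStore` and both update functions are hypothetical; they do not reflect the real fileusertransfers API or the PostJob code, and whether the server can update individual fields without a prior read is an assumption.

```python
class CountingStore:
    """Hypothetical in-memory stand-in for a REST backend that counts calls."""

    def __init__(self):
        self.docs = {}
        self.calls = {"GET": 0, "POST": 0}

    def get(self, doc_id):
        self.calls["GET"] += 1
        return dict(self.docs.get(doc_id, {}))

    def post_document(self, doc_id, doc):
        """Replace the whole document (document-store style)."""
        self.calls["POST"] += 1
        self.docs[doc_id] = dict(doc)

    def post_fields(self, doc_id, **fields):
        """Update only the given fields (what an SQL UPDATE can do)."""
        self.calls["POST"] += 1
        self.docs.setdefault(doc_id, {}).update(fields)


def update_state_read_modify_write(store, doc_id, state):
    """Document-store pattern: GET the document, edit it, write it all back."""
    doc = store.get(doc_id)
    doc["state"] = state
    store.post_document(doc_id, doc)


def update_state_direct(store, doc_id, state):
    """Send only the changed field; no GET is needed."""
    store.post_fields(doc_id, state=state)


old_style, new_style = CountingStore(), CountingStore()
for store in (old_style, new_style):
    store.docs["file1"] = {"state": "NEW", "lfn": "/store/user/example"}

update_state_read_modify_write(old_style, "file1", "DONE")
update_state_direct(new_style, "file1", "DONE")
print(old_style.calls, new_style.calls)
```

Both variants leave the document in the same final state, but the direct update does it without any GET.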

GET calls to fileusertransfers are here https://github.com/dmwm/CRABServer/blob/82e032dcebd3f68b4f971d8349dd2b722b92e3cb/src/python/TaskWorker/Actions/PostJob.py#L785-L786 https://github.com/dmwm/CRABServer/blob/82e032dcebd3f68b4f971d8349dd2b722b92e3cb/src/python/TaskWorker/Actions/PostJob.py#L994-L999

POST calls are here https://github.com/dmwm/CRABServer/blob/82e032dcebd3f68b4f971d8349dd2b722b92e3cb/src/python/TaskWorker/Actions/PostJob.py#L868 https://github.com/dmwm/CRABServer/blob/82e032dcebd3f68b4f971d8349dd2b722b92e3cb/src/python/TaskWorker/Actions/PostJob.py#L1128 https://github.com/dmwm/CRABServer/blob/82e032dcebd3f68b4f971d8349dd2b722b92e3cb/src/python/TaskWorker/Actions/PostJob.py#L1162

And there is a single place where PUT is called https://github.com/dmwm/CRABServer/blob/82e032dcebd3f68b4f971d8349dd2b722b92e3cb/src/python/TaskWorker/Actions/PostJob.py#L828

mapellidario commented 1 year ago

@belforte During a private meeting you mentioned that we also have a line in publisher where we may want to do this. Could you add it here? thanks! :)

belforte commented 1 year ago

the need to change Publisher is described in https://github.com/dmwm/CRABServer/issues/6097