dmwm / PHEDEX

CMS data-placement suite
8 stars 18 forks source link

FTS/FTS3 backends: all transfers submitted with priority 1 (lowest) #1068

Open nikmagini opened 7 years ago

nikmagini commented 7 years ago

Looking at FTS monitoring, it seems that all jobs submitted by PhEDEx are set to priority 1 (lowest in FTS):

https://fts3.cern.ch:8449/fts3/ftsmon/#/jobs?vo=cms&source_se=&dest_se=&time_window=1

But PhEDEx is supposed to use the FTS priority range 2-4 depending on the task priority in TMDB:

https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/FTS3.pm#L158-L166

In particular, PhEDEx transfers get lower priority than ASO transfers which are submitted at the default FTS priority (3)

This seems to affect both the old FTS backend and the new FTS3 backend. Maybe a bug in the default priority mapping?

Setting issue to low priority though: this code hasn't changed since the initial release of the FTS backend in PHEDEX_3, so it's probably always been broken since 2008, and nobody has noticed yet...

nikmagini commented 7 years ago

The prioirity calculation and mapping seems to be working correctly. E.g. this job has priority 4 in ftsjob.dmp

'PRIORITY' => 4, 'ID' => 'd72b49ee-ab31-11e6-bc94-02163e018122',

But then the agent executes the fts-set-priority command with priority 1:

POE::Component::Child - run(): "fts-set-priority -s https://fts3.cern.ch:8446 d72b49ee-ab31-11e6-bc94-02163e018122 1", wheel=63066, pid=22049

So there must be something wrong in passing the priority as argument to fts-set-priority

nikmagini commented 7 years ago

Found the bug, which has always been there since the beginning...

When PHEDEX::Transfer::Backend::Monitor::QueueJob is called, the priority is hardcoded to 1:

https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/FTS3.pm#L562

https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/Backend/Monitor.pm#L588

This has nothing to do with FTS transfer priority - it is the priority of the monitoring (fts-transfer-status) command in the agent POE queue...

But then the FTS job priority is set to the same value:

https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/Backend/Monitor.pm#L597

This is a leftover of unused code: the POE job priority was supposed to be adjusted on the fly to poll certain jobs more frequently in the Monitor.

However, the same priority value (set to 1) is later also used by the FTS/FTS3 backends to set the priority of the job in FTS:

https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/FTS3.pm#L569 https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/Backend/Interface/FTS3CLIAsync.pm#L186

Possible fix: it should be enough to remove this line... https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/Backend/Monitor.pm#L597

alberto-sanchez commented 7 years ago

Was looking into this, and Iarrived to same conclusion. QueueJob set the priority to 1, because it expect 2 arguments and only one is given, is it?. $ftsjob enters as one single argument. But I just said, apparently is not needed to re-set the priority of the job. So, either we fix all calls to QueueJob, only correct in perl_lib/PHEDEX/Transfer/Backend/Manager.pm, or change the way the arguments are used.

nikmagini commented 7 years ago

Hi Alberto,

your commit should fix the issue of job priorities in FTS, so I'm going to merge it.

I'm still not fully satisfied by the fact that the same $ftsjob->PRIORITY variable is used to set both FTS priority and POE priority. Since FTS priorities are ascending (1 is min, 5 is max) while POE priorities are descending (0 is max), this has a curious side-effect: POE will execute submit and poll commands for low-priority FTS jobs before high-priority FTS jobs...

A proper implementation would use a different variable (e.g. $ftsjob->TRANSFERPRIORITY) for the FTS priorities.

Anyway this is just for the submission and the first poll of the job, which should complete quickly; after that point, all jobs are polled with the same POE priority (30):

https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/Backend/Monitor.pm#L281

And the important thing is that transfers will now be executed in the correct priority order in FTS.

So this is good enough; merging.

nataliaratnikova commented 6 years ago

I am including this commit in the next agents release PHEDEX_4_2_2. However leave the issue open until a proper solution is implemented.