Open nikmagini opened 7 years ago
The prioirity calculation and mapping seems to be working correctly. E.g. this job has priority 4 in ftsjob.dmp
'PRIORITY' => 4, 'ID' => 'd72b49ee-ab31-11e6-bc94-02163e018122',
But then the agent executes the fts-set-priority command with priority 1:
POE::Component::Child - run(): "fts-set-priority -s https://fts3.cern.ch:8446 d72b49ee-ab31-11e6-bc94-02163e018122 1", wheel=63066, pid=22049
So there must be something wrong in passing the priority as argument to fts-set-priority
Found the bug, which has always been there since the beginning...
When PHEDEX::Transfer::Backend::Monitor::QueueJob is called, the priority is hardcoded to 1:
https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/FTS3.pm#L562
https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/Backend/Monitor.pm#L588
This has nothing to do with FTS transfer priority - it is the priority of the monitoring (fts-transfer-status) command in the agent POE queue...
But then the FTS job priority is set to the same value:
https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/Backend/Monitor.pm#L597
This is a leftover of unused code: the POE job priority was supposed to be adjusted on the fly to poll certain jobs more frequently in the Monitor.
However, the same priority value (set to 1) is later also used by the FTS/FTS3 backends to set the priority of the job in FTS:
https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/FTS3.pm#L569 https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/Backend/Interface/FTS3CLIAsync.pm#L186
Possible fix: it should be enough to remove this line... https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/Backend/Monitor.pm#L597
Was looking into this, and Iarrived to same conclusion. QueueJob set the priority to 1, because it expect 2 arguments and only one is given, is it?. $ftsjob enters as one single argument. But I just said, apparently is not needed to re-set the priority of the job. So, either we fix all calls to QueueJob, only correct in perl_lib/PHEDEX/Transfer/Backend/Manager.pm, or change the way the arguments are used.
Hi Alberto,
your commit should fix the issue of job priorities in FTS, so I'm going to merge it.
I'm still not fully satisfied by the fact that the same $ftsjob->PRIORITY variable is used to set both FTS priority and POE priority. Since FTS priorities are ascending (1 is min, 5 is max) while POE priorities are descending (0 is max), this has a curious side-effect: POE will execute submit and poll commands for low-priority FTS jobs before high-priority FTS jobs...
A proper implementation would use a different variable (e.g. $ftsjob->TRANSFERPRIORITY) for the FTS priorities.
Anyway this is just for the submission and the first poll of the job, which should complete quickly; after that point, all jobs are polled with the same POE priority (30):
https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/Backend/Monitor.pm#L281
And the important thing is that transfers will now be executed in the correct priority order in FTS.
So this is good enough; merging.
I am including this commit in the next agents release PHEDEX_4_2_2. However leave the issue open until a proper solution is implemented.
Looking at FTS monitoring, it seems that all jobs submitted by PhEDEx are set to priority 1 (lowest in FTS):
https://fts3.cern.ch:8449/fts3/ftsmon/#/jobs?vo=cms&source_se=&dest_se=&time_window=1
But PhEDEx is supposed to use the FTS priority range 2-4 depending on the task priority in TMDB:
https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Transfer/FTS3.pm#L158-L166
In particular, PhEDEx transfers get lower priority than ASO transfers which are submitted at the default FTS priority (3)
This seems to affect both the old FTS backend and the new FTS3 backend. Maybe a bug in the default priority mapping?
Setting issue to low priority though: this code hasn't changed since the initial release of the FTS backend in PHEDEX_3, so it's probably always been broken since 2008, and nobody has noticed yet...