dmwm / AsyncStageout


Split transfers over FTS servers by country #4436

Closed HassenRiahi closed 7 years ago

HassenRiahi commented 9 years ago

Details are here: https://hypernews.cern.ch/HyperNews/CMS/get/crabDevelopment/2423.html

PerilousApricot commented 9 years ago

Hi Hassen,

The ftsmap stuff you talk about in the ticket is obsoleted. Now we (we=site) just dump all of the transfers into a single FTS instance. For instance, I chose a random site in SITECONF:

https://git.cern.ch/web/siteconf.git/blob/HEAD:/T2_DE_DESY/PhEDEx/ConfigPart.FTSDownload

TIER1_FTS_SERVER=https://lcgfts3.gridpp.rl.ac.uk:8443 
### AGENT LABEL=download-fts-all PROGRAM=Toolkit/Transfer/FileDownload ENVIRON=glite
  -db              ${PHEDEX_DBPARAM}
  -nodes           ${PHEDEX_NODE}
  # -ignore          'T1_CERN_MSS','T2_CH_CAF'
  -accept          '%'
  -delete          ${PHEDEX_CONFIG}/FileDownloadDelete
  -validate        ${PHEDEX_CONFIG}/FileDownloadSRMVerify
  -backend         FTS
  # -mapfile         ${PHEDEX_CONFIG}/fts.map
  -service         ${TIER1_FTS_SERVER}
  # -ftspass         ${PHEDEX_CONFIG}/ftspass
  -protocols       'srmv2','srm'
  -batch-files           5
  -max-active-files      200
  -link-pending-files    5

But in either case, I think the idea is good. A quick glance says that it might be possible to re-use this:

https://github.com/dmwm/AsyncStageout/blob/a2bafc687c73745553c1e5d26e9fed9fe3b76ee3/src/couchapp/config/_docs/FNAL.json

?

PerilousApricot commented 9 years ago

Of course, the rad (but likely impossible) way to do it would be to have the FTS3 instances coordinate with each other somehow, so they could see what was happening globally.

HassenRiahi commented 9 years ago

I think the mapping is coordinated by the transfer team, and the change was announced a year ago here: https://hypernews.cern.ch/HyperNews/CMS/get/comp-ops/1682.html

where

CERN: https://fts3.cern.ch:8443
RAL: https://lcgfts3.gridpp.rl.ac.uk:8443
FNAL: https://cmsfts3.fnal.gov:8443

HassenRiahi commented 9 years ago

I know that a coordinator between FTS servers is on the FTS team's to-do list. Let me check the timeline (if any).

PerilousApricot commented 9 years ago

Right, but the semantics is backwards now. The phedex agent that submits the transfer is pulling a file TO a site.

Previously, if T2_US_Vanderbilt wanted to pull a file from T1_DE_KIT, it would look up KIT in the ftsmap and submit the transfer to the RAL FTS server. It was the source site whose FTS server would get the request.

Now, with the new recommendations, T2_US_Vanderbilt always submits transfer requests to the FNAL FTS3 server. It's the destination site's FTS server that gets the request.

I agree that there's a mapping, but now each site just has one FTS3 server to submit to, instead of looking up the remote side's FTS2 host.
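The difference between the old source-based lookup and the new destination-based one can be sketched roughly as follows. This is only an illustration: the three endpoints are the ones quoted earlier in this thread, but the country table, the CERN default, and the function names are assumptions made for the example, not the actual ftsmap or ASO code.

```python
# Endpoints as quoted earlier in this thread.
FTS_SERVERS = {
    "CERN": "https://fts3.cern.ch:8443",
    "RAL": "https://lcgfts3.gridpp.rl.ac.uk:8443",
    "FNAL": "https://cmsfts3.fnal.gov:8443",
}

# Hypothetical country -> FTS instance table. Only DE -> RAL and US -> FNAL
# are implied by the discussion; a real table would come from the transfer team.
COUNTRY_TO_INSTANCE = {"DE": "RAL", "US": "FNAL"}
DEFAULT_INSTANCE = "CERN"  # assumption: fall back to CERN for unlisted countries


def site_country(site):
    """Extract the country code from a CMS site name such as T2_US_Vanderbilt."""
    parts = site.split("_")
    if len(parts) < 3:
        raise ValueError("unexpected site name: %r" % site)
    return parts[1]


def fts_server_for(site):
    """Return the FTS endpoint mapped to the country of the given site."""
    instance = COUNTRY_TO_INSTANCE.get(site_country(site), DEFAULT_INSTANCE)
    return FTS_SERVERS[instance]


def choose_server(source, destination, by="destination"):
    """Pick the FTS endpoint by source site (old ftsmap) or destination site (new)."""
    if by == "source":
        return fts_server_for(source)
    if by == "destination":
        return fts_server_for(destination)
    raise ValueError("by must be 'source' or 'destination'")


# Old behaviour: Vanderbilt pulling from KIT submits to KIT's (source) server.
old = choose_server("T1_DE_KIT", "T2_US_Vanderbilt", by="source")
# New behaviour: the same transfer goes to Vanderbilt's (destination) server.
new = choose_server("T1_DE_KIT", "T2_US_Vanderbilt", by="destination")
```

With the table above, `old` is the RAL endpoint and `new` is the FNAL endpoint, matching the KIT/Vanderbilt example.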

PerilousApricot commented 9 years ago


I'm not real sure why my English has gotten so bad recently....

HassenRiahi commented 9 years ago

> But in either case, I think the idea is good. A quick glance says that it might be possible to re-use this:
>
> https://github.com/dmwm/AsyncStageout/blob/a2bafc687c73745553c1e5d26e9fed9fe3b76ee3/src/couchapp/config/_docs/FNAL.json

Yes, that doc was previously used to map countries to FTS servers, as you guessed. But it is broken now, because we have since introduced the PhEDEx Monitor component in ASO (https://github.com/dmwm/AsyncStageout/tree/master/src/Monitor). Enabling it again will require some implementation effort so that the transfer submitter can communicate the FTS endpoint to the Monitor.

HassenRiahi commented 9 years ago

> I'm not real sure why my English has gotten so bad recently....

me too :-)

PerilousApricot commented 9 years ago

Sounds great. I hope it doesn't end up being too difficult of a change. I'd offer to help, but I'm having a crisis trying to finish my thesis :(

PerilousApricot commented 9 years ago

> I'm not real sure why my English has gotten so bad recently.... me too :-)

Your English is terrific :+1:

HassenRiahi commented 9 years ago

> I hope it doesn't end up being too difficult of a change

For me it does not seem to be too hard to implement. Maybe @TonyWildish could confirm?

PerilousApricot commented 9 years ago

I guess an added optimization would be to set the FTS priority for ASO transfers to be the same as the FTS priority for "Normal" PhEDEx subscriptions. We (Vanderbilt) are stuck with a 10Gig link for possibly another ~18 months because of bureaucracy, and we're pegging the link more and more often with a combination of PhEDEx/ASO/xrootd. Since most PhEDEx replications are low priority and CRAB3/ASO cares more about latency, having FTS3 prioritize ASO over PhEDEx would help keep ASO timeouts down.

HassenRiahi commented 9 years ago

I think it is possible to set it by destination and as site admin you should be able to do it. Let me check the correct syntax.

PerilousApricot commented 9 years ago

I think I can use the production role to run "fts-set-priority" on an individual job, but that means I need to constantly scan and update the job list to see new jobs to update. It probably makes more sense to set the priority in ASO when you submit the job (PhEDEx does the same)

HassenRiahi commented 9 years ago

Yup! But I am wondering whether it is possible to use fts-config-set to set the share per SE between ASO and PhEDEx, still using your production role.

PerilousApricot commented 9 years ago

Oh, that'd be rad. I didn't realize that FTS had the idea of different applications (at least it's not on the cli3 client). Perhaps there's something different exposed on the REST interface.

I should double-check I have the production role, I guess :)

HassenRiahi commented 9 years ago

Just discussed with the FTS developers, and it seems the activity shares can only be set per VO on a given FTS endpoint; it is not possible to also set them per SE (as in your case). So the options are:

1) Implement the priority of jobs in ASO per SE at submission time, as you pointed out;
2) Ask the FTS team to include this as a feature request (activity shares management per VO and SE). It should not be hard to include.
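A minimal sketch of what option 1 could look like on the ASO side: look up a priority for the destination SE and attach it to the job at submission time. Everything here is hypothetical, including the hostnames, the 1 to 5 priority scale, the default value, and the shape of the job dictionary; it is not taken from the ASO code or checked against the FTS3 submission format.

```python
# Hypothetical per-SE priority table; hostnames are placeholders.
SE_PRIORITY = {
    "se.example-vanderbilt.org": 5,  # congested link: favour ASO transfers
}
DEFAULT_PRIORITY = 3  # assumption: middle of a 1..5 scale
MIN_PRIORITY, MAX_PRIORITY = 1, 5


def priority_for(destination_se):
    """Return the configured priority for a destination SE, clamped to the scale."""
    value = SE_PRIORITY.get(destination_se, DEFAULT_PRIORITY)
    return max(MIN_PRIORITY, min(MAX_PRIORITY, value))


def build_job(files, destination_se):
    """Build a job description carrying the per-SE priority.

    The dictionary layout is an assumption made for this example.
    """
    return {
        "files": list(files),
        "params": {"priority": priority_for(destination_se)},
    }


job = build_job(
    [{"source": "srm://src.example.org/a", "destination": "srm://se.example-vanderbilt.org/a"}],
    "se.example-vanderbilt.org",
)
```

Setting the value once at submission avoids having to rescan the job list and re-prioritize each job afterwards.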

belforte commented 7 years ago

obsolete, we always use CERN FTS now