cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

fix retry logic for triton client #46703

Closed cjh1 closed 1 day ago

cjh1 commented 1 week ago

PR description:

On retry, the client was trying to access the TritonService through the ServiceRegistry. However, the thread calling the evaluate method did not have the appropriate context set up to allow this. We now save the ServiceToken when the client is created, so that the appropriate context can be set up before accessing the service.
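
To illustrate the failure mode and the fix described above, here is a minimal, framework-free sketch in Python. It is not the CMSSW code: `ServiceToken`, `present_token`, `operate`, `get_service`, and `Client` are hypothetical stand-ins that only mimic the idea of a per-thread service context, and the string `"triton-service"` is a placeholder for the real service object. The sketch assumes the service context is thread-local, so a retry that runs on a different thread sees no context unless a previously saved token is re-activated first.

```python
import threading
from contextlib import contextmanager

# Thread-local slot standing in for the framework's per-thread service context.
_context = threading.local()


class ServiceToken:
    """Opaque handle to a set of services (hypothetical stand-in)."""

    def __init__(self, services):
        self._services = dict(services)


def present_token():
    """Return the token active on the calling thread, or None if there is none."""
    return getattr(_context, "token", None)


@contextmanager
def operate(token):
    """Make `token` the active context on this thread for the duration of the block."""
    previous = present_token()
    _context.token = token
    try:
        yield
    finally:
        _context.token = previous


def get_service(name):
    """Look up a service through the calling thread's active context."""
    token = present_token()
    if token is None:
        raise RuntimeError("no service context on this thread")
    return token._services[name]


class Client:
    def __init__(self):
        # Save the token while we are still on a thread that has a context.
        self._token = present_token()

    def retry_without_token(self):
        # Old behaviour: relies on whatever context the calling thread has.
        return get_service("TritonService")

    def retry_with_token(self):
        # Fixed behaviour: re-activate the saved context before the lookup.
        with operate(self._token):
            return get_service("TritonService")


def run_on_new_thread(fn):
    """Run fn on a fresh thread; return ('ok', value) or ('error', message)."""
    result = []

    def target():
        try:
            result.append(("ok", fn()))
        except RuntimeError as exc:
            result.append(("error", str(exc)))

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return result[0]


# The client is created on a thread that has an active service context.
with operate(ServiceToken({"TritonService": "triton-service"})):
    client = Client()

# The retries run on a different thread, which starts without any context.
before = run_on_new_thread(client.retry_without_token)
after = run_on_new_thread(client.retry_with_token)
```

In this sketch `before` holds the error raised by the context-less lookup, while `after` holds the service found through the saved token. The design point is that the token is captured once, at construction time, rather than looked up on whichever thread happens to perform the retry.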

PR validation:

We tested this at NERSC where we are seeing GOAWAY responses from our nginx ingress. This logic successfully retries the failed requests.

cjh1 commented 1 week ago

@asnaylor

cmsbuild commented 1 week ago

cms-bot internal usage

asnaylor commented 1 week ago

@kpedro88 I've tested @cjh1's patch at NERSC and it's working fine. This will fix the issues we were having connecting to the TritonServer running at NERSC.

cmsbuild commented 1 week ago

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46703/42656

cmsbuild commented 1 week ago

A new Pull Request was created by @cjh1 for master.

It involves the following packages:

@cmsbuild, @fwyzard, @makortel can you please review it and eventually sign? Thanks.
@kpedro88, @makortel, @missirol, @riga, @rovere this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

kpedro88 commented 6 days ago

@asnaylor @cjh1 thanks for this very useful contribution!

kpedro88 commented 6 days ago

test parameters:
workflows = 10805.31,11634.9001,24834.9001
relvals_opt = --what cleanedupgrade,standard,highstats,pileup,generator,extendedgen,production,identity,ged,machine,premix,nano,gpu,2017,2026

kpedro88 commented 6 days ago

please test

cmsbuild commented 6 days ago

-1

Failed Tests: RelVals
Size: This PR adds an extra 12KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ef1d15/42890/summary.html
COMMIT: b8e88f2dbecc68cdf19f6eaa29e3b5020adce8ab
CMSSW: CMSSW_14_2_X_2024-11-15-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46703/42890/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

ERROR importing file  relval_data_highstats name 'base_wf_number_2022' is not defined

kpedro88 commented 6 days ago

please test with #46701

cmsbuild commented 6 days ago

+1

Size: This PR adds an extra 12KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ef1d15/42895/summary.html
COMMIT: b8e88f2dbecc68cdf19f6eaa29e3b5020adce8ab
CMSSW: CMSSW_14_2_X_2024-11-15-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46703/42895/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

cmsbuild commented 1 day ago

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-46703/42719

cmsbuild commented 1 day ago

Pull request #46703 was updated. @cmsbuild, @fwyzard, @makortel can you please check and sign again.

makortel commented 1 day ago

@cmsbuild, please test

cmsbuild commented 1 day ago

+1

Size: This PR adds an extra 24KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ef1d15/42977/summary.html
COMMIT: 71301131a636c6b0af1bedd3328f4923aeed0bb5
CMSSW: CMSSW_14_2_X_2024-11-20-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/46703/42977/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

makortel commented 1 day ago

Comparison differences are related to https://github.com/cms-sw/cmssw/issues/46416

makortel commented 1 day ago

+heterogeneous

cmsbuild commented 1 day ago

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @mandrenguyen, @sextonkennedy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

mandrenguyen commented 1 day ago

+1