PelicanPlatform / pelican

The Pelican Platform for creating data federations
https://pelicanplatform.org/
Apache License 2.0

Have plugin include K8SPhysicalHostName in the hold message #1480

Open bbockelm opened 4 months ago

bbockelm commented 4 months ago

On some hosts, the "EP name" is meaningless (it is the randomly generated pod name), but the machine's K8SPhysicalHostName attribute records the "real" hostname.

We should add this name to the metadata we include in the plugin's hold message. If present, it should be recorded as the Hostname attribute; if K8SPhysicalHostName is not present, then no hostname should be put in the error message (as it would be duplicative of other parts of the error message).

So, example:

Attempt #1: from osg-kansas-city-stashcache.nrp.internet2.edu:8443: transfer error: \
   Unable to read /path-facility/data/foo; network dropped connection on reset (5m47.8s since start) \
   (Version: 7.9.2; Site: UNL-PATH)

would become:

Attempt #1: from osg-kansas-city-stashcache.nrp.internet2.edu:8443: transfer error: \
   Unable to read /path-facility/data/foo; network dropped connection on reset (5m47.8s since start) \
   (Version: 7.9.2; Site: UNL-PATH; Hostname: foo.unl.edu)
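
The rule above (append the hostname only when one is known) can be sketched in Go as follows. This is a minimal illustration, not Pelican's actual code: `formatMetadata` and its parameters are hypothetical names chosen for this example, and the trailer layout is copied from the two sample messages.

```go
package main

import (
	"fmt"
	"strings"
)

// formatMetadata builds the parenthesized trailer of the hold message.
// Hypothetical helper for illustration; not a function from the Pelican codebase.
// An empty hostname means K8SPhysicalHostName was absent, so no
// "Hostname" entry is emitted at all.
func formatMetadata(version, site, hostname string) string {
	parts := []string{"Version: " + version}
	if site != "" {
		parts = append(parts, "Site: "+site)
	}
	if hostname != "" {
		parts = append(parts, "Hostname: "+hostname)
	}
	return "(" + strings.Join(parts, "; ") + ")"
}

func main() {
	// Without K8SPhysicalHostName: the trailer is unchanged.
	fmt.Println(formatMetadata("7.9.2", "UNL-PATH", ""))
	// With K8SPhysicalHostName: the hostname is appended.
	fmt.Println(formatMetadata("7.9.2", "UNL-PATH", "foo.unl.edu"))
}
```

Treating the empty string as "attribute not present" keeps the existing message byte-for-byte identical on hosts that do not advertise K8SPhysicalHostName.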

turetske commented 3 months ago

@bbockelm I'm not finding that attribute anywhere in the HTCondor documentation for the machine ad. Is it custom, or would it be under Machine or ClientMachine? Or am I looking in the wrong place entirely? If it is custom, would it be in the machine ad or the class ad? My instinct is the machine ad, but I want to be sure.

bbockelm commented 3 months ago

You want to look in the machine ad file, whose path is given by the _CONDOR_MACHINE_AD environment variable. See this documentation section: https://htcondor.readthedocs.io/en/latest/users-manual/env-of-job.html#extra-environment-variables-htcondor-sets-for-jobs
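
As a rough sketch of that lookup in Go: the code below reads the file named by `_CONDOR_MACHINE_AD` and searches it for `K8SPhysicalHostName`. It assumes the file holds one `Attribute = value` pair per line with string values in double quotes, and it strips the quotes without handling ClassAd escape sequences. The function names `parseHostname` and `hostnameFromMachineAd` are hypothetical, not taken from Pelican.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseHostname scans machine-ad text in "Attribute = value" form and
// returns the K8SPhysicalHostName value, or "" if the attribute is absent.
// Assumption: one attribute per line; quoted strings have no escapes.
func parseHostname(ad string) string {
	for _, line := range strings.Split(ad, "\n") {
		name, value, found := strings.Cut(line, "=")
		if !found {
			continue
		}
		// ClassAd attribute names are case-insensitive.
		if strings.EqualFold(strings.TrimSpace(name), "K8SPhysicalHostName") {
			return strings.Trim(strings.TrimSpace(value), `"`)
		}
	}
	return ""
}

// hostnameFromMachineAd returns "" when the environment variable is unset,
// the file cannot be read, or the attribute is missing, so the caller can
// simply omit the hostname from the hold message in all of those cases.
func hostnameFromMachineAd() string {
	path := os.Getenv("_CONDOR_MACHINE_AD")
	if path == "" {
		return ""
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return ""
	}
	return parseHostname(string(data))
}

func main() {
	fmt.Printf("Hostname: %q\n", hostnameFromMachineAd())
}
```

Swallowing read errors here is a deliberate choice for the sketch: a missing machine ad should not turn a transfer error report into a second failure.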