glideinWMS / glideinwms

The glideinWMS Project
http://tinyurl.com/glideinwms
Apache License 2.0

Request to isolate condor_stdout and condor_stderr as well as MasterLog and StartdLog from multi-node glideins #280

Open StevenCTimm opened 1 year ago

StevenCTimm commented 1 year ago

Is your feature request related to a problem? Please describe.

As currently configured, multi-node glideins, such as the HEPCloud runs at NERSC and TACC Frontera, have a problem: every node writes its _condor_stdout and _condor_stderr on top of the others in a single file. This makes debugging difficult, since a single glidein can span as many as 100 nodes.

Describe the solution you'd like

We would like _condor_stdout and _condor_stderr to be tagged individually for each node. In such a configuration, Condor's output transfer would pull multiple such files back to the factory rather than just one.

Describe alternatives you've considered

This will be somewhat tricky, because each site uses a different multi-glidein setup. The NERSC SLURM setup executes an srun command on each of the 100 nodes. TACC Frontera, by contrast, starts on one node and uses TACC's built-in launching mechanism to run glidein_startup.sh on each of the 28 nodes in our jobs there.
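One possible direction is a small wrapper that each node runs around glidein_startup.sh. It would redirect that node's output to its own tagged files instead of a shared one. The Python below is a minimal sketch under stated assumptions and is not existing glideinWMS code. The wrapper, its function names, and the `_condor_stdout.<tag>` naming scheme are all hypothetical. It assumes that under srun each task sees Slurm's `SLURM_PROCID` variable. It makes no assumption about the environment TACC's launcher provides, so it falls back to the short hostname. The sketch does not address how the factory would then collect the set of tagged files.

```python
import os
import socket
import subprocess
import sys


def node_tag(env=None):
    """Return a tag identifying this node/task (hypothetical scheme).

    Assumption: under srun, Slurm sets SLURM_PROCID for each task; when it is
    absent (e.g. another site's launcher), fall back to the short hostname.
    """
    env = os.environ if env is None else env
    host = socket.gethostname().split(".")[0]
    procid = env.get("SLURM_PROCID")
    if procid is not None:
        return f"{host}.{procid}"
    return host


def tagged_paths(base_dir, tag):
    """Per-node stdout/stderr file names; the naming is illustrative only."""
    return (
        os.path.join(base_dir, f"_condor_stdout.{tag}"),
        os.path.join(base_dir, f"_condor_stderr.{tag}"),
    )


def run_tagged(cmd, base_dir="."):
    """Run cmd with stdout/stderr appended to this node's own files."""
    out_path, err_path = tagged_paths(base_dir, node_tag())
    with open(out_path, "ab") as out, open(err_path, "ab") as err:
        return subprocess.call(cmd, stdout=out, stderr=err)


def main(argv):
    """Entry point for a hypothetical launcher, e.g. main(["./glidein_startup.sh", ...])."""
    if not argv:
        print("usage: <wrapper> command [args...]", file=sys.stderr)
        return 2
    return run_tagged(argv)
```

In a launch script, each node would invoke something like `sys.exit(main(sys.argv[1:]))` with glidein_startup.sh and its arguments. Keying on hostname plus task ID keeps the names unique even if a site runs more than one task per node.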

