Closed by eflumerf 2 years ago
Comment by @jcfreeman2 on 2019-01-23 22:21:51
With commit 962ff8fba8d1e8553e69b72c963ec619f42d1fec on the develop branch, if we're running in direct process management mode, then when processes are launched the output on each host is saved in a file called
This issue has been migrated from https://cdcvs.fnal.gov/redmine/issues/21739 (FNAL account required). Originally created by @jcfreeman2 on 2019-01-22 23:04:31
Right now, in direct process management, if a process (or set of processes) doesn't launch on a node, it can be difficult to determine the cause. To avoid a verbose spew to stdout that would overwhelm everything else (especially when a large number of processes are requested), the output from sourcing the setup script and launching the artdaq processes is suppressed unless we're at the highest debug level (currently 4). The downside is that if sourcing the setup script returns nonzero or an artdaq process doesn't launch, the reason is suppressed as well. Two real-world examples of this include the source of a setupARTDAQDEMO script returning nonzero for the following reason:
and none of the processes launching on mu2edaq11:
where it's important to note that the latter error does not make it into the MessageFacility logfile - in fact, no MessageFacility logfile gets created at all for the unlaunched process.
A (temporary) record of what happened during a process launch should be saved, and if something goes wrong, users should be pointed to it. If the processes launch without a problem, the record should be deleted.
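A minimal sketch of the behavior requested above, to make the idea concrete. It is not the code from any commit: the function name `launch_with_record`, the `launch_attempt_` file prefix, and the choice to run the command locally through the shell are all assumptions made for illustration. In practice the command would be the setup-script source plus the artdaq process launch on the target host.

```python
import os
import subprocess
import tempfile


def launch_with_record(cmd, record_dir=None):
    """Run cmd through the shell, saving stdout and stderr to a temporary record.

    Hypothetical helper: returns (status, record_path). record_path is None
    when the command succeeded, because the record is deleted in that case.
    """
    # Create the record file up front so that output is captured even if the
    # command fails immediately (e.g. the setup script returns nonzero).
    fd, record_path = tempfile.mkstemp(
        prefix="launch_attempt_", suffix=".log", dir=record_dir
    )
    with os.fdopen(fd, "w") as record:
        status = subprocess.call(
            cmd, shell=True, stdout=record, stderr=subprocess.STDOUT
        )

    if status == 0:
        # Clean launch: the record is no longer needed.
        os.remove(record_path)
        return status, None

    # Failed launch: keep the record and point the user to it.
    print("Launch command returned %d; see %s for its output" % (status, record_path))
    return status, record_path
```

The record is written regardless of debug level, so stdout stays quiet on a normal run while the reason for a failed launch is still available afterwards.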