JuliaParallel / ClusterManagers.jl

Other
242 stars 74 forks source link

LSFManager: optionally stream worker stdout to master #168

Closed tanmaykm closed 3 years ago

tanmaykm commented 3 years ago

This updates the LSFManager to allow stdout and stderr streams of workers to be streamed (optionally) to master. The bpeek invocation when passed the -f flag behaves similar to tail -f. This is the default set for bpeek_flags. The resultant stream is then passed on to the Distributed module after parsing the worker connection string and some initial error handling for the case where job startup is delayed or failed. The Distributed module then takes over the subsequent display of worker's stdout messages on master.

If streaming of worker's stdout messages is not desired, then the -f flag can be omitted from bpeek_flags. Then bpeek behaves similar to cat and is used only to read the initial connection string.

As a positive side effect of this change, it now also avoids the earlier redirect_stderr() call which could have caused race conditions with multiple async invocations of the bpeek method. Few methods are also renamed to have a lsf_ prefix to avoid unnecessary clashes with other parts of the package.

bjarthur commented 3 years ago

looks good. but what is wrong with bpeek(manager... and _launch(manager...? shouldn't be any clashes as dispatch will choose the correct one based on the type of manager, no? i'd suggest sticking with the function names without the lsf_ prefix.

tanmaykm commented 3 years ago

Thanks for revewing! Yes dispatch would take care, but would likely not serve any purpose? Because internal methods are unlikely to have any common purpose across managers. Also prefixing would avoid having another poorly qualified method of a different manager with same name trigger method ambiguities I felt.

Does this seem reasonable?

tanmaykm commented 3 years ago

Unless there are any further major concerns, can this be merged?

bjarthur commented 3 years ago

i don't want to bikeshed, so i'll merge. will be great to have all i/o displayed by master. thanks!

but i will say that adding ::LSFManager to the type signature of bpeek and _launch is all i believe that is needed to mitigate your concerns about clashes with other parts of the package. an unfortunate omission of mine when i wrote this code. sure, the lsf_ prefix makes it less likely they'll be a name clash, but is not a guarantee. doesn't seem very Julian to me either (Base is littered with underscore prefixes and method overloading).

thanks again!

tanmaykm commented 3 years ago

Thanks! I was not so hung up on the naming too... agree with your points too. Maybe we need https://github.com/JuliaParallel/ClusterManagers.jl/issues/58 to avoid such worries. And many thanks for adding LSFManager to this package!