flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

flux-exec: do not operate on failed subprocesses #6395

Closed. chu11 closed this issue 1 month ago

chu11 commented 1 month ago

Problem: If a subprocess fails for some reason in flux-exec, it still remains in the global subprocesses list. That means that operations on the whole list, such as killing all subprocesses or sending stdin to subprocesses, will continue to operate on those failed subprocesses.

Solution: When a subprocess fails, it should be removed from the global list.

chu11 commented 1 month ago

Never mind. We already check that the subprocess is still running at all the appropriate locations, e.g. when writing to stdin:

        while (p) {
            if (flux_subprocess_state (p) == FLUX_SUBPROCESS_INIT
                || flux_subprocess_state (p) == FLUX_SUBPROCESS_RUNNING) {
                if (flux_subprocess_write (p, "stdin", ptr, lenp) < 0)
                    log_err_exit ("flux_subprocess_write");
            }
            p = zlist_next (subprocesses);
        }
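
For readers who want to see the pattern in isolation, here is a minimal, self-contained sketch of the same idea: failed entries stay in the list, and every list-wide operation skips anything that is not in an active state. The `struct proc` type, the `proc_state` enum, and the `proc_is_active` and `write_all` helpers are hypothetical stand-ins invented for this illustration. They are not flux-core or czmq APIs, and the "write" is simulated with a byte counter.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not flux-core types. */
enum proc_state { PROC_INIT, PROC_RUNNING, PROC_EXITED, PROC_FAILED };

struct proc {
    enum proc_state state;
    size_t bytes_written;   /* simulates data delivered to the process */
    struct proc *next;      /* simple singly linked list */
};

/* A process may still accept input only while initializing or running. */
static int proc_is_active (const struct proc *p)
{
    return p->state == PROC_INIT || p->state == PROC_RUNNING;
}

/* Walk the entire list, but operate only on active entries.
 * Failed or exited entries remain in the list and are skipped.
 * Returns the number of processes that were written to. */
static int write_all (struct proc *head, size_t len)
{
    int count = 0;

    for (struct proc *p = head; p != NULL; p = p->next) {
        if (!proc_is_active (p))
            continue;
        p->bytes_written += len;
        count++;
    }
    return count;
}
```

With this kind of guard at each call site, removing failed entries from the list is not needed for correctness, which matches the reasoning for closing the issue. The tradeoff is that every new list-wide operation has to remember to apply the same check.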