apache / nuttx

Apache NuttX is a mature, real-time embedded operating system (RTOS)
https://nuttx.apache.org/
Apache License 2.0
2.6k stars 1.11k forks source link

Wait after vfork get errno ECHILD #10159

Closed yangyalei closed 1 year ago

yangyalei commented 1 year ago

Execute the following code, wait fail with ret == -1 and errno == ECHILD.

    int ret;
    int pid = vfork();
    if (pid == 0)
    {
        execlp("sh", "sh", "-c", "ls", NULL);
    }
    else if (pid > 0)
    {
        int ret = wait(NULL);
        if (ret == -1)
         printf("wait fail from Parent! ret:%d, errno:%d\n", ret, errno);
    }

Is it an undefined behavior wait after vfork?But This code test ok on my ubuntu 20.04. I have lookup vfork and wait defines in some documents, no clear instructions found about this.

So, Do we need to fix this?

xiaoxiang781216 commented 1 year ago

@patacongo @pkarashchenko what your opinion?

patacongo commented 1 year ago

The logic seems bad (at least when CONFIG_SCHED_HAVE_PARENT is not defined). wait() just calls waitpid() with an invalid PID:

 67 pid_t wait(FAR int *stat_loc)
 68 {
 69   /* wait() is a cancellation point, but nothings needs to be done for this
 70    * trivial case.
 71    */
 72
 73   return waitpid(INVALID_PROCESS_ID, stat_loc, 0);
 74 }

But waitpid() return -ECHILD immediately if the pid is not valid (at least when CONFIG_SCHED_HAVE_PARENT is not defined):

114   ctcb = nxsched_get_tcb(pid);
115   if (ctcb == NULL)
116     {
117       ret = -ECHILD;
118       goto errout;
119     }

This logic appears in two places and there are several other cases where -ECHILD is returned. But if the PID is invalid and CONFIG_SCHED_HAVE_PARENT is not defined, -ECHILD is returned immediately by waitpid(). Has this logic changed recently?

Per the manpage:

The wait() function obtains status information for process termination from any child process. The waitpid() function obtains status information for process termination, and optionally process stop and/or continue, from a specified subset of the child processes.

And

The waitpid() function shall be equivalent to wait() if the pid argument is (pid_t)-1 and the options argument is 0. Otherwise, its behavior shall be modified by the values of the pid and options arguments.

The logic for pid == -1, only occurs for the case where CONFIG_SCHED_HAVE_PARENT is defined (in two places):

337       if (pid == INVALID_PROCESS_ID)
338         {
339           /* We are waiting for any child, check if there are still
340            * children.
341            */
342

If CONFIG_SCHED_HAVE_PARENT is not defined, the logic that returns -ECHILD immediately and it never checks if pid == -1. The logic might be OK if CONFIG_SCHED_HAVE_PARENT is defined, the ordering is different in that case.

patacongo commented 1 year ago

int ret = wait(NULL);

A good test would be to simply replace the above line in the test code with:

int ret = waitpid(pid, stat_loc, 0);

Another good test would be to try the original code again with CONFIG_SCHED_HAVE_PARENT defined in the .config file.

patacongo commented 1 year ago

I think I am beginning to understand the real issue here. My ramblings above are OK, but not directly to the point. The point:

If CONFIG_SCHED_HAVE_PARENT is not defined, then wait() cannot be supported. Is that correct? If that is true, then we only need to correct the dependencies to disable wait() if CONFIG_SCHED_HAVE_PARENT is not defined and the OP needs to define CONFIG_SCHED_HAVE_PARENT in the .config file.

This is another case where there is too many complex, interacting configuration settings. This harms code quality in my opinion and I think that another solution would be to remove such setts as CONFIG_SCHED_HAVE_PARENT and CONFIG_SCHED_CHILD_STATUS altogether and make them both implicitly true. That might cause some minor increase in code size, however.

yangyalei commented 1 year ago

Sorry for not explaining my question clearly.

I have enabled CONFIG_SCHED_HAVE_PARENT in my test. The real question is: First, in nuttx implement, vfork use waitpid to hang on the parent process, make it continue after child process ended. https://github.com/apache/nuttx/blob/d98bfc3e49b55ad560907241cbd7e184cf7aac0a/libs/libc/unistd/lib_vfork.c#L66

Second, this waitpid will make sure child process ended and remove the child process info in nxsched_waitpid. https://github.com/apache/nuttx/blob/d98bfc3e49b55ad560907241cbd7e184cf7aac0a/sched/sched/sched_waitpid.c#L361

In the end, if application call wait will find NO child process info in https://github.com/apache/nuttx/blob/d98bfc3e49b55ad560907241cbd7e184cf7aac0a/sched/sched/sched_waitpid.c#L264

In one world, it looks like vfork interface implement not support to wait vforked child process done, because we don't keep the vforked process info after it end(But the use of wait in parent process after vfork is so common).

yangyalei commented 1 year ago

I have submit a PR to this question, So please judge whether we need it. https://github.com/apache/nuttx/pull/10169