espressif / esp-lwip

Fork of lwIP (https://savannah.nongnu.org/projects/lwip/) with ESP-IDF specific patches
Other
79 stars 126 forks source link

lwip_select: errno not set properly on error (IDFGH-4544) #27

Closed boborjan2 closed 3 years ago

boborjan2 commented 3 years ago

I run into a case where select() returns -1 and errno is set to 109 (ENOPROTOOPT) that is unexpected from select(). After some debugging I found the following: 1) in lwip_select(), return value of -1 is set on line 2208, returned by lwip_selscan(). 2) errno is not set anywhere, the value of 109 is the result of my previous setsockopt() call and that is considered OK 3) I found that originially lwip_selscan() only returned values >=0 and only since this bugfix commit it is possible that it returns -1: https://github.com/espressif/esp-lwip/commit/3107d4a0fa95f93b55998bcccff1b3c66d959329 !Please note that from this point, lwip_selscan() returns -1 when tryget_socket_unconn_locked() return NULL. This can happen when a socket free is pending (sock->fd_free_pending is set, see sock_inc_used_locked()). This is what is happening in my case: a socket is being closed when select runs. This is a normal situation, EBADF should be set to errno in this case according to select() manual pages. 4) The author of that commit also added error checking of lwip_selscan() (return of EBADF) 5) unfortunately, later additions to lwip_select() miss this error checking of lwip_selscan, so EBADF is not returned in those cases.

I suggest this check to be added to every call of lwip_selscan(). Every comments are welcome, Thanks, Viktor

Alvin1Zhang commented 3 years ago

Thanks for reporting.

yuanjianmin commented 3 years ago

Sorry to reply to your question so late. I check the code you mentioned in lwip_select(), return value of -1 is set on line 2208, returned by lwip_selscan(). Do you mean the following code:

if (waitres == SYS_ARCH_TIMEOUT) {
        /* Timeout */
        LWIP_DEBUGF(SOCKETS_DEBUG, ("lwip_select: timeout expired\n"));
        /* This is OK as the local fdsets are empty and nready is zero,
           or we would have returned earlier. */
      } else {
        /* See what's set now after waiting */
+2208        nready = lwip_selscan(maxfdp1, readset, writeset, exceptset, &lreadset, &lwriteset, &lexceptset);
        LWIP_DEBUGF(SOCKETS_DEBUG, ("lwip_select: nready=%d\n", nready));
      }
    }
  }

  lwip_select_dec_sockets_used(maxfdp1, &used_sockets);
  set_errno(0);

I check here, it is possible to return -1, but the errno will be set to 0. Maybe it occurs in this place. Can you help reproduce it and confirm if it is in this place.

if (nready < 0) {
        /* This happens when a socket got closed while waiting */
        lwip_select_dec_sockets_used(maxfdp1, &used_sockets);
+++ // return here without setting errno 
        return -1;
      }

      if (waitres == SYS_ARCH_TIMEOUT) {
        /* Timeout */
        LWIP_DEBUGF(SOCKETS_DEBUG, ("lwip_select: timeout expired\n"));
        /* This is OK as the local fdsets are empty and nready is zero,
           or we would have returned earlier. */
      } else {
        /* See what's set now after waiting */
        nready = lwip_selscan(maxfdp1, readset, writeset, exceptset, &lreadset, &lwriteset, &lexceptset);
        LWIP_DEBUGF(SOCKETS_DEBUG, ("lwip_select: nready=%d\n", nready));
      }

Finally, could you please test the following change whether can fix your problem.

index e807825..c0d8106 100644
--- a/src/api/sockets.c
+++ b/src/api/sockets.c
@@ -1946,6 +1946,9 @@ lwip_selscan(int maxfdp1, fd_set *readset_in, fd_set *writeset_in, fd_set *excep
     } else {
       SYS_ARCH_UNPROTECT(lev);
       /* no a valid open socket */
+#if ESP_LWIP
+      set_errno(EBADF);
+#endif
       return -1;
     }
   }
boborjan2 commented 3 years ago

Hi, thanks for investigating the issue.

  1. set_errno(x) only sets errno if x is not zero. So errno is left unset there. This is what happens in my case. But even if it did set to zero, it would be not OK: if lwip_selscan() returns -1, errno should be set to error (EBADF).
  2. I will check your patch soon and get back to you
yuanjianmin commented 3 years ago

Hi, thanks for investigating the issue.

1. set_errno(x) only sets errno if x is not zero. So errno is left unset there. This is what happens in my case. But even if it did set to zero, it would be not OK: if lwip_selscan() returns -1, errno should be set to error (EBADF).

2. I will check your patch soon and get back to you

Thanks for correcting, I ignore the errno value will keep the value of last time when set_errno(0). Thanks for trying my patch and looking forward to your reply.

boborjan2 commented 3 years ago

Hi, I checked the patch. It does solve my issue. However there might still be problems: while errno will be set correctly, there are places where control flow is not changed. E.g. line 2133, return value is not checked. Errno is set but program runs as nothing happened. I wonder if it is correct not return immediately with error in these cases. Is there any recovery to be done? Like calling lwip_select_dec_sockets_used() ? Maybe the best would be to report this one to upstream as well. ?

yuanjianmin commented 3 years ago

Hi, I checked the patch. It does solve my issue. However there might still be problems: while errno will be set correctly, there are places where control flow is not changed. E.g. line 2133, return value is not checked. Errno is set but program runs as nothing happened. I wonder if it is correct not return immediately with error in these cases. Is there any recovery to be done? Like calling lwip_select_dec_sockets_used() ? Maybe the best would be to report this one to upstream as well. ?

hi, i agree that report this one to upstream as well. It is official code problem and we need report it.

yuanjianmin commented 3 years ago

Hi, I checked the patch. It does solve my issue. However there might still be problems: while errno will be set correctly, there are places where control flow is not changed. E.g. line 2133, return value is not checked. Errno is set but program runs as nothing happened. I wonder if it is correct not return immediately with error in these cases. Is there any recovery to be done? Like calling lwip_select_dec_sockets_used() ? Maybe the best would be to report this one to upstream as well. ?

hi @boborjan2 , i have reported the issue to upstream and wait for reply.

yuanjianmin commented 3 years ago

Hi, I checked the patch. It does solve my issue. However there might still be problems: while errno will be set correctly, there are places where control flow is not changed. E.g. line 2133, return value is not checked. Errno is set but program runs as nothing happened. I wonder if it is correct not return immediately with error in these cases. Is there any recovery to be done? Like calling lwip_select_dec_sockets_used() ? Maybe the best would be to report this one to upstream as well. ? hi @boborjan2, can you please help me try this patch and give me some feeback.

diff --git a/src/api/sockets.c b/src/api/sockets.c
index 53c58ea..6b605b1 100644
--- a/src/api/sockets.c
+++ b/src/api/sockets.c
@@ -2111,7 +2111,11 @@ lwip_select(int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset,
/* Call lwip_selscan again: there could have been events between
the last scan (without us on the list) and putting us on the list! */
nready = lwip_selscan(maxfdp1, readset, writeset, exceptset, &lreadset, &lwriteset, &lexceptset);
-        if (!nready) {
+        if (nready < 0) {
+          set_errno(EBADF);
+          lwip_select_dec_sockets_used(maxfdp1, &used_sockets);
+          return -1;
+        } else if (!nready) {
/* Still none ready, just wait to be woken */
if (timeout == 0) {
/* Wait forever */
@@ -2185,6 +2189,11 @@ lwip_select(int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset,
/* See what's set now after waiting */
nready = lwip_selscan(maxfdp1, readset, writeset, exceptset, &lreadset, &lwriteset, &lexceptset);
LWIP_DEBUGF(SOCKETS_DEBUG, ("lwip_select: nready=%d\n", nready));
+        if (nready < 0) {
+          set_errno(EBADF);
+          lwip_select_dec_sockets_used(maxfdp1, &used_sockets);
+          return -1;
+        }
}
}
}
AxelLin commented 3 years ago

FYI: the upstream lwip fix: http://git.savannah.gnu.org/cgit/lwip.git/commit/?id=b00a05d086d3594602a9700f0960798f7e9f73a6

boborjan2 commented 3 years ago

Hi, the patch posted here and the upstream patch differ slightly (the upstream does not return immediately with error around line 2110). I presume the upstream one is the result of some discussion (?) I am not involved. I can confirm that both solve my issue because in my case, errno setting was skipped after an lwip_selscan() call around line 2185. Anyhow thanks for your work and kind cooperation Viktor

AxelLin commented 3 years ago

Hi, the patch posted here and the upstream patch differ slightly (the upstream does not return immediately with error around line 2110). I presume the upstream one is the result of some discussion (?) I am not involved.

https://savannah.nongnu.org/bugs/index.php?59946#comment2