cmu-db / peloton

The Self-Driving Database Management System
http://pelotondb.io
Apache License 2.0
2.03k stars 623 forks source link

Network layer soft errors #1269

Open pervazea opened 6 years ago

pervazea commented 6 years ago

These don't seem to cause any visible failures, but imply that something is not 100% correct.

Occur when running the smallbank benchmark. There are more of these logged. Only 2 samples are extracted below.

work/connection_handle.cpp:538:CloseSocket] DEBUG - Attempt to close the connection 63
....
/connection_handle.cpp:569:CloseSocket] DEBUG - Already Closed the connection 58
tcm-marcel commented 6 years ago

I see this line all the time, I think this is usual behavior. But I have no idea what exactly is happening.

Nov11 commented 6 years ago

the only piece of code print debug message in connection_handle.cpp

  while (true) {
    int status = close(sock_fd_);
    if (status < 0) {
      // failed close
      if (errno == EINTR) {
        // interrupted, try closing again
        continue;
      }
    }
    LOG_DEBUG("Already Closed the connection %d", sock_fd_);
    return Transition::NONE;
  }

I think retrying on close failure should be removed.

  1. The fd is closed on linux regardless of close function call returns an error or not. No need to worry about leaking fd. https://stackoverflow.com/questions/33114152/what-to-do-if-a-posix-close-call-fails http://man7.org/linux/man-pages/man2/close.2.html
  2. As the man page ( Dealing with error returns from close() section) suggests, it is harmful to close again on failure. This may result in closing a reused fd. For example, It may close a file descriptor opened by the smallbank test and cause an hard-to-reproduce test failure. I just imagine the possible damage. Not sure if it's the real cause.

I'd like to fix this by remove while loop around close.


Mac OS X differs from linux. in that it should be 'close$NOCANCEL' other than 'close'. https://chromium.googlesource.com/chromium/src/base/+/master/mac/close_nocancel.cc

pervazea commented 6 years ago

I agree on correctness of removing the while.

However, if the close fails, it should be logged, something like:

Close failed on connection %d, errno %d

(the current log message is wrong... you don't know it was "already closed", it could be EIO)

If a string description of errno is readily available, use it instead of the number, otherwise the numeric form is ok.

Maybe, if we find that the close failures are legitimate, we can suppress some of the logging later. If there are real failures occurring, this will give us more data to diagnose and fix them.


From: Nov11 notifications@github.com Sent: Saturday, April 7, 2018 9:52 AM To: cmu-db/peloton Cc: Pervaze Akhtar; Author Subject: Re: [cmu-db/peloton] Network layer soft errors (#1269)

the only piece of code print debug message in connection_handle.cpp

while (true) { int status = close(sockfd); if (status < 0) { // failed close if (errno == EINTR) { // interrupted, try closing again continue; } } LOG_DEBUG("Already Closed the connection %d", sockfd); return Transition::NONE; }

I think retrying on close failure should be removed.

  1. The fd is closed on linux regardless of close function call returns an error or not. No need to worry about leaking fd. https://stackoverflow.com/questions/33114152/what-to-do-if-a-posix-close-call-fails http://man7.org/linux/man-pages/man2/close.2.html
  2. As the man page ( Dealing with error returns from close() section) suggests, it is harmful to close again on failure. This may result in closing a reused fd. For example, It may close a file descriptor opened by the smallbank test and cause an hard-to-reproduce test failure.

I'd like to fix this by remove while loop around close.

- You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/cmu-db/peloton/issues/1269#issuecomment-379470917, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AhDfwN8f5HztUB7K4fvjtcgdsyRnA4gHks5tmMS6gaJpZM4TKGFL.