openconnection() raises unexpected python error with high timeout value

choksi81 commented 10 years ago

When invoking the openconnection() API call, if the timeout field is set to be a large number, unexpected python socket error is raised. Error can be replicated with the following code.

timeout = 300
start_time = getruntime()
log("Start time: " + str(start_time))

try:
    openconnection(gethostbyname("gmail.com"), 12345, getmyip(), 12345, timeout)
except TimeoutError:
    pass
except Exception, e:
    log("Error raised: " + str(e)) 

end_time = getruntime()
log("End time: " + str(end_time))
log("Elapsed time: " + str(end_time - start_time))

Output is:

Start time: 0.353349924088Internal Error

---
Uncaught exception!

---
Following is a full traceback, and a user traceback.
The user traceback excludes non-user modules. The most recent call is displayed last.

Full debugging traceback:
  "/home/monzum/exdisk/work/affix_library/namespace.py", line 1206, in wrapped_function
  "/home/monzum/exdisk/work/affix_library/emulcomm.py", line 1304, in openconnection
  "/home/monzum/exdisk/work/affix_library/emulcomm.py", line 1143, in _timed_conn_initialize
  "/usr/lib/python2.7/socket.py", line 224, in meth

User traceback:

Exception (with class 'socket.error'): [103](Errno) Software caused connection abort

JustinCappos commented 10 years ago

Dani is working on this...

aaaaalbert commented 7 years ago

I can confirm that this problem exists, testing on Mac OS X 10.11.6 with Python 2.7.13 and the sample code above. A packet trace in Wireshark shows that the TCP stack sends multiple SYNs to the destination, gets no reply, increases the inter-segment spacing following a certain progression (1, 1, 1, 1, 2, 4, 8, 16, and 32 seconds), and then gives up after 75 seconds have elapsed. Then, the next call to a socket method raises a socket error with errno 22, "Invalid argument". (This is different from the Linux error, and Windows probably shows yet something else.)

I'll test on Travis and AppVeyor to see what the maximum allowed timeout is between platforms. We can then patch emulcomm to raise a RepyArgumentError for timeouts larger than this.

aaaaalbert commented 7 years ago

On Linux / Ubuntu 14.04.5 LTS, the maximum timeout is 127 seconds.

aaaaalbert commented 7 years ago

Windows Server 2012 R2 (as tested on AppVeyor) just raises a WSAETIMEDOUT every 20 seconds. Inside the 300 seconds period, it raises no other error.

The actual errno and message are (10060, 'A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond')

aaaaalbert commented 7 years ago

(Side note, Ubuntu 14.04.5 LTS with an old Python 2.5.6 does it the Windows way...)

aaaaalbert commented 7 years ago

OK, seems we have two options here:

Cap the allowed timeout at below 75 seconds to accomodate OSX/BSD, or
Try to catch the Python socket exception and report something useful to the sandbox.

My problem with the first option is that hardcoded constants like these may change across OS versions, and we'd need to play catch-up. OTOH, the patch is a single additional max=75 in the namespace definition of the timeout parameter.

My problem with the second option is that the Linux/Mac errors are so unspecific (software abort / invalid argument) that we could end up masking other errors. (However, I don't think we've seen "other errors" in the Repy network API lately.) Also, since Windows doesn't raise a different type of error for large timeouts, and the upper limits for Linux and Mac are different, we'd be exposing the node's OS indirectly.

lukpueh commented 7 years ago

What about a combination of both, i.e. catch the Python socket exception and raise something that suggests a socket timeout error only if thetimeout argument is above a certain threshold and a generic RepySocketException otherwise, instead of preemptively failing with a RepyArgumentError?

aaaaalbert commented 7 years ago

I think this would be very similar to my second option, which is (un)reliable (due to the generic error number of the exception) and OS-dependent (since Windows always raises the same error). Am I missing something?

lukpueh commented 7 years ago

Or maybe I missed something. Let me try to rephrase:

(1) Always raise RepyArgumentError if timeout > max_timeout even before Python raised a socket.error (problem: how chose the right max_timeout)

(2) Always re-raise Python's socket.error as RepySocketTimeoutError (problem: could mask other errors)

(3) Re-raise Python socket.error as RepySocketTimeoutError if timeout > max_timeout and RepySocketError otherwise (reduces the likelihood of masking other errors (2) and reduces the severity of choosing the wrong max_timeout because it does not fail preemptively)

I had (3) in mind and read your ideas as (1) and (2) respectively.

aaaaalbert commented 7 years ago

After having clarified my concerns with @lukpueh offline, I think we agree that OS specificity is the key issue here. I will propose a patch that clamps the possible timeout values at 75 seconds, and prepare a unit test that checks the validity of this number so that we don't shoot our future selves in the foot.

SeattleTestbed / repy_v2

openconnection() raises unexpected python error with high timeout value #52