Open choksi81 opened 10 years ago
Dani is working on this...
I can confirm that this problem exists, testing on Mac OS X 10.11.6 with Python 2.7.13 and the sample code above. A packet trace in Wireshark shows that the TCP stack sends multiple SYNs to the destination, gets no reply, increases the inter-segment spacing following a certain progression (1, 1, 1, 1, 2, 4, 8, 16, and 32 seconds), and then gives up after 75 seconds have elapsed. Then, the next call to a socket method raises a socket error with errno
22, "Invalid argument". (This is different from the Linux error, and Windows probably shows yet something else.)
I'll test on Travis and AppVeyor to see what the maximum allowed timeout is between platforms. We can then patch emulcomm
to raise a RepyArgumentError
for timeouts larger than this.
On Linux / Ubuntu 14.04.5 LTS, the maximum timeout is 127 seconds.
Windows Server 2012 R2 (as tested on AppVeyor) just raises a WSAETIMEDOUT
every 20 seconds. Inside the 300 seconds period, it raises no other error.
The actual errno and message are (10060, 'A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond')
(Side note, Ubuntu 14.04.5 LTS with an old Python 2.5.6 does it the Windows way...)
OK, seems we have two options here:
My problem with the first option is that hardcoded constants like these may change across OS versions, and we'd need to play catch-up. OTOH, the patch is a single additional max=75
in the namespace definition of the timeout
parameter.
My problem with the second option is that the Linux/Mac errors are so unspecific (software abort / invalid argument) that we could end up masking other errors. (However, I don't think we've seen "other errors" in the Repy network API lately.) Also, since Windows doesn't raise a different type of error for large timeouts, and the upper limits for Linux and Mac are different, we'd be exposing the node's OS indirectly.
What about a combination of both, i.e. catch the Python socket exception and raise something that suggests a socket timeout error only if thetimeout
argument is above a certain threshold and a generic RepySocketException otherwise, instead of preemptively failing with a RepyArgumentError?
I think this would be very similar to my second option, which is (un)reliable (due to the generic error number of the exception) and OS-dependent (since Windows always raises the same error). Am I missing something?
Or maybe I missed something. Let me try to rephrase:
(1) Always raise RepyArgumentError
if timeout > max_timeout
even before Python raised a socket.error
(problem: how chose the right max_timeout
)
(2) Always re-raise Python's socket.error
as RepySocketTimeoutError
(problem: could mask other errors)
(3) Re-raise Python socket.error
as RepySocketTimeoutError
if timeout > max_timeout
and RepySocketError
otherwise
(reduces the likelihood of masking other errors (2) and reduces the severity of choosing the wrong max_timeout
because it does not fail preemptively)
I had (3) in mind and read your ideas as (1) and (2) respectively.
After having clarified my concerns with @lukpueh offline, I think we agree that OS specificity is the key issue here. I will propose a patch that clamps the possible timeout values at 75 seconds, and prepare a unit test that checks the validity of this number so that we don't shoot our future selves in the foot.
When invoking the openconnection() API call, if the timeout field is set to be a large number, unexpected python socket error is raised. Error can be replicated with the following code.
Output is: