Vizonex / Winloop

An Alternative library for uvloop compatability with windows
https://pypi.org/project/winloop
Apache License 2.0
74 stars 7 forks source link

UV_EAGAIN (errno 4088) related to high/low watermarks. #13

Closed lschoe closed 5 months ago

lschoe commented 6 months ago

This issue is about the second problem mentioned in issue #12.

The problem seems to be related to the current flow control implementation in Winloop 0.1.0. An UV_EAGAIN (errno 4088) occurs when handling larger volumes of data exchanges. The problem is not present with any of the other event loops (asyncio's proactor, selector, nor with uvloop).

You can reproduce the problem running, e.g., the np_lpsolver demo for the MPyC package. The following code is from the latest version MPyC v0.9.7 on GitHub, see mpyc/asyncoro.py. This is inside the connection_made(transport) method:

        if 'winloop' in str(self.runtime._loop):
            # Workaround to avoid problems with handling of high/low watermarks when
            # running MPyC demos with large communication complexity.
            # For example, "python np_lpsolver.py -i5 -M3 -E1"
            # gives "OSError: [Errno 4088] Unknown error"  (4088 stands for UV_EAGAIN).
            # Winloop 0.1.0 default setting, with FLOW_CONTROL_HIGH_WATER = 64:
            #   high = 65536 = FLOW_CONTROL_HIGH_WATER * 1024
            #   low  = 16    = FLOW_CONTROL_HIGH_WATER // 4
            # Workaround:
            self.transport.set_write_buffer_limits(high=2**18)
            # With high=2**18, problem still present for cases with even larger communication
            # complexity, like "python np_lpsolver.py -i7 -M3 -E1", and setting
            # high=2**22 resolves this case as well.
            # Alternative workaround: set high=0 to disable use of high/low watermarks.
            # NB: these workarounds do not work with SSL,
            # for example, "python np_lpsolver.py -i5 -M3 -E1 --ssl"
            # keeps giving "OSError: [Errno 4088] Unknown error" whatever we set for high and low.
            print(f'workaround for Winloop: {self.transport.get_write_buffer_limits()=}')

Using python np_lpsolver.py -i5 -M3 -E1 in mpyc/demos runs the demo between three parties on localhost using Winloop. If the default flow control settings are used, all parties will get a 4088 error. If the high watermark is increased from 2**16 to 2**18, the problem disappears. The problem also disappears by setting high=0. But, interestingly, these workarounds do not work with SSL.

Vizonex commented 6 months ago

This seems to be an issue beyond my scope of knowledge but I have seen it do things and bugs at times. Even I'm still trying to port more stuff from linux over to windows. I will flag this issue as a bug for now but if I knew which cython file this was coming from that will help me figure out how to fix it...

lschoe commented 6 months ago

To reproduce this error easily I'm attaching a small program hi2.py. If you run start python hi2.py 1 & python hi2.py 0 on a Windows command prompt, a server and client will be run, and the above error 4088 will show. The program can also be used to see the error for issue #12.

You can also disable the use of Winloop to see that the error does not happen with the default event loop. Or try some other settings, for which the problem sometimes disappears, or shows a different kind of error.

lschoe commented 6 months ago

Looks like I've found a fix for the problem behind this issue #13, and also a fix for the other issue #12.

I had been looking at the reason why the errors are actually not printed correctly on Windows, and that way I was prepared to see the problems in these two lines of Winloop/handles/stream.pyx at some point:

https://github.com/Vizonex/Winloop/blob/8f38d32e415469cc8bdbb55b35e5a6cec04e27ff/winloop/handles/stream.pyx#L573

https://github.com/Vizonex/Winloop/blob/8f38d32e415469cc8bdbb55b35e5a6cec04e27ff/winloop/handles/stream.pyx#L789

On Windows, we have uv.EOF equal to -1 (instead of -4095) and uv.EAGAIN equal to 11 (instead of -4088). Putting these values -4095 and -4088, hard coded in the source code for now, fixes the problem! So, the workaround above is no longer needed.

Vizonex commented 6 months ago

Looks like I've found a fix for the problem behind this issue #13, and also a fix for the other issue #12.

I had been looking at the reason why the errors are actually not printed correctly on Windows, and that way I was prepared to see the problems in these two lines of Winloop/handles/stream.pyx at some point:

https://github.com/Vizonex/Winloop/blob/8f38d32e415469cc8bdbb55b35e5a6cec04e27ff/winloop/handles/stream.pyx#L573

https://github.com/Vizonex/Winloop/blob/8f38d32e415469cc8bdbb55b35e5a6cec04e27ff/winloop/handles/stream.pyx#L789

On Windows, we have uv.EOF equal to -1 (instead of -4095) and uv.EAGAIN equal to 11 (instead of -4088). Putting these values -4095 and -4088, hard coded in the source code for now, fixes the problem! So, the workaround above is no longer needed.

@lschoe I will try to implement this in shortly thanks for figuring this out for me...

Vizonex commented 6 months ago

@lschoe Hopefully this bug-fix is what you were looking for https://github.com/Vizonex/Winloop/commit/66bfb74e41a6418c5715994ec94c9b3968109ec2 I will close this issue if you think that this change is acceptable enough...

Vizonex commented 5 months ago

@lschoe I merged your pull request instead. I was confused with what was needed but I will close this issue now that it's correctly been handled...

lschoe commented 5 months ago

Great! This way I can now run all MPyC programs, with all datasets small to large, experiencing the same kind of speedup for Winloop as for uvloop.

The PR is a simple, direct fix. For a more systematic fix, also to cover other potential problems due to misaligned error numbers, I think a nice target for you is to first fix the error messages printed by Winloop,

Currently, we get something like "OSError: [Errno 4088] Unknown error" printed, but -- from looking at your errors.pyx --- you want something like "BlockingIOError" and then with the correct error message "resource temporarily unavailable" attached based on the line " XX(EAGAIN, "resource temporarily unavailable")" from the uv.h file. This does some macro stuff involving

typedef enum {
#define XX(code, _) UV_ ## code = UV__ ## code,
  UV_ERRNO_MAP(XX)
#undef XX
  UV_ERRNO_MAX = UV__EOF - 1
} uv_errno_t;

which takes things (prefixed with UV__) from uv/errno.h. Not sure how much you need to do still to get things working on Windows. I would hope/expect this kind of thing has been sorted out already (by/for libuv?), as it is probably needed in many other applications as well. Anyway, Errno.h has the list with Windows error codes.

Ideally, you can fix this such that you can use the existing uvloop code involving error codes without making changes at that level. Only some changes at a lower level, to deal with the mapping of the error numbers, is done in one central place in the source code for Winloop.