erlang / otp

Erlang/OTP
http://erlang.org
Apache License 2.0
11.32k stars 2.94k forks source link

`socket` reverts to uninitialised state on Windows #8853

Open jonatanklosko opened 4 days ago

jonatanklosko commented 4 days ago

Describe the bug

After unsuccessful socket:connect attempts, socket operations start to return {error,notinitialised} (corresponds to WSANOTINITIALISED in WinSock), as if WinSock was no longer initialised.

To Reproduce

I narrowed it down to a case where we attempt two connections. Interestingly, after the first failure everything still works as expected, but after the second attempt socket breaks altogether.

-module(test).

-export([test/0]).

test() ->
    file:delete("sock1"),
    file:delete("sock2"),
    {ok, Socket1} = socket:open(local, stream),
    ok = socket:bind(Socket1, #{family => local, path => <<"sock1">>}),
    {error, _} = socket:connect(Socket1, #{family => local, path => <<"none">>}),
    {ok, Socket2} = socket:open(local, stream),
    ok = socket:bind(Socket2, #{family => local, path => <<"sock2">>}),
    {error, _} = socket:connect(Socket2, #{family => local, path => <<"none">>}),
    erlang:display(socket:open(local, stream)).
    %% => {error,notinitialised}

(closing each socket after connect does not help)

Expected behavior

Subsequent socket:open to return a new socket.

Affected versions

Tested on OTP 27.

Additional context

In case this matters, I am running virtualized ARM Windows 11.

jonatanklosko commented 4 days ago

Looking at the code WSAStartup is called once, when the NIF is loaded, whereas WSACleanup is called in case of errors. The WSAStartup documentation states:

When it has finished using the services of the Winsock DLL, the application must call WSACleanup to allow the Winsock DLL to free internal Winsock resources used by the application.

So perhaps WSACleanup ends up being called too eagerly and effectively uninitializes WinSock.

Also, the WSANOTINITIALISED error description says:

[...] or WSACleanup has been called too many times.

Note that I am not familiar with WinSock, so these are my best guesses.

bmk commented 4 days ago

I think you are correct. I used it to "clean up" the last operation, but that is clearly not how its supposed to be used. Thanks for finding this!

Regards, /Micael

jonatanklosko commented 3 days ago

Hey @bmk, do you think that once this is fixed, the fix could be backported in a patch release for 26 and 27?

bmk commented 3 days ago

That was my intention.

bmk commented 2 days ago

PR solving this (I hope): bmk/esock/20240924/excessive_use_of_cleanup/otp-19251 Has not run in the daily tests yet.