dotnet / interactive

.NET Interactive combines the power of .NET with many other languages to create notebooks, REPLs, and embedded coding experiences. Share code, explore data, write, and learn across your apps in ways you couldn't before.
MIT License
2.85k stars 381 forks source link

Race Condition on port #1958

Open rfrerebe opened 2 years ago

rfrerebe commented 2 years ago

Describe the bug

When opening several notebooks in jupyterhub, and restarting them to execute all cells, very often this exception is raised

Port number (here 1032) is usually in the low range of the allowed range.

Unhandled exception: System.IO.IOException: Failed to bind to address http://100.64.5.12:1032: address already in use.
 ---> Microsoft.AspNetCore.Connections.AddressInUseException: Address already in use
 ---> System.Net.Sockets.SocketException (98): Address already in use
   at System.Net.Sockets.Socket.UpdateStatusAfterSocketErrorAndThrowException(SocketError error, String callerName)
   at System.Net.Sockets.Socket.DoBind(EndPoint endPointSnapshot, SocketAddress socketAddress)
   at System.Net.Sockets.Socket.Bind(EndPoint localEP)
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.SocketTransportOptions.CreateDefaultBoundListenSocket(EndPoint endpoint)
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.SocketConnectionListener.Bind()
   --- End of inner exception stack trace ---
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.SocketConnectionListener.Bind()
   at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.SocketTransportFactory.BindAsync(EndPoint endpoint, CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.TransportManager.BindAsync(EndPoint endPoint, ConnectionDelegate connectionDelegate, EndpointConfig endpointConfig, CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Server.Kestrel.Core.KestrelServerImpl.<>c__DisplayClass30_0`1.<<StartAsync>g__OnBind|0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.AddressBinder.BindEndpointAsync(ListenOptions endpoint, AddressBindContext context, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.AddressBinder.BindEndpointAsync(ListenOptions endpoint, AddressBindContext context, CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Server.Kestrel.Core.ListenOptions.BindAsync(AddressBindContext context, CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.AddressBinder.AddressesStrategy.BindAsync(AddressBindContext context, CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.AddressBinder.BindAsync(IEnumerable`1 listenOptions, AddressBindContext context, CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Server.Kestrel.Core.KestrelServerImpl.BindAsync(CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Server.Kestrel.Core.KestrelServerImpl.StartAsync[TContext](IHttpApplication`1 application, CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Hosting.WebHost.StartAsync(CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Hosting.WebHost.Start()
   at Microsoft.DotNet.Interactive.App.CommandLine.CommandLineParser.<>c__DisplayClass5_0.<Create>b__0(StartupOptions startupOptions, InvocationContext invocationContext) in D:\a\_work\1\s\src\dotnet-interactive\CommandLine\CommandLineParser.cs:line 95
   at Microsoft.DotNet.Interactive.App.CommandLine.JupyterCommand.Do(StartupOptions startupOptions, IConsole console, StartServer startServer, InvocationContext context) in D:\a\_work\1\s\src\dotnet-interactive\CommandLine\JupyterCommand.cs:line 18
   at Microsoft.DotNet.Interactive.App.CommandLine.CommandLineParser.<>c__DisplayClass5_0.<<Create>g__JupyterHandler|11>d.MoveNext() in D:\a\_work\1\s\src\dotnet-interactive\CommandLine\CommandLineParser.cs:line 315
--- End of stack trace from previous location ---
   at System.CommandLine.NamingConventionBinder.CommandHandler.GetExitCodeAsync(Object returnValue, InvocationContext context)
   at System.CommandLine.NamingConventionBinder.ModelBindingCommandHandler.InvokeAsync(InvocationContext context)
   at System.CommandLine.Invocation.InvocationPipeline.<>c__DisplayClass4_0.<<BuildInvocationChain>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass20_0.<<UseParseErrorReporting>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Microsoft.DotNet.Interactive.App.CommandLine.CommandLineParser.<>c__DisplayClass5_0.<<Create>b__3>d.MoveNext() in D:\a\_work\1\s\src\dotnet-interactive\CommandLine\CommandLineParser.cs:line 224
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass13_0.<<UseHelp>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass24_0.<<UseVersionOption>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass22_0.<<UseTypoCorrections>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<UseSuggestDirective>b__21_0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass19_0.<<UseParseDirective>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c.<<RegisterWithDotnetSuggest>b__6_0>d.MoveNext()
--- End of stack trace from previous location ---
   at System.CommandLine.Builder.CommandLineBuilderExtensions.<>c__DisplayClass9_0.<<UseExceptionHandler>b__0>d.MoveNext()

Please complete the following:

Using version : 1.0.317502

Screenshots

kernel-restarting

rfrerebe commented 2 years ago

My understanding is that there is a delay between when a port is chosen : https://github.com/dotnet/interactive/blob/main/src/dotnet-interactive/Program.cs#L102-L136 and when a port is used : https://github.com/dotnet/interactive/blob/main/src/Microsoft.DotNet.Interactive.Jupyter/Shell.cs#L78

This delay allows a race condition to happen (ie : 2 kernels with the same port can be launched, but only 1 will survive).

I am unsure of what the best solution could be:

Happy to discuss these solutions or any other and provide a PR if needed

daviddenis-stx commented 1 year ago

Quick workaround at https://github.com/dotnet/interactive/pull/2288