dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License
10.04k stars 2.02k forks

Issues running Silo in Docker Compose: Unexpected direct silo connection on proxy endpoint #8894

Open SeppPenner opened 6 months ago

SeppPenner commented 6 months ago

I followed https://github.com/dotnet/orleans/issues/6895 and https://github.com/dotnet/orleans/issues/6420 because I have the same issue:

05.03.24 15:42:27.105    [WRN]   Error processing connection "[Local: 10.0.1.2:30000, Remote: 10.0.1.2:40764, ConnectionId: 0HN1T49T1CD3P]"
silo-1  | System.InvalidOperationException: Unexpected direct silo connection on proxy endpoint from S192.168.201.36:11111:447345745
silo-1  |    at Orleans.Runtime.Messaging.GatewayInboundConnection.ProcessConnection() in /_/src/Orleans.Runtime/Networking/GatewayInboundConnection.cs:line 109
silo-1  |    at Orleans.Runtime.Messaging.Connection.Run() in /_/src/Orleans.Core/Networking/Connection.cs:line 100

Additional data:
- .NET: 8.0 in Docker (latest base image)
- Orleans: 3.6.5 (we can't easily update that at the moment)
- Host address: 192.168.201.36 (see log above)

How the endpoints are set:

var siloIpStr = Configuration.SiloIpAddress;
var siloIpAddress = IPAddress.Parse(siloIpStr);
var advertisedSiloIpAddress = IPAddress.Parse(Configuration.AdvertisedSiloIpAddress);

if (Environment.GetEnvironmentVariable("DOTNET_RUNNING_IN_COMPOSE")?.ToLowerInvariant() == "true")
{
    Log.Information("SiloHost is running in compose");
    siloIpAddress = IPAddress.Any;
}

builder.Configure<EndpointOptions>(
    options =>
    {
        options.SiloPort = Configuration.SiloPort;
        options.GatewayPort = Configuration.GatewayPort;
        options.AdvertisedIPAddress = advertisedSiloIpAddress;
        options.GatewayListeningEndpoint = new(siloIpAddress, Configuration.GatewayPort);
        options.SiloListeningEndpoint = new(siloIpAddress, Configuration.SiloPort);

        Log.Information("AdvertisedIPAddress: {AdvertisedIPAddress}", options.AdvertisedIPAddress);
        Log.Information("SiloPort: {SiloPort}", options.SiloPort);
        Log.Information("GatewayPort: {GatewayPort}", options.GatewayPort);
        Log.Information("SiloListeningEndpoint: {SiloListeningEndpoint}", options.SiloListeningEndpoint);
        Log.Information("GatewayListeningEndpoint: {GatewayListeningEndpoint}", options.GatewayListeningEndpoint);
    });

Configuration

"Configuration": {
    "SiloIpAddress": "0.0.0.0",
    "AdvertisedSiloIpAddress": "192.168.201.36",
    "SiloPort": 11111,
    "GatewayPort": 30000
}

Docker compose file


services:
  silo:
    image: somerepo.de/silo:2.101.1
    ports:
      - "8000:80"
      - "11111:11111"
      - "30000:30000"
      - "1883:1883"
    extra_hosts:
      - "localhost:192.168.201.36"
    working_dir: /app
    volumes:
      - /home/Silo/appsettings.json:/app/appsettings.json
    environment:
      ASPNETCORE_URLS: "http://*:5000"
      ASPNETCORE_ENVIRONMENT: "Production"
      DOTNET_RUNNING_IN_CONTAINER: true
      DOTNET_RUNNING_IN_COMPOSE: true
      TZ: Europe/Berlin
      LANG: de_DE.UTF-8
      LANGUAGE: de_DE.UTF-8
      LC_ALL: de_DE.UTF-8
    restart: always
  server:
    image: somerepo.de/server:2.101.1
    depends_on:
      - silo
    ports:
      - "8180:80"
    working_dir: /app
    volumes:
      - /home/Server/appsettings.json:/app/appsettings.json
    environment:
      ASPNETCORE_URLS: "http://*:5000"
      ASPNETCORE_ENVIRONMENT: "Production"
      DOTNET_RUNNING_IN_CONTAINER: true
      DOTNET_RUNNING_IN_COMPOSE: true
      TZ: Europe/Berlin
      LANG: de_DE.UTF-8
      LANGUAGE: de_DE.UTF-8
      LC_ALL: de_DE.UTF-8
    restart: always
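The `server` service lists `depends_on: silo`. In Compose this only orders container startup; it does not wait until the silo's gateway on port 30000 actually accepts connections. Orleans 3.x `IClusterClient.Connect` accepts an optional retry filter of type `Func<Exception, Task<bool>>`, so a client can tolerate a silo that is still starting. The following is a minimal sketch of such a filter with a bounded number of retries and a fixed delay; the class `ConnectRetryFilter` and its limits are hypothetical and not part of Orleans.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not an Orleans type). Its ShouldRetry method has the
// Func<Exception, Task<bool>> shape expected by IClusterClient.Connect in Orleans 3.x.
public sealed class ConnectRetryFilter
{
    private readonly int maxAttempts;
    private readonly TimeSpan delay;
    private int attempts;

    public ConnectRetryFilter(int maxAttempts, TimeSpan delay)
    {
        this.maxAttempts = maxAttempts;
        this.delay = delay;
    }

    // Number of failed connection attempts seen so far.
    public int Attempts => this.attempts;

    // Called after each failed connection attempt: waits for the configured delay and
    // returns true (retry) until maxAttempts retries have been used, then returns false.
    public async Task<bool> ShouldRetry(Exception exception)
    {
        this.attempts++;
        if (this.attempts > this.maxAttempts)
        {
            return false;
        }

        await Task.Delay(this.delay);
        return true;
    }
}
```

Usage, assuming a built Orleans 3.x client named `client`: `var retry = new ConnectRetryFilter(maxAttempts: 10, delay: TimeSpan.FromSeconds(3)); await client.Connect(retry.ShouldRetry);`. Keeping the count bounded means a wrong gateway address still surfaces as an error instead of blocking startup forever.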

Dockerfile Silo

FROM mcr.microsoft.com/dotnet/aspnet:8.0 AS base
WORKDIR /app
COPY publish .

EXPOSE 80 \
    11111 \
    30000 \
    1883

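# ${LANG} inside the same ENV instruction expands to the value LANG had before this
# instruction, not to de_DE.UTF-8, so the locale variables are set explicitly.
ENV ASPNETCORE_URLS="http://*:5000" \
    ASPNETCORE_ENVIRONMENT="Production" \
    DOTNET_RUNNING_IN_CONTAINER=true \
    TZ=Europe/Berlin \
    LANG=de_DE.UTF-8 \
    LANGUAGE=de_DE.UTF-8 \
    LC_ALL=de_DE.UTF-8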

CMD ["dotnet", "Silo.dll"]

Dockerfile Server

FROM mcr.microsoft.com/dotnet/aspnet:8.0 AS base
WORKDIR /app
COPY publish .

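# ${LANG} inside the same ENV instruction expands to the value LANG had before this
# instruction, not to de_DE.UTF-8, so the locale variables are set explicitly.
ENV ASPNETCORE_URLS="http://*:5000" \
    ASPNETCORE_ENVIRONMENT="Production" \
    DOTNET_RUNNING_IN_CONTAINER=true \
    TZ=Europe/Berlin \
    LANG=de_DE.UTF-8 \
    LANGUAGE=de_DE.UTF-8 \
    LC_ALL=de_DE.UTF-8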

CMD ["dotnet", "Server.dll"]

The services run fine on bare-metal Linux, but not within Docker Compose. I have already checked that DOTNET_RUNNING_IN_COMPOSE is set correctly in the silo container. The configuration files are in the correct folders and look good. I have also cleared the orleansmembershiptable, just in case.

SeppPenner commented 6 months ago

Some more logging output (it shows that the options are set properly):

silo-1  | 05.03.24 15:42:26.609    [INF]   Configuration Orleans.Configuration.EndpointOptions:
silo-1  | AdvertisedIPAddress: 192.168.201.36
silo-1  | SiloPort: 11111
silo-1  | GatewayPort: 30000
silo-1  | SiloListeningEndpoint: 0.0.0.0:11111
silo-1  | GatewayListeningEndpoint: 0.0.0.0:30000

SeppPenner commented 6 months ago

The server has the following configuration options (these should be fine as well, I think):

{
    "SiloIpAddress": "192.168.201.36",
    "GatewayPort": 30000
}

SeppPenner commented 6 months ago

Something more to mention: we use a background service that derives from BackgroundService (Microsoft.Extensions.Hosting). In it, we create an Orleans client, which is needed to perform some checks on silo startup. The client runs "locally" in the silo itself and seems to be causing the issue...

private async Task<IClusterClient> CreateLocalOrleansClient()
{
    var siloIpAddress = this.advertisedSiloIp;

    if (Program.InCompose)
    {
        siloIpAddress = IPAddress.Any;
    }

    this.logger.Information("Initializing OrleansRpc client for silo ip {IpAddress}...", siloIpAddress);

    var orleansClient = new ClientBuilder().Configure<ClusterOptions>(
            options =>
            {
                options.ClusterId = this.configuration.ClusterId;
                options.ServiceId = this.configuration.ServiceId;
            })
        .UseStaticClustering(new IPEndPoint(siloIpAddress, this.configuration.GatewayPort))
        .ConfigureApplicationParts(
            parts =>
            {
                parts.AddApplicationPart(typeof(ISomethingGrain).Assembly).WithReferences();
            })
        .ConfigureLogging(l => l.AddSerilog())
        .AddSimpleMessageStreamProvider(StreamGlobals.SmsProvider)
        .Build();
    await orleansClient.Connect();
    return orleansClient;
}
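In compose mode the method above passes `IPAddress.Any` (0.0.0.0) to `UseStaticClustering`. That address is a wildcard for binding a listening socket; as a client target it does not name a specific host. A minimal sketch of normalizing such an endpoint before handing it to the client, assuming loopback is an acceptable target because this client runs inside the silo's own container (whether loopback or the advertised address is the right target depends on the network setup). `GatewayEndpoints.ToConnectable` is a hypothetical helper, not an Orleans API.

```csharp
using System.Net;

// Hypothetical helper (not an Orleans API).
public static class GatewayEndpoints
{
    // Replaces a wildcard bind address (0.0.0.0 or ::) with the matching loopback
    // address so the endpoint names a concrete destination; other endpoints pass through.
    public static IPEndPoint ToConnectable(IPEndPoint endpoint)
    {
        if (endpoint.Address.Equals(IPAddress.Any))
        {
            return new IPEndPoint(IPAddress.Loopback, endpoint.Port);
        }

        if (endpoint.Address.Equals(IPAddress.IPv6Any))
        {
            return new IPEndPoint(IPAddress.IPv6Loopback, endpoint.Port);
        }

        return endpoint;
    }
}
```

Usage in the method above would be `.UseStaticClustering(GatewayEndpoints.ToConnectable(new IPEndPoint(siloIpAddress, this.configuration.GatewayPort)))`.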
SeppPenner commented 6 months ago

I have reworked this now following https://learn.microsoft.com/en-us/dotnet/orleans/host/client?pivots=orleans-3-x.

The IClusterClient is now injected via DI; however, the issue persists...

SeppPenner commented 6 months ago

Another update: the issue persists, but now the server project causes it; the silo itself starts properly. The client in the server project is initialized like this:

var gateways = this.configuration.OrleansSiloEndPoints; // Just 192.168.201.36:30000 from the configuration.
// I checked that it is loaded correctly. It is irrelevant anyway, because the gateways are overwritten below when the environment variable is set.

if (Environment.GetEnvironmentVariable("DOTNET_RUNNING_IN_COMPOSE")?.ToLowerInvariant() == "true")
{
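    // Note: in the compose file above, "server" is this client's own service; the silo service is named "silo".
    var addresses = Dns.GetHostAddresses("server");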
    gateways = addresses.Select(a => new IPEndPoint(a, 30000)).ToArray();
}

var client = new ClientBuilder()
    .Configure<ClusterOptions>(
        options =>
        {
            options.ClusterId = this.configuration.ClusterId;
            options.ServiceId = this.configuration.ServiceId;
        })
    .Configure<ClientMessagingOptions>(opts =>
    {
        opts.ResponseTimeout = TimeSpan.FromSeconds(90);
    })
    .Configure<MessagingOptions>(opts =>
    {
        opts.ResponseTimeout = TimeSpan.FromSeconds(90);
    })
    .UseStaticClustering(gateways)
    .ConfigureApplicationParts(parts =>
    {
        parts.AddApplicationPart(typeof(ISomethingGrain).Assembly).WithReferences();
    })
    .ConfigureLogging(logging => logging.AddSerilog())
    .Configure<ClientMessagingOptions>(options => options.MaxMessageBodySize = 134217728) // 128 MB
    .AddSimpleMessageStreamProvider("SmsProvider")
    .Build();
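In the compose file the silo runs as the `silo` service, and Compose's built-in DNS resolves service names to container addresses on the project network. A minimal sketch of building the static gateway list from a service name, keeping only IPv4 addresses and falling back to the configured endpoints if the name cannot be resolved. `ComposeGateways.Resolve` is a hypothetical helper; the service name `silo` and port 30000 are taken from the compose file and configuration above.

```csharp
using System.Linq;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper (not an Orleans API).
public static class ComposeGateways
{
    // Resolves a host name (for example a Compose service name) to IPv4 gateway endpoints.
    // Returns the fallback endpoints if resolution fails or yields no IPv4 address.
    public static IPEndPoint[] Resolve(string host, int gatewayPort, IPEndPoint[] fallback)
    {
        try
        {
            var endpoints = Dns.GetHostAddresses(host)
                .Where(address => address.AddressFamily == AddressFamily.InterNetwork)
                .Select(address => new IPEndPoint(address, gatewayPort))
                .ToArray();

            return endpoints.Length > 0 ? endpoints : fallback;
        }
        catch (SocketException)
        {
            // Name not resolvable (for example when running outside Compose).
            return fallback;
        }
    }
}
```

In the snippet above this would replace the two `Dns` lines with `gateways = ComposeGateways.Resolve("silo", 30000, gateways);`, so the configured endpoints remain in effect when the service name does not resolve.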