pms1969 opened 7 years ago
Hello Paul,
First of all thank you for the feedback.
I would like to understand more about your use case, and I wonder if you can tell me more about your deployment.
To start:
I have a service in production running Orleans on top of Swarm; it has several silos and clients, and everything works fine.
That error is indeed a networking issue. Most likely there is a problem with your setup.
Hi @galvesribeiro,
I'll answer as best I can:
If you were to replicate this locally, with two machines each running Docker independently and with identical networking setups, the containers on both machines would get the same Docker IP address.
My silo config XML is as follows:
<?xml version="1.0" encoding="utf-8"?>
<OrleansConfiguration xmlns="urn:orleans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="OrleansConfiguration.xsd">
  <Globals>
    <SystemStore SystemStoreType="Custom"
                 DeploymentId="0.1"
                 DataConnectionString="Service=eu-west-1;"
                 MembershipTableAssembly="OrleansAWSUtils"
                 ReminderTableAssembly="OrleansAWSUtils" />
    <ReminderService ReminderServiceType="None" />
  </Globals>
  <Defaults>
    <Networking Address="" Port="10000" />
    <ProxyingGateway Address="" Port="30000" />
    <Tracing DefaultTraceLevel="Info" TraceToConsole="true" />
  </Defaults>
</OrleansConfiguration>
The actual Networking and ProxyingGateway addresses are filled in at runtime with the host's IP address.
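Concretely, the startup code does something along these lines. This is a minimal sketch rather than our exact code; `HOST_IP` stands in for however the host's address actually reaches the container:

```csharp
using System;
using System.Net;
using Orleans.Runtime.Configuration;

static class SiloConfigBuilder
{
    public static ClusterConfiguration Build()
    {
        // Load the XML config above, then patch in the addresses that
        // were left empty.
        var config = new ClusterConfiguration();
        config.LoadFromFile("OrleansConfiguration.xml");

        // Advertise the Docker host's address rather than the container's
        // own, so other silos and clients can actually reach this node.
        var hostIp = IPAddress.Parse(Environment.GetEnvironmentVariable("HOST_IP"));
        config.Defaults.HostNameOrIPAddress = hostIp.ToString();
        config.Defaults.ProxyGatewayEndpoint = new IPEndPoint(hostIp, 30000);
        return config;
    }
}
```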
The client config XML:
<?xml version="1.0" encoding="utf-8"?>
<ClientConfiguration xmlns="urn:orleans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="OrleansConfiguration.xsd">
  <SystemStore SystemStoreType="Custom"
               DeploymentId="0.1"
               DataConnectionString="Service=eu-west-1;"
               MembershipTableAssembly="OrleansAWSUtils"
               GatewayProvider="Custom"
               CustomGatewayProviderAssemblyName="OrleansAWSUtils"
               ReminderTableAssembly="OrleansAWSUtils" />
</ClientConfiguration>
Cheers, Paul
OK, I need to investigate more, but my first guess is that Beanstalk does not build a network for its containers, which would explain why the silos can't talk to each other.
I'll read more about it, but if that is the case, your proposed fix won't do any good.
Orleans requires network connectivity between the silos (LAN, site-to-site VPN, etc.). If your containers can't talk to each other, there is no way to form a cluster, which explains the messages you mentioned.
Hi @galvesribeiro, why won't it work? I have a silo with this code deployed, and I've scaled it to 2 instances, and it works just fine.
The containers can't talk to each other directly, but by publishing the network address of their host, they can. It also means that the clients don't have to override the gateway after discovery.
@pms1969 Beanstalk has no internal network. You would need to publish the public IP address of each node and have the silos communicate over public IP/port, which is not a good idea.
Orleans was designed to work in a trusted network, since it does not (yet) have a way to secure the connections between silos and clients.
I would recommend moving away from Beanstalk if you plan to use Orleans. A good alternative is the regular AWS EC2 Container Service, which gives you a container cluster where you can create networks between containers, so Orleans would work fine there.
Just to give you an idea, Beanstalk works very similarly to Azure App Services. It was made to package simple (web) applications that are mostly stateless and only process incoming (HTTP) requests. We can't run Orleans on Azure App Services for the same reason we can't on Beanstalk: there is no inter-node/private network. What people usually do is host the public/frontend Orleans client on Azure App Service and connect it to another Azure service hosting the Orleans cluster (Cloud Service with Worker Roles, Container Service, VMs, etc.) by creating a VPN between the two vNets (the App Service one and the one attached to the service running the Orleans cluster).
I hope it helps.
Hi @galvesribeiro, can you give me an example of using Docker Swarm?
The Docker setup in the documentation is great if everything is running on one host, but if you are running a cluster across multiple hosts, as on AWS Elastic Beanstalk, the configuration can't be used, since the hosts need to communicate over the published IP/port.
To work around this, I thought I'd first add a "bind address" to the config and pass that around to the SocketManager, but quite frankly I got lost in the implementation, and eventually reverted to a try-bind on the configured address, followed by a bind to 0.0.0.0/port. This gets the node up and running, but then every 30 seconds I get these 2 messages in my logs.
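For clarity, the fallback behaves roughly like this (a simplified sketch of the idea, not the actual change):

```csharp
using System.Net;
using System.Net.Sockets;

static class ListenerBinder
{
    // Try to bind to the configured (host) endpoint first; the container
    // doesn't own the host's IP, so that bind usually fails, and we fall
    // back to listening on all interfaces on the same port.
    public static Socket Bind(IPEndPoint configured)
    {
        var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        try
        {
            socket.Bind(configured);
        }
        catch (SocketException)
        {
            // 0.0.0.0:<port> accepts the traffic Docker forwards from the
            // host's published port.
            socket.Bind(new IPEndPoint(IPAddress.Any, configured.Port));
        }
        return socket;
    }
}
```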
The cluster is running in a private subnet. The IP of the host is 10.11.3.211, and it's bound to port 10000.
It is worth noting that any communication with the host times out on the client.
On further examination, after turning the logging up to insane levels, it turns out the reason for those 2 log messages is that messages are getting stuck in a forwarding loop...
I tracked this down to the MessageCenter.Initialize call, where it creates the SiloAddress with the bound port rather than the advertised one. So I changed that appropriately, and everything seems to be behaving itself now that I've sorted out some other environmental issues.
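To illustrate, the change amounts to something like the following. This is a sketch, not the actual diff; `advertisedEndpoint` and `generation` are just my illustrative names:

```csharp
using System.Net;
using Orleans.Runtime;

// The silo's identity should carry the advertised (routable) endpoint,
// not the endpoint the listening socket actually bound to.
int generation = SiloAddress.AllocateNewGeneration();

// Before: identity built from the locally bound endpoint (0.0.0.0),
// so remote silos forward messages to an address that loops back.
// var silo = SiloAddress.New(new IPEndPoint(IPAddress.Any, 10000), generation);

// After: identity built from the advertised host endpoint.
var advertisedEndpoint = new IPEndPoint(IPAddress.Parse("10.11.3.211"), 10000);
var silo = SiloAddress.New(advertisedEndpoint, generation);
```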
Can you see anything particularly wrong with this approach? Am I saving up undefined behaviour for some other time?
The full diff is against sha 2cddf2fba09182e5baf4c17531092a19b5a4f82d, since I couldn't get the head of master to work for me at all.
Happy to submit this as a pull request if you think it's worthy?
Cheers, Paul