SFTtech / openage

Free (as in freedom) open source clone of the Age of Empires II engine 🚀
http://openage.dev

Network communication design #530

Open TheJJ opened 8 years ago

TheJJ commented 8 years ago

tl;wr: the server does all calculations, clients only render the state they received. state is transmitted by keyframes at "predicted" end points.

The design is pretty much like one would do it for an FPS. The basic idea is this: the server sends a packet which updates the "target" state of the future for clients. They interpolate the received movement/update/... functions and keep trying to reach the target state.

server
------

* central trust instance for the game state.
* does all the calculations based on input events.
* create and distribute keyframes for all clients

clients
-------

* receive keyframes
* calculate world state for current time by interpolating keyframes
* display world state and fancy animations etc.
* send actions to server (timestamp is probably a bad idea)

This is the low-level part for transmitting the results of the simulation itself.
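
A minimal sketch of the client-side interpolation described above, assuming linear interpolation of a single scalar property between keyframes. The `Keyframe` struct and the `interpolate` function are hypothetical names for illustration, not existing openage code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical keyframe: the value the server says a property has at time t.
struct Keyframe {
    double t;      // game time in seconds
    double value;  // e.g. one coordinate of a unit position
};

// Linearly interpolate the value at `now` from keyframes sorted by time.
// Before the first keyframe the first value is returned; after the last
// keyframe the last value is held (no extrapolation in this sketch).
double interpolate(const std::vector<Keyframe> &keys, double now) {
    assert(!keys.empty());
    if (now <= keys.front().t) return keys.front().value;
    if (now >= keys.back().t) return keys.back().value;
    for (std::size_t i = 1; i < keys.size(); ++i) {
        if (now <= keys[i].t) {
            const Keyframe &a = keys[i - 1];
            const Keyframe &b = keys[i];
            double alpha = (now - a.t) / (b.t - a.t);
            return a.value + alpha * (b.value - a.value);
        }
    }
    return keys.back().value;  // not reached for sorted input
}
```

With keyframes `(t=0, value=0)` and `(t=10, value=100)`, a client rendering at `t=2.5` would display the value 25. A new keyframe from the server simply changes the target the client is moving towards.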

To understand how the prediction works, see #740.

janisozaur commented 8 years ago

In https://github.com/OpenRCT2/OpenRCT2 we have added multiplayer capability to the game, and included a simple lobby, master server, which lists online public games.

At first there was no authentication, but people abused publicly available servers to destroy maps. We then added the ability to password-protect servers and assign users to groups with specified permissions, but it turned out not to be enough, as we could not make those permissions persist. Our plan was to include a centralised authorisation server, but this met some substantial resistance, as it was "against the spirit of open source", even though we wanted to publish the code for running your own auth service.

The solution we came up with was to use OpenSSL's public-key cryptography: the client generates a key and signs a server-generated token, then sends the signature to the server. If the server verifies it, the client is granted the permissions it was assigned last time.

If you would like to employ similar scheme in your project, you can use https://github.com/OpenRCT2/OpenRCT2/pull/3699 as a reference.

TheJJ commented 8 years ago

Thanks for the hints, doesn't sound bad. But this issue is rather about the simple server-client interaction, without all the lobby stuff. I just opened another issue for that: #562. I don't think we'll have a griefing problem, because games are rather short-lived and only spectators should be able to join afterwards. For client authentication, we thought about using GPG, so achievements can be issued to key owners for example. Match-rejoins after a disconnect can also be performed that way securely. But it's not that much of a difference to OpenSSL, except maybe licensing fuckup.

janisozaur commented 8 years ago

When I was designing the system, I found out about https://keybase.io/, this looks nice and something we could've probably used, but it is still in invite-only mode. If you use crypto keys nevertheless, perhaps something worth taking a look at.

timo-42 commented 8 years ago

I tried to understand the proposed netcode: flow chart dia file for editing

sources:

  * aoe2 netcode: we don't want it (cheaters, performance problems)
  * our netcode inspiration

janisozaur commented 8 years ago

One interesting bit in this diagram is limiting network messages clients receive to only what they can see. This is a very sound approach, but one that could probably make syncing game state much much harder. I'm eager to see how you guys solve that.

Another piece of documentation you may perhaps find useful or a source of inspiration would be https://github.com/OpenTTD/OpenTTD/blob/master/docs/desync.txt

Have you already given any thought to what your network stream would look like? Are you going to use something to encapsulate it, like protobuf?

timo-42 commented 8 years ago

protobuf looks like exactly what we need for sending messages. It seems simple enough that all developers can use it without knowledge of the network system. You need something synced to the client? No problem: create a message format and send it.

desync: we should not have problems with desync. The server is authoritative. We could do something like: hash over all 20 units => send it to the server; if something is wrong, the server sends the full unit information for these 20 units again. Therefore there is no "butterfly effect": we can always get the true state (building, techtree, units, world) from the server.
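
A rough sketch of that hash check, assuming integer-only unit state and a 64-bit FNV-1a hash. The hash choice, the `UnitState` fields and all function names are assumptions made for illustration; nothing here is an existing openage API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal unit state; integers only, so the hash does not
// depend on floating-point representation.
struct UnitState {
    std::uint32_t id;
    std::int32_t x;
    std::int32_t y;
    std::int32_t hp;
};

// Feed one 32-bit value into a 64-bit FNV-1a hash, byte by byte
// (least significant byte first, independent of host endianness).
std::uint64_t fnv1a_mix(std::uint64_t hash, std::uint32_t value) {
    for (int i = 0; i < 4; ++i) {
        hash ^= (value >> (8 * i)) & 0xffu;
        hash *= 0x100000001b3ull;  // FNV prime
    }
    return hash;
}

// Hash a group of units in a fixed order (both sides must use the same order).
std::uint64_t hash_units(const std::vector<UnitState> &units) {
    std::uint64_t hash = 0xcbf29ce484222325ull;  // FNV offset basis
    for (const UnitState &u : units) {
        hash = fnv1a_mix(hash, u.id);
        hash = fnv1a_mix(hash, static_cast<std::uint32_t>(u.x));
        hash = fnv1a_mix(hash, static_cast<std::uint32_t>(u.y));
        hash = fnv1a_mix(hash, static_cast<std::uint32_t>(u.hp));
    }
    return hash;
}

// Server side: true means the full state of these units has to be resent.
bool needs_resync(const std::vector<UnitState> &server_units,
                  std::uint64_t client_hash) {
    return hash_units(server_units) != client_hash;
}
```

The client would send `hash_units(...)` for a group of units; the server compares it against its own state and resends the full unit information only on a mismatch.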

sending only relevant information to the client:

  1. server calculates all changes and creates messages
  2. which messages do not happen in the fog of war of a player? send these
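
The two steps above could be sketched like this, assuming each message carries the tile it happens on and the server knows the set of tiles a player currently sees. `Tile`, `Message` and `visible_messages` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

using Tile = std::pair<int, int>;  // (x, y) tile coordinates

// Hypothetical update message created by the server in step 1.
struct Message {
    int unit_id;
    Tile where;  // tile on which the change happens
};

// Step 2: keep only the messages that happen on tiles the player can see,
// i.e. that are not hidden by this player's fog of war.
std::vector<Message> visible_messages(const std::vector<Message> &all,
                                      const std::set<Tile> &visible_tiles) {
    assert(!visible_tiles.empty() || all.empty() || true);  // no precondition
    std::vector<Message> out;
    for (const Message &m : all) {
        if (visible_tiles.count(m.where) > 0) {
            out.push_back(m);
        }
    }
    return out;
}
```

A real implementation would also have to handle units that move from a visible tile into the fog, which this sketch ignores.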

VelorumS commented 8 years ago

Desync is still an issue, because the info we're getting from the server is about the evolution of a unit over the next several seconds. A bad implementation will jitter or create a lot of traffic.

Even if our implementation has no desync bugs, it still needs to handle things like de-spawning a unit as a part of the normal flow.

Also, don't forget another prediction loop in the diagram: units start to move before we even send the command to the server.

timo-42 commented 8 years ago

You are right, there will be small desyncs until the server sends updated path, hitpoints, etc for the unit. But the alternative lockstep model has huge disadvantages:

I don't understand where is the problem with de-spawning units? send a message with unit A is killed/will be killed at 2:45

Is the prediction loop necessary/possible? We can't know exactly when the message arrives at the server and will be processed. We won't send a timestamp with it, because this may be a big loophole for cheating. Therefore, a command will be executed when the server says so, not before. If we do prediction client side, we may have to undo building creation, ... I think that would be a disappointing experience for the player: he wants to build a castle, we give the visual feedback, and 80ms later we remove it from the map because another player builds a tower there.

VelorumS commented 8 years ago

I don't understand where is the problem with de-spawning units? send a message with unit A is killed/will be killed at 2:45

Message from server: "Unit is in the prod queue, will be ready in 10 seconds". Ok, client spawns a unit in 10 seconds. Message from server: "Sorry, actually rax were destroyed and there never was any unit spawned". Client makes the unit vanish.

About prediction: you click - they move. No time to wait for the server roundtrip, because after playing with 20ms ping, the players will be getting nausea on 100ms.

janisozaur commented 8 years ago

The best thing you could do about floats is not to use them for game state. It may be hard, but other than following IEEE754 to a t, or using something like libfixmath*, you face differences in implementations. Check out also https://www.reddit.com/r/gamedev/comments/3tx6gh/article_i_wrote_minimizing_the_pain_of_lockstep/

* I know you guys, you will try to roll your own.

VelorumS commented 8 years ago

  * I know you guys, you will try to roll your own.

There is one already. I was like "wtf, these uint64_t values are making no sense when I'm printing them!".

timo-42 commented 8 years ago

new term proposal:

prediction: used when the server sends a path to the client; the client can interpolate the server-predicted path

pre prediction: the client gets a user action and tries to predict the right action

@ChipmunkV good point. But we could send a message: "building destroyed", "units never existed: 4,5,6" (so no body will be shown, unlike when the units-killed message is used)

for referencing the ids: every client gets a pool of (globally unique) ids assigned, so it chooses them itself and the server can reference them when the units are killed before the queue message arrives at the server
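
One way such id pools could look, as a sketch only: the server hands every client a disjoint block of ids, and the client takes ids from its own block without asking. `IdPool`, `pool_for_client` and the block layout are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pool of ids that only one client may hand out.
class IdPool {
public:
    IdPool(std::uint32_t first, std::uint32_t count)
        : first_(first), next_(first), end_(first + count) {}

    bool empty() const { return next_ == end_; }

    // Take the next unused id from this client's block.
    std::uint32_t take() {
        assert(!empty());
        return next_++;
    }

    // True if `id` belongs to this pool's block (used or not).
    bool owns(std::uint32_t id) const { return id >= first_ && id < end_; }

private:
    std::uint32_t first_;
    std::uint32_t next_;
    std::uint32_t end_;
};

// Give client number `index` its own disjoint block of `block_size` ids.
IdPool pool_for_client(std::uint32_t index, std::uint32_t block_size) {
    return IdPool(index * block_size, block_size);
}
```

Because the blocks never overlap, the server can tell from an id alone which client created it, and can refer to that id in a later "never existed" message.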

New Algorithm:

  1. SDL_Events
  2. create messages and queue them locally and send them to server
  3. process server messages (prediction)
  4. process local queue (pre prediction)
  5. make pre predictions, who attacks who?
  6. draw screen

maybe client side pre prediction is not needed or it is only useful in some cases. Client side pre prediction would be extremely hard to implement.

now the server must look at the timestamps from clients and decide if they are plausible. Clients with different pings have different amounts of time to send their messages. What happens if a client ping drops suddenly? Do we drop their messages? Does he want to cheat? Messages from server to client take 20ms and from client to server 100ms. If the server accepts old messages from 100ms ago, the client may exploit this time difference. What happens if a client ping improves suddenly? Did he cheat prior?

If we allow pre prediction, we must replay all frames which happened since then. => this may be highly CPU intensive

My opinion: the server accepts messages as they come in, as if they were triggered now. What clients can do to reduce lag: if we know the ping is 40ms, then we can assume our message (which should be correct) will be processed in 20ms. So we buffer it until then and we won't be so far off. Note: 20ms is the same frame @30fps and the next one @60fps.
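
The frame arithmetic behind that note, as a small sketch. It assumes a symmetric route (one-way delay is half the ping) and that the action is issued exactly at the start of a render frame; both function names are hypothetical.

```cpp
#include <cassert>

// Estimated one-way delay, assuming the route is symmetric.
double one_way_delay_ms(double ping_ms) {
    assert(ping_ms >= 0.0);
    return ping_ms / 2.0;
}

// Number of whole render frames that have passed once `delay_ms` is over,
// counted from the start of the frame in which the action was issued.
int frames_until(double delay_ms, double fps) {
    assert(fps > 0.0);
    double frame_ms = 1000.0 / fps;  // ~33.3ms at 30fps, ~16.7ms at 60fps
    return static_cast<int>(delay_ms / frame_ms);
}
```

For a 40ms ping, `frames_until(one_way_delay_ms(40.0), 30.0)` gives 0 (same frame) and `frames_until(one_way_delay_ms(40.0), 60.0)` gives 1 (next frame).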

janisozaur commented 8 years ago

https://xkcd.com/654/

TheJJ commented 8 years ago

@timohaas jup that sounds like a good way.

We should abstract the communication in several ways, the net/ subsystem is just for network communication, then we'll have curve/ for all the curve generation and interpolation, and adapt the game logic to make use of them.

Then we need to create/extend the renderer/ subsystem to display the curves with the appropriate assets.

This is all gonna be very hard and needs lots of restructuring, but we can do it!

mic-e commented 8 years ago

Possible protobuf alternative: https://capnproto.org/

janisozaur commented 8 years ago

Yes, protobuf is not the only kid on the block. capnproto has nice summarising feature matrix and some decent comparisons on this page: https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html

Tomatower commented 8 years ago

I would implement a custom protocol based on a deterministic lockstep method, on top of UDP, which is non-reliable but stateless and robust against packet loss.

The idea is to transfer "single occurrence, event like" information in a reliable way (more later) and "status updates" like unit positions etc. in a non-reliable way.

To transfer a piece of information reliably, you transmit it with every packet every frame (maybe more than one eventframe per frame), until you receive an ACK from the other end saying that the events up to a specific frame number have been received.

Status updates are used to fill up a packet to MTU size (because if a packet is split on the way because it is bigger than the MTU, you again lose precious milliseconds)

So, in more detail, an example packet can look like this:

frame_ack: 6{ 
   {
      frame: 1 { Event: Move Unit 5 to X/Y: 10/100; Send Message "FOO" to user 2 }, 
      frame: 5 { Event: Unit 5: attack unit 7; Unit 4: die }
   }, 
   {
      unit 5:{ x: 3 y: 90; HP: 1325; Trajectory: 4/90, 4/91, 5/92}
      unit 7:{ x: 10; y:101; HP 123523; Trajectory: 0}
   }
}
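
A sketch of the sender-side bookkeeping this implies, assuming events are kept per frame until the peer acknowledges that frame, and that the remaining packet space is filled with status updates in priority order. `ReliableEventQueue` and `status_updates_that_fit` are hypothetical names; no real socket code is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical bookkeeping for the "resend until ACKed" idea.
class ReliableEventQueue {
public:
    // Remember the events of one frame until the peer acknowledges them.
    void push(std::uint32_t frame, const std::string &event) {
        pending_[frame].push_back(event);
    }

    // The peer confirmed it has every event frame up to and including `frame`.
    void ack(std::uint32_t frame) {
        pending_.erase(pending_.begin(), pending_.upper_bound(frame));
    }

    // Everything still unacknowledged goes into every outgoing packet.
    const std::map<std::uint32_t, std::vector<std::string>> &unacked() const {
        return pending_;
    }

private:
    std::map<std::uint32_t, std::vector<std::string>> pending_;
};

// Count how many status updates (given by their encoded sizes, in priority
// order) fit into the bytes left after the reliable events were written.
std::size_t status_updates_that_fit(std::size_t bytes_left,
                                    const std::vector<std::size_t> &sizes) {
    std::size_t count = 0;
    for (std::size_t size : sizes) {
        if (size > bytes_left) break;
        bytes_left -= size;
        ++count;
    }
    return count;
}
```

In the example packet above, the event frames 1 and 5 would be the content of `unacked()`, and the unit 5 / unit 7 entries are the status updates that still fit below the MTU.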

This means for the remaining concept:

Please give me your comments what you think about this concept.

VelorumS commented 8 years ago

@Tomatower, you may start writing net/.

I'm making a sketch of what should be between net/ and render/: https://github.com/ChipmunkV/openage/blob/dd6aa96d326e5cd9556357ae14c864709fb688b5/libopenage/curve/entities_conductor.h

(branch https://github.com/ChipmunkV/openage/commit/dd6aa96d326e5cd9556357ae14c864709fb688b5)

Tomatower commented 8 years ago

Currently I was more thinking about applying the different packages pulled from the server (via a "get events for frame 5"-method), and then interpolating from there with the exact same codebase as it exists on the server.

If one needs to extrapolate etc. it would be better to apply a "what-if" interface:

"I am in Frame 8. Now the Server said: In Frame 5 unit 3 started to move. Where is it now?"

timo-42 commented 8 years ago

netcode Roadmap proposal:

at this point we have a running connection, dumb client

at this point we can play on the server and the clients can render it

at this point we have a running multiplayer game

@Tomatower we should not use UDP. It would make the network stack more complex. We don't need the speed down to the last ms.

Tomatower commented 8 years ago

The first network client should be a second, low-spec tool that blindly renders what the full-spec server (the current version: you can move units etc.) sends. This will test the basic communication.

Then we should provide the possibility for the client to create input events to be transmitted to the server, and be mapped to another player than the one currently running.

At this point in time we start implementing the need to know principle

Then we can start designing a lobby, that enables to connect different players together, and start removing the input in the server (maybe keep the rendering without FOW?)

Then we can think about creating fancy stuff like hot-seat switching, AI via Network, ...

So as Checklist:

VelorumS commented 8 years ago

The unknown is an integration with the game data. The subsystem should use the right types of curves for the unit/player properties, defined in nyan. And it must be transparent, so if there is a change in the property definition - it's only in the .nyan file and the renderer.

gamestate without curves (map tiles, messages, player resources)

Have you considered implementing them as curves?

VelorumS commented 8 years ago

Currently I was more thinking about applying the different packages pulled from the server (via a "get events for frame 5"-method), and then interpolating from there with the exact same codebase as it exists on the server. If one needs to extrapolate etc. it would be better to apply a "what-if" interface: "I am in Frame 8. Now the Server said: In Frame 5 unit 3 started to move. Where is it now?"

In that networking model, server sends events, but some of them are already extrapolations. So, server is at frame 5, but already telling that the unit will finish being produced on the frame 105 (200ms logic frame, so - in 20s).

Then server corrects the predictions as it goes.

The nice thing is that the client can throw in its own extrapolations into the same array, and render that as usual. Later it will be overwritten by the corrections from the server anyways.
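
The "what-if" question quoted above ("In Frame 5 unit 3 started to move. Where is it now?") can be answered by the same kind of extrapolation on the client. A minimal sketch, assuming constant velocity per logic frame and integer coordinates; `MoveStart`, `Position` and `where_is_it` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical event from the server: a unit started moving at `frame`
// from (x, y) with a constant velocity given in units per logic frame.
struct MoveStart {
    int frame;
    int x, y;
    int vx, vy;
};

struct Position {
    int x, y;
};

// Extrapolate where the unit is at `current_frame`, until the server
// sends a correction that replaces this guess.
Position where_is_it(const MoveStart &m, int current_frame) {
    assert(current_frame >= m.frame);
    int elapsed = current_frame - m.frame;
    return Position{m.x + m.vx * elapsed, m.y + m.vy * elapsed};
}
```

For a move that started in frame 5 at (10, 100) with velocity (1, -2), the client in frame 8 would draw the unit at (13, 94).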

TheJJ commented 8 years ago

For transmitting nyan objects over the network, we can either submit all the applied patches and track all changes on all peers that way, or we can "bake" objects and just send those (e.g. the combined variant of all attributes of that unit type). Sending patches in a guaranteed way will probably reduce coding overhead for the prediction subsystem though.

The nyan system is synced by just sending the time of a patch application. The database has to be identical at the start, and patches are applied by event-curves ("apply patch Loom at time 2332s"). The client nyan database is then used for its local prediction, the server uses it for authoritative simulation anyway.

Tomatower commented 7 years ago

After having a more in-depth look into the features, I have found SCTP as a possible network wrapper. It has good support in Linux, and there are libraries available for Windows.

In particular, it supports multiple streams and off-band messages.

Tomatower commented 7 years ago

I have an idea about a possible Curve API and code in the backend, and I welcome feedback.

The Idea

The data to be accessed seems to look like this without curves (a rough example, to be seen only as an example for the API idea):

struct single_unit { // Object Store
    vector<2, float> position; // MultidimensionalContinuous
    float hp; // SimpleContinuous
    int ammo; // Discrete
};

struct gameContainer { // Object Store
    std::unordered_map<int, single_unit> units; // Identifiers
};

void event(event); // Event triggering

Curve API

The derived API shall be like that:

class SingleUnit : public curve::Object {
    curve::Array<2, float> position;
    curve::Continuous<float> hp;
    curve::Discrete<int> ammo;
};

class GameContainer : public curve::Object {
    curve::unordered_map<int, SingleUnit> units;
};

class curve::Event {
    int event_type;
    vector<int32_t> data;
};

curve::event_iterator events(time start, time end);

extra parsers for the event iterator stuff can be implemented.

The API Interface Idea

The core definitions will look like:

event_iterator {
    //Standard iterator stuff
    bool valid(); //if there are still elements in the queue
};

Reading: curve::s can be initialized with a certain timestamp and a referencing mother object. Then one can access their values.

Data is stored within the curve::s themselves, each basically does its own history management

data functions on non-list types.

  1. add(time, value) insert a new value in between or at the end (depends on the timestamp)
  2. replace (time, value) insert a new value, and remove everything after
  3. remove(time) remove a keyframe. time has to match exactly

data getters on single-dimensional values

  1. get(time) get the value at this point in time
  2. get() - if object was bound to a time at construction

data getters on multi-dimensional values (Array and unordered map)

  1. get(time, i)
  2. get(i) - if object was bound to a time at construction

data functions on the unordered map:

  1. add(time, std::pair<int, value>) add a new key/value-pair. will fail if key already exists. A key has to be globally and over the whole time of the game unique.
  2. remove(time, key)
  3. iterate(time) - gives an iterator over the list at the given time
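
A minimal sketch of how the single-dimensional part of this API could behave, assuming keyframes stored in a `std::map` and linear interpolation in `get(time)`. `ContinuousCurve` is a hypothetical stand-in for the proposed `curve::Continuous<T>` and only covers `add`, `replace`, `remove` and `get`.

```cpp
#include <cassert>
#include <iterator>
#include <map>

// Hypothetical stand-in for curve::Continuous<T>; T is assumed to be an
// arithmetic type such as float or double.
template <typename T>
class ContinuousCurve {
public:
    // add(time, value): insert a keyframe in between or at the end.
    void add(double time, T value) { keys_[time] = value; }

    // replace(time, value): insert a keyframe and drop everything after it.
    void replace(double time, T value) {
        keys_.erase(keys_.lower_bound(time), keys_.end());
        keys_[time] = value;
    }

    // remove(time): remove a keyframe; the time has to match exactly.
    bool remove(double time) { return keys_.erase(time) > 0; }

    // get(time): linear interpolation between the surrounding keyframes;
    // outside the keyframe range the nearest keyframe value is held.
    T get(double time) const {
        assert(!keys_.empty());
        auto next = keys_.upper_bound(time);  // first keyframe after `time`
        if (next == keys_.begin()) return next->second;
        auto prev = std::prev(next);
        if (next == keys_.end()) return prev->second;
        double alpha = (time - prev->first) / (next->first - prev->first);
        return prev->second + (next->second - prev->second) * alpha;
    }

private:
    std::map<double, T> keys_;
};
```

`replace` is what a server correction would use: everything the client had predicted after that time is thrown away and the new keyframe becomes the latest known state.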

TheJJ commented 7 years ago

We escalated this even more, so the transmission over the network is still curves via keyframes, internally we create the predictions with #740.

simonsan commented 4 years ago

Protobuf alternative https://google.github.io/flatbuffers/

Why not use Protocol Buffers, or .. ?

Protocol Buffers is indeed relatively similar to FlatBuffers, with the primary difference being that FlatBuffers does not need a parsing/ unpacking step to a secondary representation before you can access data, often coupled with per-object memory allocation. The code is an order of magnitude bigger, too. Protocol Buffers has neither optional text import/export nor schema language features like unions.

Nice! This is a simple web front end for the FlatBuffers Compiler (flatc 1.10.0): https://flatbuffers.ar.je/

heinezen commented 4 years ago

@simonsan AoE also uses flat buffers in their multiplayer protocol I think.

duanqn commented 4 years ago

Is it possible to take NAT into consideration when we choose protocols? In China the IPv4 address is so limited compared to the large user base so we have NAT everywhere. A lot of games can't be played without a dedicated server because they don't work with NATs.

Also looking into the future it might be useful to make our game compatible with IPv6.

heinezen commented 4 years ago

@duanqn I would support that and we would have to think about that anyway since other mechanisms such as DSLite make direct IP connections impossible. Committing to IPv6 would also be a good idea.

TheJJ commented 4 years ago

For now we assume that the dedicated server is reachable somehow (directly (v4/v6), VPN, portforward, ...), so we don't plan for any nat-traversal mechanisms. These can be added later as an extension when we deem them useful.

duanqn commented 4 years ago

@TheJJ I agree that NAT is not a top concern for now, but it's probably beneficial to keep it in mind. My concern is that if we choose some protocols, then NAT traversal may become impossible. For example I think any TCP-based protocols cannot work with NAT. I think direct IP multiplayer game is essential and is a more practical goal, especially in our development phase.

heinezen commented 4 years ago

@duanqn Do you know which type of NAT we would be dealing with in Chinese networks? I have little knowledge on how the networks over there work :)

For most applications, STUN/ICE/TURN and NAT64 should work for the UDP side, and also for TCP if I remember correctly. Some of these methods would also require a relay server to be reachable from China which could be tricky.

duanqn commented 4 years ago

@heinezen Good question... I don't know actually. I think the network environment varies from region to region, but in general there is heavy presence of NAT and firewalls. I don't know a lot about NAT traversal. I just googled those methods. I think TURN would definitely work. Getting a relay server with public IP is not hard (just get a machine from any cloud service provider), but it is also expensive. The cost is similar to having a dedicated server. It would be better if we can use STUN. If my understanding is correct the STUN server is not involved in actual data transfer. However since I never heard about it I assume it's not working. I found some implementations on GitHub maybe I can ask my friends to test them.

VelorumS commented 4 years ago

@duanqn as a client use the Trickle ICE WebRTC sample for testing. It's just a local web page that can communicate with STUN and TURN servers. Can test with some public servers.

As a STUN/TURN server the resiprocate-turn-server works fine. It's in the Ubuntu repo.

My current job is a VoIP server, routers, recorders.

duanqn commented 4 years ago

I asked one of my friends to test with https://github.com/jselbie/stunserver and it looks like the NAT mapping and firewalls are independent of remote IP:port pairs. So there is a good chance we can use STUN. But I haven't found a test application that actually does NAT hole-punching.