FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/

Automatic reconnection of the interrupted connection #8209

Open livius2 opened 3 months ago

livius2 commented 3 months ago

Currently, connections to the server are permanent TCP/IP connections. The slightest network disruption causes the connection to be lost, along with any unsaved data, transactions, etc. It would be highly beneficial if the server supported connections that could be interrupted and then resumed once the network connection is restored. Intermediary applications that provide similar functionality can be implemented, but these are typically services like REST that do not support transactions per se and complicate things considerably. Presently, if a connection is lost, the server usually keeps the "connection" for about 2 hours, but there is no way to resume it, since it is a standard TCP connection.

It could be fixed fairly easily: it would be good to have a protocol extension for obtaining a connection token. If the connection breaks, the attachment remains active on the server side for a specified time, e.g., 30 minutes, and can be resumed by providing the usual login and password along with this token.
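To make the token proposal concrete, here is a minimal sketch of the server-side bookkeeping it implies, assuming a hypothetical registry that maps resume tokens to attachments. None of these names (`ResumableRegistry`, `attach`, `connection_lost`, `resume`, `purge_expired`) exist in Firebird; the 30-minute grace period is the example value from the proposal, and the current time is passed in explicitly so the logic is easy to follow and test.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, List, Optional

GRACE_PERIOD_S = 30 * 60  # hypothetical default, taken from the "e.g., 30 minutes" above


@dataclass
class Attachment:
    attachment_id: int
    user: str
    detached_at: Optional[float] = None  # None while bound to a live TCP connection


class ResumableRegistry:
    """Hypothetical server-side table mapping resume tokens to attachments."""

    def __init__(self, grace_period_s: float = GRACE_PERIOD_S) -> None:
        self.grace_period_s = grace_period_s
        self._by_token: Dict[str, Attachment] = {}

    def attach(self, attachment_id: int, user: str) -> str:
        # Issue an unguessable token when a client opts in at attach time.
        token = secrets.token_urlsafe(32)
        self._by_token[token] = Attachment(attachment_id, user)
        return token

    def connection_lost(self, token: str, now: float) -> None:
        # Keep the attachment (and its transactions and locks) but mark it detached.
        self._by_token[token].detached_at = now

    def _expired(self, att: Attachment, now: float) -> bool:
        return att.detached_at is not None and now - att.detached_at > self.grace_period_s

    def resume(self, token: str, user: str, authenticated: bool, now: float) -> Optional[int]:
        # The token alone is not enough: the usual login must also succeed.
        att = self._by_token.get(token)
        if att is None or not authenticated or att.user != user or self._expired(att, now):
            return None
        att.detached_at = None  # rebind the attachment to the new TCP connection
        return att.attachment_id

    def purge_expired(self, now: float) -> List[int]:
        # Clients that never came back: their attachments would be rolled back and freed.
        expired = [t for t, a in self._by_token.items() if self._expired(a, now)]
        return [self._by_token.pop(t).attachment_id for t in expired]
```

Note that `purge_expired` is where the cost of the feature shows up: until the grace period elapses, a detached attachment keeps holding whatever transactions and locks it had.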

hvlad commented 3 months ago

Presently, if a connection is lost, the server usually keeps the "connection" for about 2 hours

That is not entirely true. AFAIK, it happens only if the network stack doesn't report a network error to the application (Firebird).

By the description, it is not "connectionless"; it is "restorable", "resumable", or "re-attachable".

How do you propose to extend the client API to use this feature? How should the server know that a lost client might re-attach? How long should the server keep such a broken connection waiting for re-attachment? What kind of apps or environments demand such a feature?

livius2 commented 3 months ago

By the description, it is not "connectionless"; it is "restorable", "resumable", or "re-attachable".

Yes, you're right, but in the database world it is nearly connectionless ;-)

How do you propose to extend the client API to use this feature? How should the server know that a lost client might re-attach?

The API should have a parameter at connection start (e.g., at login time) indicating that a token is required. Without this parameter, it would be a "normal" connection.

How long should the server keep such a broken connection waiting for re-attachment?

This could be fixed at development time or made configurable, but it should be limited to a few hours at most, I suppose.

What kind of apps or environments demand such a feature?

All? Currently, there is no way to do this, so users complain and have to reconnect.

livius2 commented 3 months ago

In fact, even a few hours is too much. In the normal scenario the interruption lasts a few seconds at most, so perhaps even providing a token as a parameter is unnecessary.

It would be sufficient if the server and client automatically exchanged something that enables automatic reconnection, potentially making API changes unnecessary.

aafemt commented 3 months ago

Consider that you must restore not just the connection but also the state of transactions, open cursors, prepared statements, etc., and all of this can be interrupted at any (worst possible) moment... It is next to impossible to implement.

A simple example: a connection is interrupted during a fetch from a non-scrollable cursor; the server has sent 10 records, while the client has received 5 of them. The rest "just disappear" in intermediate network devices. What will you do after the reconnect? OK, the simple answer: just forget this cursor and reopen it. But... how would that be different from a brand-new attachment?
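The fetch example can be simulated with a toy forward-only cursor. This is a sketch under stated assumptions, not Firebird's wire protocol: the function names `server_fetch` and `interrupted_fetch` are hypothetical, a batch is 10 rows, and only the first 5 reach the client before the link breaks. The point it shows is that the server's cursor position has already moved past rows the client never saw.

```python
from collections import deque
from typing import Deque, List, Tuple


def server_fetch(cursor: Deque[int], batch_size: int) -> List[int]:
    # Forward-only cursor: once rows are written to the wire, the server forgets them.
    return [cursor.popleft() for _ in range(min(batch_size, len(cursor)))]


def interrupted_fetch(total_rows: int, batch_size: int,
                      delivered: int) -> Tuple[List[int], List[int], List[int]]:
    cursor = deque(range(1, total_rows + 1))
    sent = server_fetch(cursor, batch_size)  # the server sends a full batch...
    received = sent[:delivered]              # ...but only part of it reaches the client
    lost = sent[delivered:]
    # After a (hypothetical) re-attach, the next fetch resumes at the server's position.
    next_batch = server_fetch(cursor, batch_size)
    return received, lost, next_batch


received, lost, next_batch = interrupted_fetch(total_rows=20, batch_size=10, delivered=5)
print(received)       # [1, 2, 3, 4, 5]
print(lost)           # [6, 7, 8, 9, 10]
print(next_batch[0])  # 11
```

Recovering rows 6 to 10 would require the server to buffer sent rows until the client acknowledges them, which is exactly the kind of extra protocol state the objection is about.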

livius2 commented 3 months ago

By "client" I mean the driver (e.g., fbclient), not the customer application. So fbclient should know what it fetched and what work should be restored. There may be situations that are not restorable, but perhaps there are none.

The most common connection drop occurs when nothing is happening, i.e. when the connection is "idle". For example, a user fetches data, looks at it on screen for 2 minutes, and then clicks a button. The connection drops for 1 second during those 2 minutes while the user is looking at the screen, and the next button click results in an error. Therefore, you could start with something simpler, such as restoring the connection only when it was idle, and then progressively handle more complex cases.
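The "idle first" idea can be sketched as a client-side wrapper that re-attaches transparently only when the broken link is detected before a new request goes out, and surfaces the error otherwise. Everything here is hypothetical (`IdleResumingClient`, `ConnectionLost`, and the `send`/`receive`/`resume` callables); it also simplifies TCP reality, since a successful write only means the local kernel buffered the bytes, not that the server received them.

```python
from typing import Callable


class ConnectionLost(Exception):
    """Raised by the transport when the TCP link turns out to be broken."""


class IdleResumingClient:
    """Hypothetical driver wrapper: resume transparently only if the link broke while idle."""

    def __init__(self, send: Callable[[bytes], None], receive: Callable[[], bytes],
                 resume: Callable[[], None]) -> None:
        self._send = send        # writes one request; raises ConnectionLost
        self._receive = receive  # reads one reply; raises ConnectionLost
        self._resume = resume    # re-attaches, e.g. with a resume token

    def call(self, request: bytes) -> bytes:
        try:
            self._send(request)
        except ConnectionLost:
            # The link was already dead before this request went out, so nothing was
            # in flight: re-attach and retry once. Simplification: assumes a failed
            # send delivered none of the request to the server.
            self._resume()
            self._send(request)
        # Once the request is out, a broken link leaves the outcome unknown (was the
        # commit applied? how many rows were sent?), so the error is surfaced as-is.
        return self._receive()
```

The design choice is deliberate: the wrapper never retries after the request may have reached the server, which is the only case where client and server are guaranteed to agree on the state.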

aafemt commented 3 months ago

By "client" I mean the driver (e.g., fbclient), not the customer application.

So do I. No way. I spent years trying to design it. In vain.

The most common connection drop occurs when nothing is happening, i.e. when the connection is "idle".

This is the infamous Cisco "feature": a timeout for inactive TCP connections. Easily solved with dummy packets.
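For comparison, the OS-level counterpart of dummy packets is TCP keepalive, which makes the stack probe an idle connection so that NAT and firewall state stays fresh. This sketch uses Python's standard `socket` module rather than Firebird's own dummy-packet mechanism; the helper name `enable_keepalive` and the timing values are illustrative assumptions, and the per-socket tuning options are platform-specific, so only those the platform exposes are set.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 10, probes: int = 5) -> None:
    # Turn on keepalive probes for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Idle time before the first probe; the constant's name differs by platform.
    if hasattr(socket, "TCP_KEEPIDLE"):      # Linux and others
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):   # macOS
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)
    # Interval between unanswered probes, and how many before the link is declared dead.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

Keepalive keeps an idle connection from being silently expired by middleboxes; it does not make a genuinely broken connection resumable.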

livius2 commented 3 months ago

So do I. No way. I spent years trying to design it. In vain.

Are you sure we're talking about the same case? I mean a situation where the customer application is still running and holds transaction handles, connections, cursors, etc., with the fbclient library loaded. I'm not talking about a situation where everything has to be restarted from scratch. The only thing that happens is that the application issues a new command to fbclient, fbclient attempts to communicate with the Firebird server, and it turns out that the TCP/IP connection was interrupted and needs to be restored. On the Firebird side, the attachment still exists; only access to it needs to be regained.

This is the infamous Cisco "feature": a timeout for inactive TCP connections. Easily solved with dummy packets.

I'm not talking about a period of inactivity long enough for the router or even the firewall to kill the connection. I'm saying that the TCP/IP connection is most often interrupted during short periods when no commands are being transmitted. The simplest case (though not the one I'm mainly talking about) is, for example, when you're a passenger in a car with a laptop that has a SIM card, and the connection is briefly interrupted. Or you're using a phone as a router while traveling by train, the BTS changes, and the connection is momentarily interrupted.

mrotteveel commented 3 months ago

By "client" I mean the driver (e.g., fbclient), not the customer application. So fbclient should know what it fetched and what work should be restored. There may be situations that are not restorable, but perhaps there are none.

But the client doesn't know exactly what the server sent if the connection breaks mid-fetch, so it doesn't have a way to recover, and effectively the server doesn't know what the client received before the connection broke, so it doesn't have a way to recover either.

The most common connection drop occurs when nothing is happening, i.e. when the connection is "idle". For example, a user fetches data, looks at it on screen for 2 minutes, and then clicks a button. The connection drops for 1 second during those 2 minutes while the user is looking at the screen, and the next button click results in an error. Therefore, you could start with something simpler, such as restoring the connection only when it was idle, and then progressively handle more complex cases.

As I understand TCP/IP, "connection drops" (whatever you mean by that) during idle periods will rarely break the TCP/IP connection, unless you're using network equipment or some intermediate layer that then proactively sends RST packets, which in general it shouldn't do, because the whole point of TCP/IP's design is to survive connection interruptions.

The problem you're describing is a lot harder than you seem to think. Both the server side and the client side are stateful, and recovering so that client and server agree on the state they should be in can be hard; effectively, it means both would need to reset to a clean slate, i.e. just like a new connection.

hvlad commented 3 months ago

What kind of apps or environments demand such a feature?

All? Currently, there is no way to do this, so users complain and have to reconnect.

Hmm... "all" is very far from reality, IMHO. I would expect an answer like:

you're a passenger in a car with a laptop that has a SIM card, and the connection is briefly interrupted. Or you're using a phone as a router while traveling by train, the BTS changes, and the connection is momentarily interrupted.

mrotteveel commented 3 months ago

The simplest case (though not the one I'm mainly talking about) is, for example, when you're a passenger in a car with a laptop that has a SIM card, and the connection is briefly interrupted. Or you're using a phone as a router while traveling by train, the BTS changes, and the connection is momentarily interrupted.

That sounds like a situation where using a database connection from the client system directly to the database is asking for problems (for several reasons, including security!). Using REST web services co-located with the database (either on the same host, or at least on the same local network), with the client talking to the REST web service, is far more resilient to these types of interruptions (and yes, these interruptions can indeed cause connection resets, because mobile data networks are prone to sending RST intentionally when they're not sure whether the handset is still connected).

livius2 commented 3 months ago

@hvlad Hmm... "all" is very far from reality, IMHO. I would expect an answer like:

And what if you are connected via a LAN cable and the connection is interrupted for a second? Should that be treated differently? Probably not. So: all ;-)

livius2 commented 3 months ago

That sounds like a situation where using a database connection from the client system directly to the database is asking for problems (for several reasons, including security!)

VPN connection ;-)

livius2 commented 3 months ago

But the client doesn't know exactly what the server sent if the connection breaks mid-fetch, so it doesn't have a way to recover, and effectively the server doesn't know what the client received before the connection broke, so it doesn't have a way to recover either.

Does the server send anything without a prior request from the client? If not, the client knows whether it can safely resume the connection. There might be situations where resuming the connection is not possible.

hvlad commented 3 months ago

@hvlad Hmm... "all" is very far from reality, IMHO. I would expect an answer like:

And what if you are connected via a LAN cable and the connection is interrupted for a second? Should that be treated differently? Probably not. So: all ;-)

Why would a wired LAN connection be interrupted (i.e. broken and restored), and how often do you see that? Time to fix your LAN and/or cable? :)

livius2 commented 3 months ago

Why would a wired LAN connection be interrupted (i.e. broken and restored), and how often do you see that? Time to fix your LAN and/or cable? :)

An internet provider can sometimes cause the connection to drop. A LAN is more stable, but it can still happen, though of course less frequently.

mrotteveel commented 3 months ago

But the client doesn't know exactly what the server sent if the connection breaks mid-fetch, so it doesn't have a way to recover, and effectively the server doesn't know what the client received before the connection broke, so it doesn't have a way to recover either.

Does the server send anything without a prior request from the client? If not, the client knows whether it can safely resume the connection. There might be situations where resuming the connection is not possible.

Dimitry's example is one: if you request a fetch, you don't know in advance how many rows you're actually going to receive, and the server forgets the rows immediately after writing them to the client. So the client might have received 0..M rows while the server has sent N rows, and you're missing data.

Another example: the client has sent a commit but never received the acknowledgement. This means the client and server state are out of sync. The same applies to preparing or executing statements, closing cursors or statements, and dropping prepared statements.

Even worse, what if the client drops out and never comes back (e.g., because it wasn't a connection failure but the client being killed)? You'll be holding resources and locks longer while waiting for it to re-attach, possibly impeding access for other clients.
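The lost-acknowledgement problem can be shown with a toy server holding one transaction. A common general technique for this (not a Firebird protocol feature) is to tag each request with a sequence number and have the server remember its last reply, so a request re-sent after resumption is answered from the cache instead of being executed twice. The class `Server` and function `commit_with_lost_ack` are hypothetical illustrations.

```python
from typing import Optional


class Server:
    """Toy server with one transaction; optionally remembers its last reply."""

    def __init__(self, dedup: bool) -> None:
        self.dedup = dedup
        self.txn_active = True
        self.last_seq: Optional[int] = None
        self.last_reply: Optional[str] = None

    def handle(self, seq: int, op: str) -> str:
        if self.dedup and seq == self.last_seq and self.last_reply is not None:
            return self.last_reply  # same request re-sent after resume: replay the reply
        if op == "commit":
            reply = "committed" if self.txn_active else "error: no active transaction"
            self.txn_active = False
        else:
            reply = "error: unknown op"
        self.last_seq, self.last_reply = seq, reply
        return reply


def commit_with_lost_ack(dedup: bool) -> str:
    server = Server(dedup)
    server.handle(seq=1, op="commit")  # the server commits, but the ack is lost in transit
    # The client re-attaches and, not knowing the outcome, re-sends the same request.
    return server.handle(seq=1, op="commit")


print(commit_with_lost_ack(dedup=False))  # error: no active transaction
print(commit_with_lost_ack(dedup=True))   # committed
```

Even this only covers a single outstanding request; the fetch case would additionally require buffering every row sent until the client acknowledges it, so both sides end up carrying considerably more state.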

aafemt commented 3 months ago

BTW, Mark is right: TCP itself was designed to be unbreakable, with all these ACK packets, timeouts and retries. If a TCP connection gets broken, something really bad has happened.
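A rough calculation shows how much outage TCP tolerates on its own. The sketch assumes a simplified model: the retransmission timeout starts at 200 ms and doubles after each unanswered attempt, capped at 120 s, with 15 retries (the Linux `tcp_retries2` default, whose kernel documentation quotes a hypothetical timeout of 924.6 seconds for these values). Real stacks derive the timeout from measured round-trip times, so treat the numbers as approximate; `retransmit_times` is a hypothetical helper.

```python
from typing import List


def retransmit_times(retries: int = 15, rto_min: float = 0.2,
                     rto_max: float = 120.0) -> List[float]:
    """Cumulative timer expiries after the original send.

    The first `retries` entries are when each retransmission happens; the last
    entry is when the stack gives up. Simplified model: the timeout starts at
    rto_min and doubles after every unanswered attempt, capped at rto_max.
    """
    times, elapsed = [], 0.0
    for i in range(retries + 1):
        elapsed += min(rto_min * 2 ** i, rto_max)
        times.append(elapsed)
    return times


schedule = retransmit_times()
print(round(schedule[-1], 1))  # 924.6
```

Under this model, a segment sent at the start of a 5-second outage is retransmitted at roughly 0.2, 0.6, 1.4, 3.0 and 6.2 seconds, so the 6.2 s retransmission gets through and the connection survives without the application noticing, unless something in the path actively sends an RST.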

livius2 commented 3 months ago

@mrotteveel Then, in such cases, it will be unrecoverable; in other cases, it can be brought back to life.

you'll be holding resources

We are talking about a short time: a few seconds, or a dozen at most.

@aafemt Bad things happen, but this is a normal situation (a short interruption). We're not talking about a meteorite damaging the router ;-)