Open tschak909 opened 2 months ago
tnfsd
offers a built-in mechanism for removing expired sessions. An expired session is the one where the last contact was made in more than SESSION_TIMEOUT, by default 6h. The session is removed then, but the TCP connection is not disconnected, so client may think it's still connected.
Also, mechanism for removing old sessions runs when the new client is connected, so it may give impression that new clients are removing the old ones.
This probably can be reproduced by:
tnfsd
with FujiNet, wait for 60s (client 1).tnfsd
with another FujiNet or Python client (client 2).This should allow to confirm whether client 1 and/or client 2 behaves as expected or not.
Once this is reproduced, we can:
tnfsd
after such timeout. Correct behaviour is reconnecting with a new session id.I investigated the problem a little bit further and now I think the cause lies elsewhere.
The session timeout works fine, it shouldn't disconnect TCP sockets, because a single socket may handle multiple sessions. In fact, we should disable it for the TCP connections completely, because if a connection is alive, so should be the session. This mechanism makes sense for the UDP only, where we don't know if the session is alive or not if it's idle. I created https://github.com/FujiNetWIFI/tnfsd/pull/1/commits/98ea15a86984fd2a8bfd307b87beea14d15dfade to address that, but it's not the root cause of the problem.
There are 2 factors causing problems around timeouts and reconnects:
TNFS server
If the TCP connection is disconnected, TNFS server will remove all related sessions. As a result, if the same client connects again and tries to use the same session id, they'll receive invalid session error (and most likely they won't know how to handle it, see the section about the client).
In the https://github.com/FujiNetWIFI/tnfsd/pull/1 I changed the behaviour of the server - after that it will no longer remove the sessions on TCP disconnection, but only unset the socket FD.
Breaking TCP connections can be tested by staring tnfsd on a non-standard port 16385 and putting https://github.com/Shopify/toxiproxy between tnfsd and FujiNet client:
tnfsd -p 16385 &
toxiproxy-server &
toxiproxy-cli create --listen 0.0.0.0:16384 --upstream localhost:16385 tnfs
With this command, toxiproxy will automatically redirect traffic from its own 16384 port to tnfsd 16385. Now we can bring the connection down/up:
toxiproxy-cli toggle tnfs
or introduce latency:
toxiproxy-cli toxic add -t latency -n tnfs_latency -a latency=100 -a jitter=50 tnfs
After the PR above, TNFS client is able to get it session back, even after TCP connection is lost.
TNFS client
TNFS client currently is not able to handle invalid session error coming from the server. Apart from the disconnected TCP socket (resulting in the removed sessions, as above), this may happen if the server is restarted or the session expires after 6h (see my previous comment), regardless of the used protocol: TCP or UDP.
The code responsible for handling such cases exists, but now it's disabled for the FujiNet hardware (it's only active on PC):
It tries to create a new session if the old one is invalid. I tested it and it's able to handle the lost session gracefully. It's enough to e.g.: keep browsing the TNFS directory tree. Unfortunately, handles to all opened files are lost, as they are kept server-side in the session data.
I created #727 to enable this feature. While testing it, I found another minor problem in the tnfsd, which won't return any response if the provided FD is invalid. It's fixed in the https://github.com/FujiNetWIFI/tnfsd/pull/1/commits/c1a359737e2ff5d40fb3e82a699523f7740b22e0.
Also, https://github.com/FujiNetWIFI/fujinet-firmware/pull/727/commits/b772f99c6fc54fd06daed60d5733e926d560586a adds support for _udp.
prefix. It can be useful for the testing purposes.
I added one final commit https://github.com/FujiNetWIFI/fujinet-firmware/pull/727/commits/874f0be97ecffe574c436932eb66e1703abfa553 to the #727. It starts a periodic timer for every mounted TNFS filesystem and sends a keep-alive message every 60s.
This way:
tnfsd
even if they are idle for >6h, if only the FujiNet is running,Long story short: this should allow mounting an ATR image and leave the computer overnight. The next morning the mounted ATR should be still usable.
The only case where it wouldn't be usable is the tnfsd
server restart.
The TNFS client is not detecting that the TCP connection has been closed, and attempts to use it.
TNFSD also needs checking.
-Thom @trekawek