TheHolyWaffle / TeamSpeak-3-Java-API

A Java wrapper of TeamSpeak's 3 server query API.
MIT License
306 stars 107 forks source link

Quarry timing out after 8-10 hours #285

Closed Starmixcraft closed 5 years ago

Starmixcraft commented 6 years ago

I have observd that the quarry sortof times out after a long time, no event is fired and no command can be sent, but the how am i command still runs with the message OK.

Log entry:

[15:52:40] [DEBUG] [whoami] > whoami
[15:52:40] [DEBUG] [whoami] < virtualserver_status=online virtualserver_id=1 virtualserver_unique_identifier=DfSfdKSgo3ftj+NDIHoSxb9KVuY= virtualserver_port=9987 client_id=7 client_channel_id=1 client_nickname=CloudSystem client_database_id=1 client_login_name=serveradmin client_unique_identifier=serveradmin client_origin_server_id=0
[15:52:40] [DEBUG] [whoami] < error id=0 msg=ok

There is no error throwen wenn this happens

rogermb commented 6 years ago

Huh, that's strange.

If you take a look at the KeepAliveThread source code, you'll see that it also just uses the API to send commands. In other words, it looks like you should be able to send commands, too.

Are you sure that you're not crashing your own program somehow without noticing?

But then again, that wouldn't explain why events aren't fired. Could you maybe upload your event listener code here so I could take a look at it?

Starmixcraft commented 6 years ago

Yes my programm runs 2 netty servers and thay function for weeks

        if(config.isEnableTSQuarry()) { //Teamspeak api
    TS3Config config = new TS3Config();
    config.setHost("127.0.0.1");
    config.setFloodRate(FloodRate.UNLIMITED);
    config.setEnableCommunicationsLogging(true);
    TS3Query query = new TS3Query(config);
    query.connect();
    ts3Api = query.getAsyncApi();
    ts3Api.login(cloudConfig.getTs3user(), cloudConfig.getTs3pass());
    ts3Api.selectVirtualServerByPort(9987);
    ts3Api.registerAllEvents();
    ts3Api.addTS3Listeners(new TS3ListenerToEventManager());
    ts3Api.setNickname("CloudSystem");
    }

Is in the main to start the api and the TS3ListenerToEventManager contains:

@Override
public void onChannelCreate(ChannelCreateEvent arg0) {
    TeamSpeakChannelCreateEvent event = new TeamSpeakChannelCreateEvent(arg0);
    CloudServer.getPluginManager().callEventAsync(event);
}

@Override
public void onChannelDeleted(ChannelDeletedEvent arg0) {
    TeamSpeakChannelDeletedEvent event = new TeamSpeakChannelDeletedEvent(arg0);
    CloudServer.getPluginManager().callEventAsync(event);
}

@Override
public void onChannelDescriptionChanged(ChannelDescriptionEditedEvent arg0) {
    TeamSpeakChannelDescriptionEditedEvent event = new TeamSpeakChannelDescriptionEditedEvent(arg0);
    CloudServer.getPluginManager().callEventAsync(event);
}

@Override
public void onChannelEdit(ChannelEditedEvent arg0) {
    TeamSpeakChannelEditedEvent event = new TeamSpeakChannelEditedEvent(arg0);
    CloudServer.getPluginManager().callEventAsync(event);
}

@Override
public void onChannelMoved(ChannelMovedEvent arg0) {
    TeamSpeakChannelMovedEvent event = new TeamSpeakChannelMovedEvent(arg0);
    CloudServer.getPluginManager().callEventAsync(event);
}

@Override
public void onChannelPasswordChanged(ChannelPasswordChangedEvent arg0) {
    TeamSpeakChannelPasswordChangedEvent event = new TeamSpeakChannelPasswordChangedEvent(arg0);
    CloudServer.getPluginManager().callEventAsync(event);
}

@Override
public void onClientJoin(ClientJoinEvent arg0) {
    TeamSpeakClientJoinEvent event = new TeamSpeakClientJoinEvent(arg0);
    CloudServer.getPluginManager().callEventAsync(event);

}

@Override
public void onClientLeave(ClientLeaveEvent arg0) {
    TeamSpeakClientLeaveEvent event = new TeamSpeakClientLeaveEvent(arg0);
    CloudServer.getPluginManager().callEventAsync(event);
}

@Override
public void onClientMoved(ClientMovedEvent arg0) {
    TeamSpeakClientMovedEvent event = new TeamSpeakClientMovedEvent(arg0);
    CloudServer.getPluginManager().callEventAsync(event);
}

@Override
public void onPrivilegeKeyUsed(PrivilegeKeyUsedEvent arg0) {
    TeamSpeakPrivilegeKeyUsedEvent event = new TeamSpeakPrivilegeKeyUsedEvent(arg0);    
    CloudServer.getPluginManager().callEventAsync(event);
}

@Override
public void onServerEdit(ServerEditedEvent arg0) {
    TeamSpeakServerEditedEvent event = new TeamSpeakServerEditedEvent(arg0);
    CloudServer.getPluginManager().callEventAsync(event);

}

@Override
public void onTextMessage(TextMessageEvent arg0) {
    TeamSpeakTextMessageEvent event = new TeamSpeakTextMessageEvent(arg0);
    CloudServer.getPluginManager().callEventAsync(event);
}

Simple code to translate the teamspeak events into my own event system, the event system is tested for weeks and works flawless

rogermb commented 6 years ago

Alright, but that gives me pretty much nothing to work with. It sounds like it could be a deadlock or some other multi-threading issue somewhere in our code, but if the keep-alive still works, there's not really any place where such a bug could be hiding.

Essentially, if our keep-alive can use the API object to successfully send commands over and over again, there's no reason why you shouldn't be able to do the same.

Moreover, the only reason that the keep-alive thread is sending these commands is because no other commands have been sent for over 60 seconds. If we also consider that the command-sending thread can still properly handle the whoami command, the only conclusion we can make is that the command send queue isn't being filled because you're not making any calls to TS3Api or TS3ApiAsync.

I can't say why these API calls aren't being made, though. Maybe it's some kind of issue in CommandFuture that keeps you waiting for a TS3 command forever. But then again, that wouldn't explain why events aren't being fired anymore. Or could it be that you're only ever sending TS3 commands from inside event handlers?

If you're having this issue again, could you also fire up Java Mission Control (jmc.exe or just jmc), attach yourself to the no-longer-working program and create a thread dump by running the "Thread.print" diagnostic command? That way, if there's a deadlock, we'll be able to find it.

Starmixcraft commented 6 years ago

There is no deadlock, my program checks for that a runtime, i have done some more testing and it seams that the command system works but i wont send any commands because no event are fired, so the bug must be in the event notification, i also dont see the debug messages in the console if i switch the channel for example, i will provide a thread bump as soon as possible

Starmixcraft commented 6 years ago

https://hastebin.com/ovilesetaj.bash is a thread dump at the time the bug accured

Starmixcraft commented 6 years ago

@rogermb I just tryed to run a command in this (Quarry timeout state) and this is the result:

https://hastebin.com/nadoridebi.makefile

rogermb commented 6 years ago

That... kinda explains everything. If you get kicked from the virtual server (e.g. because the virtual server is shut down / restarted), you'd also no longer receive any event notifications from that virtual server.

I've also looked through the thread dump you've sent me, and from what I can tell, everything seems to be working, too. The only threads that are blocked in our API are 1) the sender thread waiting for commands to send, 2) the receiver thread waiting for responses from the server, and 3) the keep-alive thread waiting for the timer to expire. In other words: everything is as it should be 😃

The real question now is why you're getting kicked from that virtual server in the first place. It might be a good idea to monitor and log any ClientLeaveEvents that mention your own client ID.

Alternatively, you could also periodically call TS3Api#whoAmI to check what virtual server you're currently on.

Strangely, however, the log entry of the whoami command you included in your first message says that you're still on the virtual server with ID 1 / port 9987. But if that were the case, you obviously wouldn't be getting the invalid serverID error message when trying to send a command. Did you copy that whoami command log from an earlier point in time when the query was still working?

Starmixcraft commented 6 years ago

@rogermb i just chcked the logs and the whoami command still works after the error, i dont understand wy?

rogermb commented 6 years ago

The whoami command works even if you're not on a virtual server, but the output should be different than what you've included in your first message. In particular, it should mention that you're now on the virtual server with the ID 0, which isn't actually a valid virtual server.

Starmixcraft commented 6 years ago

@rogermb, okay just check logs again, and you were correct,

[01:23:56] [DEBUG] [whoami] > whoami  
[01:23:56] [DEBUG] [whoami] < virtualserver_status=unknown virtualserver_unique_identifier virtualserver_port=0 virtualserver_id=0 client_id=0 client_channel_id=0 client_nickname client_database_id=0 client_login_name client_unique_identifier client_origin_server_id=0
[01:23:56] [DEBUG] [whoami] < error id=0 msg=ok` 

Now the question is how to prevent this

rogermb commented 6 years ago

(We could've saved so much time hunting down nonexistent multi-threading issues if we'd had the correct whoami command response from the get-go... 😛)

I think first of all, we need to find out why you're getting kicked from the virtual server in the first place. If you check your old command log again, it would be interesting to know if there are any notifyclientleftview events right before the query stops working. If yes, could you send me the reason IDs and messages?

Starmixcraft commented 6 years ago

@rogermb Only 4 log lines contain notifyclientleftview but non have the client id of the quarry

rogermb commented 6 years ago

Anything else out of the ordinary between the last "valid" command / event and the first whoami with a virtual server ID of 0?

Starmixcraft commented 6 years ago

@rogermb The entire log is https://hastebin.com/tiqujaxawi.makefile i tryed to remove most of the nois(the input log was 23k lains long) but its still pritty long, maby you spot i dident

Starmixcraft commented 6 years ago

@rogermb Maby this is the problem, in my error logging system there is this error:

An Error accured at 01:00:14 on the server CloudServer, a Error walkthrue folowes:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at com.github.theholywaffle.teamspeak3.SocketReader.run(SocketReader.java:82)
rogermb commented 6 years ago

So in this message your TS3Config doesn't seem to have reconnecting turned on. But in your log, the query clearly attempts to reconnect. What's up with that?

And after you've reconnected, you don't re-run the required commands (logging in, selecting the virtual server, registering events, setting your nickname). That's why you're not on the correct virtual server.

If you have reconnecting turned on, you'll need to move your "startup" code (the setup commands I just mentioned) into the onConnect method of your ConnectionHandler. Please check out this example.

Starmixcraft commented 6 years ago

I have now set up a custom reconnect handler that dose select a server and that stuff, but now i get this error: https://hastebin.com/uvecidower.makefile

rogermb commented 6 years ago

Could I please see your new code where you set up your TS3Query, including the reconnect handler? Please remove the username & password in the login command, but don't change anything else 😛

Starmixcraft commented 6 years ago

@rogermb This is the new code i am usin: https://hastebin.com/ujipeduyat.java

rogermb commented 6 years ago

The exception will disappear if you use the synchronous API instead of the async API in the onConnect method. This will not block your application, as that handler is run in a thread from our thread pool.

I'm also not sure why you didn't follow the example I sent you earlier. You can create your own ReconnectingConnectionHandler and set all the parameters manually, but that's ugly as sin. Why not use something like ts3Config.setReconnectStrategy(ReconnectStrategy.exponentialBackoff()) instead?

By the way, what you recreated by instantiating your own ReconnectingConnectionHandler is a reconnect strategy that first waits 50 milliseconds before attempting to reconnect. After that, it just tries to reconnect in an infinite loop without ever waiting, because you set the multiplier to 0. These sorts of mistakes is why ReconnectStrategy exists. I think what you were looking for was ReconnectStrategy.constantBackoff(50).

But even that is too little delay in my opinion. Do you really think that if there's a connection problem, you should try to reconnect 20 times per second?

Starmixcraft commented 6 years ago

@rogermb if i use ReconnectStrategy.constantBackoff(50) it will do everything for me? like connect back to the server and register all events?

rogermb commented 6 years ago

No, the reconnect strategy isn't clairvoyant 😛

Just use something like

// One attempt to reconnect every second is more than enough
config.setReconnectStrategy(ReconnectStrategy.constantBackoff(1000));

config.setConnectionHandler(new ConnectionHandler() {

    @Override
    public void onConnect(TS3Query ts3Query) {
        TS3Api ts3Api = ts3Query.getApi();
        ts3Api.login(cloudConfig.getTs3user(), cloudConfig.getTs3pass());
        ts3Api.selectVirtualServerByPort(9987);
        ts3Api.registerAllEvents();
        ts3Api.setNickname("CloudSystem");
    }

    @Override
    public void onDisconnect(TS3Query ts3Query) {
        // Nothing
    }
});
rogermb commented 6 years ago

Oh, actually, don't add your TS3 listener in the onConnect method. You only ever need to do that once, else you'll get duplicate events after reconnects. So just do that after you've called ts3Query.connect().

Starmixcraft commented 6 years ago

Ok, thx for the good support, i probably never had figured that out on my own

Starmixcraft commented 6 years ago

@rogermb I have now used youre code example for 10 hours and it works fine, but my question is wy is the socket timing out, is there a way to fix that, if not is there a esey way to mute the error that gets spammed in my error log?

rogermb commented 6 years ago

If you just connect to localhost / 127.0.0.1, you shouldn't be getting disconnected unless you're restarting your TS3 server.

I think I messed up the condition whether a socket timeout actually means that the connection has been lost here. There's a short amount of time where the receive queue can contain a command but where the writer's idle time has not been reset yet.

You'll notice in your log that your connection always "times out" at the exact moment when a keep-alive command has been sent but when the response hasn't been received yet.

As a quick fix, you could set the the command timeout to something like 62000 (> 60 seconds) until I get this issue fixed.

Regarding the log messages, I think they'll be much less of a concern once connection losses aren't as frequent as you're experiencing right now. Moreover, if you're using an exponential backoff between reconnect attempts, the frequency of these messages diminishes as time goes on. But anyway, those are just workarounds. The actual solution would be to change the logging level of that logger in the configuration file of your logging provider.

The actual-actual solution is for me to fix this timeout bug.

rogermb commented 5 years ago

Should be fixed in v1.2.0.

Thank you very much for reporting this issue! 😄