Closed jwilling closed 4 years ago
Hi @jwilling. I'm not sure what is going on here. It sounds like you have tried everything I would recommend.
Here is how it works:
The script starts a tail
process (line 1595 in the queryStart method) that pipes anything that ends up in the query.in
file to the Query server via socat
. Responses from the Query server via socat
are piped into the query.out
file. Other methods then call the method querySendPacket that packs together a Query packet and writes it to the query.in
file and reads and unpacks the response from the query.out
file. If the tail
or socat
process have issues for some reason, the Query server will be listed as offline.
Perhaps you have a permission issue on one of the two in/out files? I don't know.
Hey @jwilling,
Sorry that you're having issues. Hopefully we can try to resolve it ASAP. A few questions...
Are you having issues using mscs status
with only the wookies
world? Or does it occur to other worlds as well? Perhaps try creating another world and letting it run for an hour, and then see if the issue also occurs when you try to get the status
of that world; this could help tell us if it's world-specific or if there's something else going on. The second thing I would like to check is whether or not the amount of users online has something affecting it; did you find that mscs status
works when, say, there's less than 3 users online and then stops working when there's more than that? Perhaps the issue centers around how many users are online.
Also, are you running MSCS in a multi-user environment / what user(s) are you using to running MSCS? (That is, if you didn't follow the default installation). It could be a permission issue as @sandain noted
Sorry that we can't give you a clear cut answer; right now we have to isolate the problem and then figure it out at this point.
Huge thanks @sandain and @Roflicide for your assistance.
So my original use case for needing the status
command to function was for a status page. I'd gather the player data from the query-json
command and pipe it into a static page for checking who was online. This job ran every minute. I'm sure there are better ways to do this, but I just wanted something quick and dirty.
Since I was having issues with the reliability of the variants of status
, I switched to mcstatus to get the query information I needed. Interestingly, ever since I switched to using mcstatus, the status
query in mscs
functions as expected.
It definitely could be something related to user permissions, although as far as I can remember I ran everything from the same user. I definitely did do a custom installation, as I wanted everything contained within my home directory, in addition to having a different username. And yes, it only occurred with one specific world.
If this ever happens again I'll try to figure out what variables might be at play. That being said, it seems like I'm the only one that had this problem and my use cases are fairly specific, so we could close this and I could reopen it if it occurs later, if you'd prefer.
I had this issue too, but everytime this occured I ran a second query and it came back normal so I didn't evaluate what goes wrong. This happend more often on modded servers running with Forge, vanilla-server querys where as long as I remember fine. If this occures again I will report back.
If you notice, the code in the queryStart
method that I linked to above includes a 20 second delay before starting the Query handler code (that uses tail
and socat
). This means that for the first 20 seconds after a server starts up, the expected response is "query server offline". If you are still getting that response after 20 seconds infrequently there is probably no issue -- if the querySendPacket
method fails to decode a response from the server, it won't try again unless you ask it and this happening infrequently should be no problem. If this is happening more often than infrequently, then there is an issue.
We could make the querySendPacket
method more robust to hickups by adding in a short delay followed by resending the packet when a response packet from the server isn't decoded properly if everyone thinks it is necessary.
@conrad784 thank you for the input, you are right when it seems to be happening to forge servers.
@sandain I have confirmed that this seems to be a bug on forge servers. To reproduce:
1. Create a new forge server
2. Start it
3. Run a mscs status
command
The first time you run the status, it will list the query server as offline. The second time you run it, the command works and it appears to correctly query it.
This issue seems to occur every time after stopping/starting the forge world, and it seems to fix itself after the first query.
EDIT: after further investigation, @sandain seems to be right. This issue only occurs within the first 20 seconds of server startup, as far as i can tell.
I'm going to go ahead and mark this bug as invalid and close it. If anyone is able to come up with a condition where the "query server offline" message is received after the initial 20 second startup time, please reopen this bug.
Having this issue as well, even after the first 20 seconds. What do you need from me to figure this out?
@Aersaud
Are you running a vanilla server or with mods? What version of minecraft? Does it happen to all of your worlds? Did you follow the default installation process? Is there any output in the console?
Running with mods, forge and sponge. Recommended version for both of those. Also a few other mods, if you like I can name them. However, when testing this I ran the server modless, a side from forge and sponge. We only have 1 world so I cannot check any others unfortunately. Followed the default install, including server backups. When this happens the server works fine and there are no errors in either the linux console or the game console. It also ends up resolving and fixing itself after some time (I've seen about 5-10 minutes) sometimes it never resolves and I have to keep starting and stopping the server for it to eventually work. I am checking it using the mscs status [world] command.
Please post a copy of your latest.log
, and the query.in
and query.out
files while you are experiencing this issue. I'll see if I can track it down.
Ok so while my server was running the query server went offline. I cp'ing the files now to SFTP them over here and post. @sandain latest.log queries.zip
Both query in and out are in the queries.zip Virus Total in case any one is worried: https://www.virustotal.com/#/file/49408761d51af5493245a7b132b8fce0adb0eb8b8c4f77b54fba533dd4e09586/detection
Well that didn't really help. The latest.log
shows that the query server is running and I see no error messages that would note the server is having issues with query.
[10:16:35] [Query Listener #1/INFO]: Query running on 0.0.0.0:25565
I am curious what this error message means:
java.io.EOFException: null
at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:268) ~[?:1.8.0_162]
at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:258) ~[?:1.8.0_162]
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:164) ~[?:1.8.0_162]
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79) ~[?:1.8.0_162]
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91) ~[?:1.8.0_162]
at net.minecraft.nbt.CompressedStreamTools.func_74796_a(CompressedStreamTools.java:26) ~[gi.class:?]
at net.minecraft.world.storage.SaveFormatOld.loadAndFix(SaveFormatOld.java:117) [bfc.class:?]
at net.minecraft.world.storage.SaveHandler.redirect$onLoadWorldInfo$bbi000(SaveHandler.java:1068) [bfb.class:?]
at net.minecraft.world.storage.SaveHandler.func_75757_d(SaveHandler.java:122) [bfb.class:?]
at org.spongepowered.common.world.WorldManager.loadAllWorlds(WorldManager.java:706) [WorldManager.class:1.12.2-2611-8.0.0-BETA-3018]
at net.minecraft.server.MinecraftServer.func_71247_a(MinecraftServer.java:3530) [MinecraftServer.class:?]
at net.minecraft.server.dedicated.DedicatedServer.func_71197_b(DedicatedServer.java:270) [nz.class:?]
at net.minecraft.server.MinecraftServer.run(MinecraftServer.java:484) [MinecraftServer.class:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_162]
It doesn't look like it should break query, but maybe you should look into what is causing that.
The query.in
file looks correct, but the query.out
file is empty meaning commands were being sent but no response was received for some reason. Check which processes are currently running when you get this problem (run ps -A
). You should have a couple tail
processes and a socat
process running -- tail
is used to watch for changes to the query.in
file and pipe it to socat
, and socat
sends the query to the server and pipes the response to the query.out
file.
mcadmin@tkmminecraft:~$ ps -A
PID TTY TIME CMD
1 ? 00:00:03 systemd
2 ? 00:00:00 kthreadd
3 ? 00:00:00 ksoftirqd/0
5 ? 00:00:00 kworker/0:0H
7 ? 00:00:21 rcu_sched
8 ? 00:00:00 rcu_bh
9 ? 00:00:00 migration/0
10 ? 00:00:01 watchdog/0
11 ? 00:00:01 watchdog/1
12 ? 00:00:00 migration/1
13 ? 00:00:01 ksoftirqd/1
15 ? 00:00:00 kworker/1:0H
16 ? 00:00:01 watchdog/2
17 ? 00:00:00 migration/2
18 ? 00:00:00 ksoftirqd/2
20 ? 00:00:00 kworker/2:0H
21 ? 00:00:01 watchdog/3
22 ? 00:00:00 migration/3
23 ? 00:00:18 ksoftirqd/3
25 ? 00:00:00 kworker/3:0H
26 ? 00:00:00 kdevtmpfs
27 ? 00:00:00 netns
28 ? 00:00:00 perf
29 ? 00:00:00 khungtaskd
30 ? 00:00:00 writeback
31 ? 00:00:00 ksmd
32 ? 00:00:03 khugepaged
33 ? 00:00:00 crypto
34 ? 00:00:00 kintegrityd
35 ? 00:00:00 bioset
36 ? 00:00:00 kblockd
37 ? 00:00:00 ata_sff
38 ? 00:00:00 md
39 ? 00:00:00 devfreq_wq
44 ? 00:00:00 kswapd0
45 ? 00:00:00 vmstat
46 ? 00:00:00 fsnotify_mark
47 ? 00:00:00 ecryptfs-kthrea
63 ? 00:00:00 kthrotld
64 ? 00:00:00 acpi_thermal_pm
65 ? 00:00:00 bioset
66 ? 00:00:00 bioset
67 ? 00:00:00 bioset
68 ? 00:00:00 bioset
69 ? 00:00:00 bioset
70 ? 00:00:00 bioset
71 ? 00:00:00 bioset
72 ? 00:00:00 bioset
73 ? 00:00:00 scsi_eh_0
74 ? 00:00:00 scsi_tmf_0
75 ? 00:00:00 scsi_eh_1
76 ? 00:00:00 scsi_tmf_1
82 ? 00:00:00 ipv6_addrconf
95 ? 00:00:00 deferwq
96 ? 00:00:00 charger_manager
97 ? 00:00:00 bioset
151 ? 00:00:00 kpsmoused
153 ? 00:00:00 scsi_eh_2
155 ? 00:00:00 scsi_tmf_2
192 ? 00:00:00 kworker/0:1H
194 ? 00:00:00 bioset
269 ? 00:00:00 raid5wq
293 ? 00:00:00 kdmflush
295 ? 00:00:00 bioset
304 ? 00:00:00 kdmflush
305 ? 00:00:00 bioset
317 ? 00:00:00 bioset
342 ? 00:00:07 jbd2/dm-0-8
343 ? 00:00:00 ext4-rsv-conver
400 ? 00:00:00 kworker/2:1H
408 ? 00:00:00 iscsi_eh
413 ? 00:00:00 systemd-journal
420 ? 00:00:00 ib_addr
422 ? 00:00:00 kauditd
433 ? 00:00:00 ib_mcast
435 ? 00:00:00 ib_nl_sa_wq
437 ? 00:00:00 ib_cm
441 ? 00:00:00 iw_cm_wq
442 ? 00:00:00 rdma_cm
445 ? 00:00:00 lvmetad
526 ? 00:00:03 kworker/1:1H
580 ? 00:00:00 iprt-VBoxWQueue
730 ? 00:00:00 kworker/3:1H
788 ? 00:00:00 ttm_swap
796 ? 00:00:00 ext4-rsv-conver
936 ? 00:00:00 rsyslogd
938 ? 00:00:00 lxcfs
940 ? 00:00:00 atd
942 ? 00:00:00 cron
946 ? 00:00:00 acpid
954 ? 00:00:00 dbus-daemon
975 ? 00:00:00 systemd-logind
979 ? 00:00:10 snapd
981 ? 00:00:03 accounts-daemon
992 ? 00:00:00 mdadm
1000 ? 00:00:00 polkitd
1079 ? 00:00:00 dhclient
1139 ? 00:00:00 sshd
1163 ? 00:00:07 iscsid
1164 ? 00:00:32 iscsid
1217 ? 00:00:07 irqbalance
1227 tty1 00:00:00 agetty
1528 ? 00:00:00 systemd
1531 ? 00:00:00 (sd-pam)
5579 ? 00:00:21 kworker/0:2
9818 ? 00:00:00 sh
9820 ? 00:00:02 tail
9821 ? 00:00:00 sh
9822 ? 00:33:56 java
10130 ? 00:00:01 kworker/3:0
10709 ? 00:00:11 kworker/0:0
11687 ? 00:00:00 kworker/2:1
13008 ? 00:00:00 kworker/u8:1
13012 ? 00:00:00 kworker/u8:0
13020 ? 00:00:00 kworker/u8:2
13064 ? 00:00:00 kworker/3:1
15560 ? 00:00:00 kworker/1:2
15897 ? 00:00:00 kworker/2:2
15898 ? 00:00:00 kworker/2:3
15906 ? 00:00:00 sshd
15937 ? 00:00:00 sshd
15938 pts/0 00:00:00 bash
17062 ? 00:00:00 systemd-timesyn
17064 ? 00:00:00 kworker/0:1
19820 ? 00:00:00 systemd-udevd
20357 ? 00:00:00 xfsalloc
20358 ? 00:00:00 xfs_mru_cache
20363 ? 00:00:00 jfsIO
20364 ? 00:00:00 jfsCommit
20365 ? 00:00:00 jfsCommit
20366 ? 00:00:00 jfsCommit
20367 ? 00:00:00 jfsCommit
20368 ? 00:00:00 jfsSync
20895 ? 00:00:04 kworker/1:1
21819 ? 00:00:00 kworker/2:0
25742 ? 00:00:02 kworker/1:0
28013 pts/0 00:00:00 ps
I see a tail process, but not seeing socat.
It's happening again by the way.
mcadmin@tkmminecraft:~$ mscs status TKM-MadCraft
Minecraft Server Status:
TKM-MadCraft: running (query server offline).
Process ID: 9822.
Memory used: 1389316 kB.
Permissions:
mcadmin@tkmminecraft:/opt/mscs/worlds/TKM-MadCraft$ ls -la
-rw-r--r-- 1 minecraft minecraft 44 Apr 14 13:18 query.in
-rw-r--r-- 1 minecraft minecraft 0 Apr 14 13:18 query.out
@sandain
Also noticing this:
[02:19:23] [Server thread/ERROR]: Exception reading ./TKM-MadCraft/DIM1/level.dat
java.io.EOFException: null
at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:268) ~[?:1.8.0_162]
at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:258) ~[?:1.8.0_162]
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:164) ~[?:1.8.0_162]
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:79) ~[?:1.8.0_162]
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:91) ~[?:1.8.0_162]
at net.minecraft.nbt.CompressedStreamTools.func_74796_a(CompressedStreamTools.java:26) ~[gi.class:?]
at net.minecraft.world.storage.SaveFormatOld.loadAndFix(SaveFormatOld.java:117) [bfc.class:?]
at net.minecraft.world.storage.SaveHandler.redirect$onLoadWorldInfo$bbi000(SaveHandler.java:1068) [bfb.class:?]
at net.minecraft.world.storage.SaveHandler.func_75757_d(SaveHandler.java:122) [bfb.class:?]
at org.spongepowered.common.world.WorldManager.loadAllWorlds(WorldManager.java:706) [WorldManager.class:1.12.2-2611-7.1.0-BETA-3019]
at net.minecraft.server.MinecraftServer.func_71247_a(MinecraftServer.java:3530) [MinecraftServer.class:?]
at net.minecraft.server.dedicated.DedicatedServer.func_71197_b(DedicatedServer.java:270) [nz.class:?]
at net.minecraft.server.MinecraftServer.run(MinecraftServer.java:484) [MinecraftServer.class:?]
-rw-r--r-- 1 minecraft minecraft 0 Apr 11 02:35 level.dat
You only have one tail
not two, and no socat
. That is definitely part of the problem. The question is why that is the case.
I noticed that error also. You might want to look into what is causing that, it is not the script, but might be behind the issue with query.
I found the error from the Sponge github. Apparently its a corrupted level.dat - I wonder if this may be causing it. @sandain
UPDATE!
Removed the corrupt level.dat cp'd level.dat_old to level.dat, started server. No more exception reading error and server is currently showing query working.
Will let it run for a while and check back on it.
Ok just another update. Left it running all night and it did the same thing. Query server went offline randomly.
Hi again @Aersaud,
I'm sorry you're having an issue. I suspect what is going on is that socat's default timeout (0.5 s) is too low for certain server environments; however, this is just a guess. Let's see what exactly is going on. If you could, please do the following:
which msctl
to get the location of the script file. In my case it was /usr/local/bin/msctl
Edit line 1602 of that file, and replace
$SOCAT -
with
$SOCAT -d -d -d -lf \"$WORLD_DIR/socat.log\" -
So now lines 1599
-1603
should look like this:
nohup sh -c "
sleep 20;
tail -f --pid=$SERVER_PID \"$WORLD_DIR/query.in\" |
$SOCAT -d -d -d -lf \"$WORLD_DIR/socat.log\" - UDP:$SERVER_IP:$QUERY_PORT > \"$WORLD_DIR/query.out\"
" > /dev/null 2>&1 &
socat.log
in the world directory (/opt/mscs/worlds/<yourworld>
). This will log all of the information on what socat is doing to that file.Once you've followed the above steps, please execute the status queries as you normally would. Once the issue comes, paste the contents of the socat.log
file here so we can see why exactly socat is terminating.
Thanks!
Thanks for the help @Roflicide, I wasn't sure how to debug this issue further.
Just applied your change to the msctl file. Same location as yours. Server is starting now will monitor and post when it happens. @sandain @Roflicide
This is from the latest time it went down.
2018/04/21 23:22:42 socat[4349] I This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit. $
2018/04/21 23:22:42 socat[4349] I This product includes software written by Tim Hudson (tjh@cryptsoft.com)
2018/04/21 23:22:42 socat[4349] N reading from and writing to stdio
2018/04/21 23:22:42 socat[4349] N opening connection to AF=2 127.0.0.1:25566
2018/04/21 23:22:42 socat[4349] I starting connect loop
2018/04/21 23:22:42 socat[4349] I socket(2, 2, 17) -> 6
2018/04/21 23:22:42 socat[4349] N successfully connected from local address AF=2 127.0.0.1:57655
2018/04/21 23:22:42 socat[4349] I resolved and opened all sock addresses
2018/04/21 23:22:42 socat[4349] N starting data transfer loop with FDs [0,1] and [6,6]
2018/04/21 23:23:12 socat[4349] I transferred 11 bytes from 0 to 6
2018/04/21 23:23:12 socat[4349] E read(6, 0x84c100, 8192): Connection refused
2018/04/21 23:23:12 socat[4349] N exit(1)
2018/04/21 23:23:12 socat[4349] I shutdown(6, 2)
mcadmin@tkmminecraft:/opt/mscs/worlds$ mscs status
Minecraft Server Status:
TKM-MadCraft: running (query server offline).
Process ID: 2421.
Memory used: 1584004 kB.
For some reason it looks like it's opening a connection to port 25566
even though your query server seems to be running on port 25565
based on the latest.log
you uploaded earlier. This would explain why it's failing to read a connection.
Can you paste the contents of your server.properties
file here? Specifically, I want to see what the values are (or if the values are even there!) for the properties query.port
, query-enabled
, and server-port
.
Yes sorry forgot that I changed a few things. It is now running on port 25566. Here is the updated server.properties.
#Minecraft server properties
#Sat Apr 21 23:23:09 EDT 2018
spawn-protection=16
max-tick-time=60000
query.port=25566
generator-settings=
force-gamemode=false
allow-nether=true
gamemode=0
broadcast-console-to-ops=true
enable-query=true
player-idle-timeout=0
difficulty=2
spawn-monsters=true
op-permission-level=4
pvp=true
snooper-enabled=true
level-type=DEFAULT
hardcore=false
enable-command-block=false
max-players=25
network-compression-threshold=256
resource-pack-sha1=
max-world-size=29999984
rcon.port=25575
server-port=25566
server-ip=
spawn-npcs=true
allow-flight=false
level-name=TKM-MadCraft
view-distance=10
resource-pack=
spawn-animals=true
white-list=false
rcon.password=CHANGEDFORPRIVACY
generate-structures=true
max-build-height=256
online-mode=false
level-seed=
prevent-proxy-connections=false
use-native-transport=true
motd="Welcome to TKM-MadCraft\!"
enable-rcon=true
@Roflicide
This was set before the error.
@Aersaud So before the error the query.port
and server.port
were both set to 25566? Or were they set to something else?
Let's try another modification to the script in the same place. Add the -s
option to socat to make it 'sloppy with errors'.
This should make lines 1599-1603 look like:
nohup sh -c "
sleep 20;
tail -f --pid=$SERVER_PID \"$WORLD_DIR/query.in\" |
$SOCAT -d -d -d -lf \"$WORLD_DIR/socat.log\" -s - UDP:$SERVER_IP:$QUERY_PORT > \"$WORLD_DIR/query.out\"
" > /dev/null 2>&1 &
A more complicated option would be to explore the -W
option. This would require wrapping the code to start the query handler in a while
loop that checks for the existence of the lockfile declared with the -W
option and the current state of the server. If the server is running, but the lockfile is missing, the query handler would be restarted. This option requires some coding, so lets give the simple 'sloppy' option above a try first.
Just updated my msctl file using nano. It is now displaying
nohup sh -c "
sleep 20;
tail -f --pid=$SERVER_PID \"$WORLD_DIR/query.in\" |
$SOCAT -d -d -d -lf \"$WORLD_DIR/socat.log\" -s - UDP:$SERVER_IP:$QUERY_PORT > \"$WORLD_DIR/query.out\"
" > /dev/null 2>&1 &
Whats next?
Restart your server with that change in place and let it run until query ends up failing (or not). Post the socat.log
if it fails.
2018/04/22 21:46:48 socat[15897] N opening connection to AF=2 127.0.0.1:25566
2018/04/22 21:46:48 socat[15897] I starting connect loop
2018/04/22 21:46:48 socat[15897] I socket(2, 2, 17) -> 6
2018/04/22 21:46:48 socat[15897] N successfully connected from local address AF=2 127.0.0.1:58939
2018/04/22 21:46:48 socat[15897] I resolved and opened all sock addresses
2018/04/22 21:46:48 socat[15897] N starting data transfer loop with FDs [0,1] and [6,6]
2018/04/22 21:47:05 socat[15897] I transferred 11 bytes from 0 to 6
2018/04/22 21:47:05 socat[15897] E read(6, 0x1db8100, 8192): Connection refused
2018/04/22 21:47:05 socat[15897] N socket 2 to socket 1 is in error
2018/04/22 21:47:05 socat[15897] N socket 2 (fd 6) is at EOF
2018/04/22 21:47:05 socat[15897] I poll timed out (no data within 0.500000 seconds)
2018/04/22 21:47:05 socat[15897] I shutdown(6, 2)
2018/04/22 21:47:05 socat[15897] N exiting with status 0
@sandain
Latest from the socat.log when I ran mscs status
We might need to explore the -W option that I spoke of to restart the Query handler when it goes down. This is assuming that the Query server starts working again after refusing the connection that caused the Query handler to go down. If the server is going down and not coming back up, there is nothing we can do.
Please check if the Query server comes back up after going down with the following tool: https://dinnerbone.com/minecraft/tools/status/
Well it came back up on it's own without me doing anything currently......I will check that website out to query it when it tells me it is down.
Here is the last bit of it going down and up..
I check the status using mscs status every so often to make sure.
2018/04/22 22:54:36 socat[20356] N opening connection to AF=2 127.0.0.1:25566
2018/04/22 22:54:36 socat[20356] I starting connect loop
2018/04/22 22:54:36 socat[20356] I socket(2, 2, 17) -> 6
2018/04/22 22:54:36 socat[20356] N successfully connected from local address AF=2 127.0.0.1:60974
2018/04/22 22:54:36 socat[20356] I resolved and opened all sock addresses
2018/04/22 22:54:36 socat[20356] N starting data transfer loop with FDs [0,1] and [6,6]
2018/04/22 22:55:41 socat[20356] N socket 1 (fd 0) is at EOF
2018/04/22 22:55:41 socat[20356] I shutdown(6, 1)
2018/04/22 22:55:42 socat[20356] I poll timed out (no data within 0.500000 seconds)
2018/04/22 22:55:42 socat[20356] I shutdown(6, 2)
2018/04/22 22:55:42 socat[20356] N exiting with status 0
2018/04/22 22:57:40 socat[22210] I socat by Gerhard Rieger - see www.dest-unreach.org
2018/04/22 22:57:40 socat[22210] I This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit. (http://www.openssl.o$
2018/04/22 22:57:40 socat[22210] I This product includes software written by Tim Hudson (tjh@cryptsoft.com)
2018/04/22 22:57:40 socat[22210] N reading from and writing to stdio
2018/04/22 22:57:40 socat[22210] N opening connection to AF=2 127.0.0.1:25566
2018/04/22 22:57:40 socat[22210] I starting connect loop
2018/04/22 22:57:40 socat[22210] I socket(2, 2, 17) -> 6
2018/04/22 22:57:40 socat[22210] N successfully connected from local address AF=2 127.0.0.1:48128
2018/04/22 22:57:40 socat[22210] I resolved and opened all sock addresses
2018/04/22 22:57:40 socat[22210] N starting data transfer loop with FDs [0,1] and [6,6]
2018/04/23 14:59:08 socat[22210] I transferred 11 bytes from 0 to 6
2018/04/23 14:59:08 socat[22210] I transferred 13 bytes from 6 to 1
2018/04/23 14:59:09 socat[22210] I transferred 15 bytes from 0 to 6
2018/04/23 14:59:09 socat[22210] I transferred 195 bytes from 6 to 1
2018/04/23 16:12:06 socat[22210] I transferred 11 bytes from 0 to 6
2018/04/23 16:12:06 socat[22210] I transferred 13 bytes from 6 to 1
2018/04/23 16:12:07 socat[22210] I transferred 15 bytes from 0 to 6
2018/04/23 16:12:07 socat[22210] I transferred 202 bytes from 6 to 1
2018/04/23 16:27:01 socat[22210] I transferred 11 bytes from 0 to 6
2018/04/23 16:27:01 socat[22210] I transferred 14 bytes from 6 to 1
2018/04/23 16:27:02 socat[22210] I transferred 15 bytes from 0 to 6
2018/04/23 16:27:02 socat[22210] I transferred 202 bytes from 6 to 1
2018/04/23 16:27:10 socat[22210] I transferred 11 bytes from 0 to 6
2018/04/23 16:27:10 socat[22210] I transferred 14 bytes from 6 to 1
2018/04/23 16:27:11 socat[22210] I transferred 15 bytes from 0 to 6
2018/04/23 16:27:11 socat[22210] I transferred 202 bytes from 6 to 1
2018/04/23 16:28:59 socat[22210] I transferred 11 bytes from 0 to 6
2018/04/23 16:28:59 socat[22210] I transferred 13 bytes from 6 to 1
2018/04/23 16:29:00 socat[22210] I transferred 15 bytes from 0 to 6
2018/04/23 16:29:00 socat[22210] I transferred 202 bytes from 6 to 1
2018/04/23 16:29:51 socat[22210] I transferred 11 bytes from 0 to 6
2018/04/23 16:29:51 socat[22210] I transferred 14 bytes from 6 to 1
2018/04/23 16:29:52 socat[22210] I transferred 15 bytes from 0 to 6
2018/04/23 16:29:52 socat[22210] I transferred 202 bytes from 6 to 1
I have a feeling the 2018/04/22 22:55:42 socat[20356] I poll timed out (no data within 0.500000 seconds)
kind of points to our issue as you described earlier where it is too short of a time frame.
@sandain
I figured out how to replicate this issue on my machine--
Start three worlds (either all three at the same time or very quickly within one another). Then run mscs status
as quickly as possible. Wait a few minutes (for the query servers to start up) and run mscs status
again.
Based on my testing, with three worlds started all at the same time using mscs start
, the first world will correctly be queried, but the last two will display running (query server offline)
, even after waiting a significant amount of time for the servers to start up before querying again (like more than 5 minutes). This issue seems to arise whenever I query multiple worlds using mscs status
that have been started relatively recently (like within 20 seconds of the server being started up).
I checked the query.in
files, and they are all sending the same challenge packet--its just that the latter two aren't getting a response, whereas the first one is getting a response.
When i tested this with two worlds (mscs start
then mscs status
right after, then waiting a few minutes before running mscs status
again) one world showed the query data whereas the second one didnt (it said running (query server offline)
).
When i tested this with one world I had no issues.
I don't think its necessarily a problem with the querying portion, because (as said above) it looks like all the initial queries are being built the same. i think it has something to do with this bit of code thats handling the negotiation between the query.in
and query.out
files:
# Start a tail process to watch for changes to the query.in file to pipe
# to the Minecraft query server via socat. The response from the query
# server is piped into the query.out file. Give the server a moment to
# start before initializing the Query handler.
nohup sh -c "
sleep 20;
tail -f --pid=$SERVER_PID \"$WORLD_DIR/query.in\" |
$SOCAT - UDP:$SERVER_IP:$QUERY_PORT > \"$WORLD_DIR/query.out\"
" >/dev/null 2>&1 &
# Verify the connection to the query server worked.
if [ $? -ne 0 ]; then
printf "Error connecting to the query server.\n"
fi
but i'm not entirely sure, it's just a guess.
@sandain let me know if you can replicate this using the steps i outlined above
I just tested using your method, and it works as expected for me. The first query fails because of the built-in delay (see the sleep 20;
line in your code snippet above), but every subsequent query after that timer kicks off works as expected.
I think with PR #242 by @Roflicide, we can finally put this issue to rest. Feel free to reopen this issue if you are still affected.
Hey! First of all, thanks to all of the contributors for the work done on this project. It's been extremely helpful and has saved me quite a lot of time.
I'm having an issue with the status commands through mscs. My server has two worlds running on non-standard ports. They both work just fine, and can be queried using an online tool with the expected response. When I try to query using
mscs status
, however, I tend to get this after the server has been running for about an hour (with a couple users online):Any idea what's causing this?