Pryaxis / TShock

☕️⚡️TShock provides Terraria servers with server-side characters, anti-cheat, and community management tools.
GNU General Public License v3.0

Stuck on "Found Server" (CentOS 6, Mono Stable 4.2.3.4/832de4b) #1172

Closed remi6397 closed 8 years ago

remi6397 commented 8 years ago

I know there are many closed issues on this topic, but I hope this one can finally solve this problem.

When I try to connect to a Terraria TShock server hosted on my CentOS x86 PC, the client shows "Found Server", and the server prints something like X.X.X.X:YYYYY is connecting to slot Z (the slot is 0 or 1 after repeated attempts).

My Mono version:

Mono JIT compiler version 4.2.3 (Stable 4.2.3.4/832de4b sob, 2 kwi 2016, 13:43:53 CEST)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
        TLS:           __thread
        SIGSEGV:       altstack
        Notifications: epoll
        Architecture:  x86
        Disabled:      none
        Misc:          softdebug
        LLVM:          supported, not enabled.
        GC:            sgen

Linux version:

Linux hostname 2.6.32-573.el6.i686 #1 SMP Thu Jul 23 12:37:35 UTC 2015 i686 i686 i386 GNU/

TShock bamboo 585, Terraria 1.3.0.8

I can provide more information.

PS: Sorry for my bad English, I'm from Poland...

hakusaro commented 8 years ago

Can you update the kernel?

cc @tylerjwatson

Praytock commented 8 years ago

Hey nicatronTg, love your work... I just rebuilt a CentOS 7 server and put the latest TShock release on it with updated mono and screen, and I'm getting the same thing. My kernel is updated too. I also completely disabled iptables, firewalld, and SELinux... no change. I also passed the client through my personal computer, so no firewall there either. Kind of odd: the server logs don't tell me anything that the forums say is important. I searched all over the forums and Google for anyone else having the same problem with CentOS 7, but it appears this chap here has the closest thing to mine, with CentOS 6.

I'll be honest, I'm kind of a noob with GitHub, as I just registered today to comment here =)

hakusaro commented 8 years ago

Try downgrading to an earlier Mono version (e.g. 4.1 or 4.0).

Praytock commented 8 years ago

Arg... I'm not Linux-smart enough. I tried downgrading for the last 30 minutes and can't figure out how. No worries, I ran TShock on my Windows server and it's running fine... hate Windows, ahh well.

I suck at life =(

Thanks for your help though =)

tylerjwatson commented 8 years ago

It's actually the TSAPI software that has the troubles. We are aware of some issues with Mono, and we have been trying to fix it up but Mono's lacklustre debugging makes it hard.

Praytock commented 8 years ago

Oh wow, good to know. Thank you.. Yes as soon as these get resolved, I will happily move the server back to the centos virtual server. Thanks you are Awesome!

Mstrodl commented 8 years ago

I can confirm this also happens on Mono 4.2.1 on Ubuntu 16.04 on the latest version from the dev branch.

Praytock commented 8 years ago

ahh well, guess i might have to wait a while then =) At least I got it up and running on a nasty windows server.


tylerjwatson commented 8 years ago

Hi guys,

I believe I have made some headway into figuring out this problem. As the issue appears to be in the unmanaged side of the mono binary, I can only provide a workaround.

The issue appears to be the soft counter mechanism in mono which is refusing to allow new worker threads to be created at a crucial part of the application, causing it to hang.

Could all people affected by the found server issue please try starting mono the following way:

$ MONO_THREADS_PER_CPU=50 mono TerrariaServer.exe

This overrides mono's maximum threadpool threads per processor, and I have had good success making it work on all my affected servers. Please report back if it worked for you.

Cheers!

Mstrodl commented 8 years ago

~~@tylerjwatson, I believe you mean: `$ MONO_THREADS_PER_CPU=50; mono TerrariaServer.exe`~~

EDIT: Disregard that, it doesn't work like that. I tried it the way you wrote it and it worked :)
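For anyone wondering why the semicolon version behaves differently, here is a minimal sketch using a plain `sh -c` child in place of mono (the variable names `prefix_result` and `semicolon_result` are made up for illustration). The prefix form places the variable in the environment of that one command; the semicolon form only sets an unexported shell variable, which a child process does not see.

```shell
#!/bin/sh
# Start from a clean slate so an inherited value cannot skew the result.
unset MONO_THREADS_PER_CPU

# Prefix form: the assignment is exported to this one child process only.
prefix_result=$(MONO_THREADS_PER_CPU=50 sh -c 'echo "${MONO_THREADS_PER_CPU:-unset}"')

# Semicolon form: a plain, unexported shell variable; the child never sees it.
MONO_THREADS_PER_CPU=50; semicolon_result=$(sh -c 'echo "${MONO_THREADS_PER_CPU:-unset}"')

echo "prefix form:    child sees $prefix_result"
echo "semicolon form: child sees $semicolon_result"
```

The semicolon form would only reach mono if the variable were exported first, e.g. with `export MONO_THREADS_PER_CPU=50`.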

tylerjwatson commented 8 years ago

@Mstrodl nope, I really didn't.

Mstrodl commented 8 years ago

@tylerjwatson I know, that's why I added the edit

letteka commented 8 years ago

@tylerjwatson On FreeBSD, mono v4.2.3, running it with the above command works, but only from within the same directory as TerrariaServer.exe. If I run it from any other directory, the server starts but tshock stuff isn't enabled.

tylerjwatson commented 8 years ago

This is by design, and unrelated to this issue.

Terraria requires the working directory to be the directory that contains TerrariaServer.exe to function properly.
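If you want to start the server from somewhere else, a small wrapper can change directory first. This is only a sketch: `launch_server` is a hypothetical helper name, the install path is whatever you use, and the default of 50 is simply the value suggested earlier in this thread.

```shell
#!/bin/sh
# launch_server DIR [server args...]
# Runs in a subshell (note the parentheses) so the caller's working
# directory and environment are left untouched.
launch_server() (
    server_dir=${1:?usage: launch_server DIR [args...]}
    shift
    cd "$server_dir" || exit 1
    if ! command -v mono >/dev/null 2>&1; then
        echo "mono not found in PATH" >&2
        exit 127
    fi
    # Keep a value the caller already set; otherwise fall back to 50.
    export MONO_THREADS_PER_CPU="${MONO_THREADS_PER_CPU:-50}"
    exec mono TerrariaServer.exe "$@"
)

# Example (the path is a placeholder):
# launch_server /path/to/tshock
```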


nikitakuklev commented 8 years ago

Same issue as above - clients stuck on 'found server' and host cannot enter commands into terminal or Ctrl-C out of it

Running clean Ubuntu 15.10 server 64-bit (KVM VPS) with mono-complete from xamarin repos.

uname -a

Linux -snip- 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

mono -V

Mono JIT compiler version 4.2.3 (Stable 4.2.3.4/832de4b Wed Mar 16 13:19:08 UTC 2016)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
        TLS:           __thread
        SIGSEGV:       altstack
        Notifications: epoll
        Architecture:  amd64
        Disabled:      none
        Misc:          softdebug
        LLVM:          supported, not enabled.
        GC:            sgen

One interesting thing I noted is that on the very first run, you can generate a world, select it, and it works fine. However, doing /off and starting up again results in the above bug. Furthermore, I spun up a 16.04 desktop VM locally with exactly the same mono version...and there everything works. I even SFTPed the exact folder from desktop->server (both under root) and it failed to run. ¯\_(ツ)_/¯

hakusaro commented 8 years ago

@nuclearrussian you made no indication that you tried running with:

$ MONO_THREADS_PER_CPU=50 mono TerrariaServer.exe

Did you?

nikitakuklev commented 8 years ago

@nicatronTg I was getting somewhat conflicting data tbh:

On 15.10 server: 1 brand new worldgen failed on first load and on every later one; 2 were OK on first load and on all later ones.

On 16.04 desktop: everything still good.

For desktop->server: everything worked.

As I said...weird, but thread setting seemed to help.

Reading about underlying issues, seems like default mono threadpool has a tendency to deadlock with low default values [grrr concurrency]. Running this gives 1/100 and 3/300 for server and desktop systems (1/100 per cpu). Since MONO_THREADS_PER_CPU is a multiplier of above values, setting it to 50 corresponds to 50/5000 threads per cpu and appears to avoid most deadlocks. The wiki for that project even mentions setting MONO_THREADS_PER_CPU ~ 125.

Guided by above consideration, I restarted desktop VM with only 1 core vs 3 before. Sure enough, first attempted load got stuck. (interestingly, attempting to load same world later even with the parameter fix failed...possible data corruption?).

In summary, $ MONO_THREADS_PER_CPU=50 (or more) appears to be an adequate fix, albeit a non-deterministic one. The corresponding inability to reload world after hang is however concerning.
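The arithmetic behind those numbers can be sketched as below. It assumes the model described in this comment (base limits of 1 minimum and 100 maximum threadpool threads per CPU, both scaled by MONO_THREADS_PER_CPU); that model is taken from the comment, not checked against Mono's source, and `threadpool_limits` is a made-up helper name.

```shell
#!/bin/sh
# threadpool_limits CPUS THREADS_PER_CPU -> prints "min/max"
# Assumption (from the comment above): 1 min / 100 max threads per CPU,
# multiplied by MONO_THREADS_PER_CPU.
threadpool_limits() {
    cpus=$1
    per_cpu=$2
    echo "$((cpus * per_cpu))/$((cpus * per_cpu * 100))"
}

threadpool_limits 1 1     # single-core server, default multiplier
threadpool_limits 3 1     # three-core desktop VM, default multiplier
threadpool_limits 1 50    # single core with MONO_THREADS_PER_CPU=50
```

Under that model, the three calls print 1/100, 3/300, and 50/5000, matching the figures quoted above.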

tylerjwatson commented 8 years ago

I'm a little confused about what you're trying to say. The process does not come to a deadlock because of locking, but rather from ThreadPool exhaustion.

The things you should take away from the article you linked are:

> Above this number, Mono will make an internal decision whether to spawn a new thread or wait for an existing thread to complete its task.

It is here that TSAPI will come unstuck. All ThreadPool tasks in TSAPI never complete, as they are all infinitely-running jobs incorrectly queued by Re-Logic as ThreadPool.QueueUserWorkItem jobs. Mono, running out of ThreadPool threads, waits for a worker thread to become free, which never happens.

It would be appropriate to remove these and put them into new Thread()s, however I'm unsure of the regressions it would cause in the server (or rather, how much of the code is reliant on things running one after another in a ThreadPool scenario incorrectly).

From memory, there are about 6 or 7 ThreadPool workers going on all the time. Some are our fault, others not.
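As a rough illustration of why the multiplier matters, here is a toy model, not Mono's actual scheduling logic. It assumes that the number of workers Mono creates without hesitation is CPUs times MONO_THREADS_PER_CPU, and that roughly 7 jobs never return their worker (the "6 or 7" estimate above). `starvation_risk` is a hypothetical helper.

```shell
#!/bin/sh
# starvation_risk CPUS THREADS_PER_CPU LONG_RUNNING_JOBS
# Toy model: if the never-ending jobs occupy every worker that is created
# without hesitation, later work items may wait for a worker indefinitely.
starvation_risk() {
    guaranteed=$(( $1 * $2 ))
    if [ "$3" -ge "$guaranteed" ]; then
        echo "at risk: $3 never-ending jobs occupy all $guaranteed guaranteed workers"
    else
        echo "ok: $(( guaranteed - $3 )) guaranteed workers left for short tasks"
    fi
}

starvation_risk 1 1 7     # single core, multiplier of 1
starvation_risk 1 50 7    # single core, MONO_THREADS_PER_CPU=50
```

The model is deliberately crude: above the guaranteed count Mono may still spawn threads, so "at risk" means a hang is possible, not certain.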

> Mono 2.10.8.1, for instance, has a default MONO_THREADS_PER_CPU of 1

This is absolutely correct. Your non-determinism assumption is right, as the final number of threads available for ThreadPool consumption is still dependent on the number of physical CPUs in the machine, which is why your results aren't 100% deterministic.

> The corresponding inability to reload world after hang is however concerning.

Welcome to .NET on Linux, and a binary whose source is not available for public consumption. Proper lifecycle in mono is a pain at the best of times... .NET has no support for signals, fork(), or anything to make it a well-behaved UNIX daemon. Unfortunately we have to take the good with the bad on this one; it's simply unable to behave as you would expect if things go bad on Linux.

tylerjwatson commented 8 years ago

If you're interested, see mono/metadata/threadpool-ms.c: https://github.com/mono/mono/blob/master/mono/metadata/threadpool-ms.c#L300-L305

And the worker_try_unpark function, which is where the infinite loop happens here: https://github.com/mono/mono/blob/master/mono/metadata/threadpool-ms.c#L552

nikitakuklev commented 8 years ago

@tylerjwatson Thank you for the clarification, interesting read. I was indeed abusing the term deadlock (I tend to rarely encounter any concurrency troubles past basic resource contention in my scientific work, so everything is a deadlock to me hehe).

As for current issues, it seems that the mono threadpool implementation is just inferior to the latest versions of .NET, since the latter scales dynamically based on throughput optimization starting from 4.0. I am not sure as to how TShock sits on top of Terraria, and whether it's possible to implement your own threadpool or trick mono into preinitializing threads...probably too much work anyway for what is realistically an edge case of mono users. Seeing as the minimum thread number can't be raised from code, would it be prudent to include some sort of run_mono.sh with

#!/bin/sh
export MONO_THREADS_PER_CPU=50
mono --server TerrariaServer.exe

into next release along with appropriate tutorial modification?

For that matter, a link to setting up mono repos would be good too. Ubuntu 15.10 still has 3.x branch by default...

E: looks like mono also got same updated threadpool in 4.2.1...herp derp...

Zemmi commented 8 years ago

I can confirm this happens on Mono 4.2.3, Raspbian, Raspberry Pi 2. But it works fine on Mono 3.2.8.
MONO_THREADS_PER_CPU=50 doesn't work.

uname -a

Linux Pi2 4.1.19-v7+ #858 SMP Tue Mar 15 15:56:00 GMT 2016 armv7l GNU/Linux

mono --version

Mono JIT compiler version 4.2.3 (Stable 4.2.3.4/832de4b Wed Mar 16 13:34:50 UTC 2016)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
        TLS:           __thread
        SIGSEGV:       normal
        Notifications: epoll
        Architecture:  armel,vfp+hard
        Disabled:      none
        Misc:          softdebug
        LLVM:          supported, not enabled.
        GC:            sgen

hakusaro commented 8 years ago

@Zemmi you're likely going to hit this problem more often on smaller hardware configurations. The Pi 2 has a 900MHz ARM Cortex A7 chip, albeit with four cores. I doubt it can seriously compute fast enough to run a TShock server reliably. You could try bumping the count up, but you only have 1GB of RAM too.

Zemmi commented 8 years ago

@nicatronTg Running TShock on the RPi 2 is just for fun; this also happens on my MacBook Pro (2015, 8 GB RAM).

uname -a

Darwin MacBook 15.4.0 Darwin Kernel Version 15.4.0: Fri Feb 26 22:08:05 PST 2016; root:xnu-3248.40.184~3/RELEASE_X86_64 x86_64

mono --version

Mono JIT compiler version 4.2.3 (explicit/832de4b Thu Mar 3 19:24:57 EST 2016)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
        TLS:           normal
        SIGSEGV:       altstack
        Notification:  kqueue
        Architecture:  x86
        Disabled:      none
        Misc:          softdebug
        LLVM:          yes(3.6.0svn-mono-(detached/a173357)
        GC:            sgen

tylerjwatson commented 8 years ago

@nuclearrussian good point on updating the documentation, I will get around to that later on.

> As for current issues, seems that mono threadpool implementation is just inferior to latest versions of .NET, since latter scales dynamically based on throughput optimization starting from 4.0.

You would be forgiven for thinking that, but it has actually only started going pear-shaped since they pulled in sources from .NET. If you have a look at mono/metadata/threadpool-ms.c, the latest support for thread pooling on the unmanaged side has been ported directly from Microsoft's sources, including algorithms for managing load and other goodies that appear to have caused these regressions.

Post mono-4.0, most of the System namespace has been ported from Microsoft's reference sources, which is where most of my dramas have started, go figure.

@Zemmi I have no idea how the arm port of mono fares, but you may have a lot of trouble running TShock on a system with 1GB total RAM (shared with video).

nikitakuklev commented 8 years ago

@Zemmi Can you try it with a clean tshock folder? Mine kept locking up on any settings if it ever did before, so delete tshock settings/worlds and try clean gen->load

Also...if you are feeling very bored and want to make a monster: RasPi2(armv7) -> QEMU(x86) -> some unix variant(x86) -> wine -> winetricks (dotnet45) -> server running as x86 windows program -> profit (I would be shocked if that actually worked)

Trying out wine-mono and wine-dotnet atm on x86 machine, will update if I get it working

hakusaro commented 8 years ago

@nuclearrussian @Zemmi feel free to discuss alternate configs on the forums, but try to keep this discussion topic to running TShock on Mono in sane (i.e., server hardware) installation environments. Try not to discuss Wine in this thread.

remi6397 commented 8 years ago

@tylerjwatson: The command you provided, `MONO_THREADS_PER_CPU=50 mono TerrariaServer.exe`, just worked for me. Thanks!