OpenSmalltalk / opensmalltalk-vm

Cross-platform virtual machine for Squeak, Pharo, Cuis, and Newspeak.
http://opensmalltalk.org/

Error message for the failure to create heartbeat thread is too specific. #540

Open eliotmiranda opened 3 years ago

eliotmiranda commented 3 years ago

Hi,

The warning/error

pthread_setschedparam failed: Not owner

also occurs on Solaris, where the VM additionally prints a long explanation that does not really make sense on Solaris.

I have documented how to get rid of this warning/error on Solaris on the following wiki:

    https://sourceforge.net/p/solaris-squeak/wiki/Home/

I don't know how to fix the issue specifically on your Ubuntu distribution, but when the error happens it might make sense to print something simple such as:

"error: pthread_setschedparam failed: Not owner"

or

"error: heartbeat thread unable to run at higher priority"

Such a short message would be sufficient; the user can then do some research and find fixes for the specific platform where the error happens.
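As a minimal sketch of such a short message (the helper name format_heartbeat_warning is hypothetical, not part of the VM), the diagnostic can be built from the error number that pthread_setschedparam() returns. Because the text comes from strerror(), it follows the platform automatically: EPERM reads "Not owner" on Solaris and "Operation not permitted" with glibc on Linux.

```c
#include <assert.h>
#include <errno.h>   /* EPERM and friends, as passed in by callers */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the short, platform-neutral diagnostic.
 * `err` is the value *returned* by pthread_setschedparam(), which
 * reports failure through its return value rather than errno.
 * Returns the snprintf() result (characters that would be written). */
int format_heartbeat_warning(int err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "error: pthread_setschedparam failed: %s\n"
                    "error: heartbeat thread unable to run at higher priority\n",
                    strerror(err));
}
```

A caller would pass the nonzero return value of pthread_setschedparam() and write the buffer to stderr; no platform-specific remedy is included.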

I suspect that on various Unix flavors the instructions for /etc/security/limits.d/squeak.conf do not apply.

Also note that, in my limited experience on Solaris, I can either fix the warning (by allowing threads to run at different priorities) or ignore it, as it does not seem to make much difference...

But it's not nice that such an extensive error message is printed, especially when it does not apply to that specific Unix distribution.

Regards, David Stes

dcstes commented 3 years ago

Hi everyone,

It's good that there is an open ticket for this.

Also note that there are two different things here: (1) the problem itself, namely that pthread_setschedparam failed with "Not owner", and (2) how to solve it.

For the message, it would be sufficient to report just (1), i.e. what happened.

The action one should take to resolve it depends on the operating system, and I think that even on one system there may be different approaches.

For example, I have recently discovered that an additional privilege must be granted at the Solaris zone level:

    example# zonecfg -z myzone
    zonecfg:myzone> set limitpriv="default,proc_priocntl"
    zonecfg:myzone> exit

This is needed before "privs=proc_priocntl" can be granted inside a zone.

One possible way to address the problem itself is to create a rights profile:

    profiles -p Squeak
    profiles:Squeak> set desc="Allow Squeak VM to Set Priority"
    profiles:Squeak> add cmd=/usr/bin/squeak
    profiles:Squeak:squeak> set privs=proc_priocntl
    profiles:Squeak:squeak> end
    profiles:Squeak> commit
    profiles:Squeak> info
            name=Squeak
            desc=Allow Squeak VM to Set Priority
            cmd=/usr/bin/squeak
    profiles:Squeak> exit

However, the bottom line is that there is a difference between the issue itself and the many different ways to address it on the various operating systems and distributions.

Regards, David Stes

eliotmiranda commented 3 years ago

Hi David, would it be practicable to create a page on the wiki here in this repository that lists the solutions for the various platforms, and then have the error message in the VM reference that wiki page?

dcstes commented 3 years ago


Currently, if I go to

http://wiki.squeak.org/squeak/search

and type pthread_setschedparam in the "Search" field, there is "No Match".

The recent changes to the Wiki

http://wiki.squeak.org/squeak/recent

seem to be related mostly to programming in Squeak.

But there are also wiki pages on building the VM, so a page on wiki.squeak.org about the

"pthread_setschedparam failed: Not owner ... See https://github.com/OpenSmalltalk/opensmalltalk-vm/releases/tag/r3732#linux"

issue could be a good idea.

David Stes


dcstes commented 3 years ago


Another thing to consider is whether calling pthread_setschedparam() should be the default ...

My impression is that if I just ignore the warning

"pthread_setschedparam failed: Not owner"

the programs still run fine.

So perhaps the default could be changed NOT to call pthread_setschedparam(), making it optional.

Maybe some option like '-best-performance' could enable this.

Currently the message says the higher priority is only needed "for best operation".

So the message implies that the default could be not to call pthread_setschedparam(), and to have the user enable it via a switch.
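A sketch of this opt-in idea, assuming the VM inspects its command line before starting the heartbeat thread. The function name is hypothetical, and "-best-performance" is only the example switch named here, not an existing VM option.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical check for the opt-in switch proposed above.
 * Returns 1 if "-best-performance" appears among the arguments
 * (argv[0], the program name, is skipped), otherwise 0. */
int heartbeat_priority_requested(int argc, char *argv[])
{
    assert(argc >= 0 && argv != NULL);
    for (int i = 1; i < argc; i++)
        if (strcmp(argv[i], "-best-performance") == 0)
            return 1;
    return 0;
}
```

Only when this returns 1 would the VM attempt pthread_setschedparam() and, on failure, print its warning; otherwise the heartbeat thread simply stays at the default priority.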

In any case, the message is very specific: it combines both the error and a solution. The solution is "cookbookish", telling you exactly what to do, but there are many possible variations and many operating systems, and what could be the right "cookbook recipe" on one system may not apply to another...

MESSAGE:

    pthread_setschedparam failed: Not owner
    This VM uses a separate heartbeat thread to update its internal clock and handle events. For best operation, this thread should run at a higher priority, however the VM was unable to change the priority. The effect is that heavily loaded systems may experience some latency issues. If this occurs, please create the appropriate configuration file in /etc/security/limits.d/ as shown below:

    cat <<END | sudo tee /etc/security/limits.d/squeak.conf

    and report to the squeak mailing list whether this improves behaviour.

    You will need to log out and log back in for the limits to take effect. For more information please see https://github.com/OpenSmalltalk/opensmalltalk-vm/releases/tag/r3732#linux

David Stes


dcstes commented 3 years ago


By the way, there seems to be an alternative to running the Squeak VM under a profile that raises the privilege temporarily.

It also seems possible to raise the privilege permanently, for example in a zone, via the /etc/security/policy.conf file:

    grep proc_priocntl /etc/security/policy.conf
    PRIV_DEFAULT=basic,proc_priocntl

By extending PRIV_DEFAULT to also allow processes and lightweight processes (threads) to set their own priority (which is not allowed by default), it is possible to get rid of the pthread_setschedparam() warning.

Because this permanently allows it for all threads, it is best confined to a zone (a jail inside the operating system), by setting the zone's limit to:

    zonecfg -z newt info | grep limitpriv
    limitpriv: default,proc_priocntl

David Stes


dcstes commented 3 years ago


When running the web framework 'Aida 6.8' on the Cog VM 5.0, I do notice that it seems worthwhile to solve the "heartbeat thread" warning.

If I just ignore the warning about the heartbeat thread priority, small Squeak examples and the Squeak system itself seem to run fine.

However, the Aida web framework seems to be sensitive to scheduling, and Aida apparently works better if I modify /etc/security/policy.conf.

Or maybe it just appears to work better.

    grep proc_priocntl /etc/security/policy.conf
    PRIV_DEFAULT=basic,proc_priocntl

In any case, modifying /etc/security/policy.conf gets rid of the warning.

The warning itself, however, is of course phrased in such a way that there's no indication that you have to modify /etc/security/policy.conf.

My guess is that there are a great many choices and options for allowing a process to set its priority on the various systems.

David Stes


eliotmiranda commented 3 years ago

Hi David, you say

"Another thing to consider is whether calling pthread_setschedparam() should be the default ...

My impression is that if I just ignore the warning

"pthread_setschedparam failed: Not owner"

the programs still run fine.

So perhaps the default could be changed NOT to issue a pthread_setschedparam() and make it optional."

This is not the right solution, but the reason is a little non-obvious.

What the setschedparam call does is raise the priority of the heartbeat thread relative to the main Smalltalk thread. Leaving it out means that the heartbeat thread runs at the same priority as the main Smalltalk thread. The effect of this is that if the main Smalltalk thread runs continuously, for example in some compute-intensive loop, then the heartbeat thread will be prevented from running; hence events such as Delay expiry will not be checked for, and the main thread will never be interrupted. Try something like the following on a single-core machine, with the heartbeat thread running at the same priority as the main thread, and you'll see the issue:


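```smalltalk
"A busy loop that never yields, one priority level below the current process."
p := [| i | i := 0. [(i := i + 1) odd ifTrue: [i := i - 1]] repeat] newProcess.
p priority: Processor activePriority - 1; resume.
(Delay forSeconds: 1) wait.
p terminate
```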

Without the heartbeat thread running at a higher priority, process p consumes all of the processor and the heartbeat thread never gets to run; consequently the (Delay forSeconds: 1) wait does not complete and p continues running.
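To make "a higher priority" concrete, here is a hedged sketch in C of what raising a heartbeat thread above the main thread can look like on Linux. It is not the VM's actual code: the function name and the choice of SCHED_FIFO at one step above the minimum priority are assumptions. Under the default SCHED_OTHER policy every thread has static priority 0, so running above the main thread means switching to a real-time policy, and for an unprivileged process that is bounded by RLIMIT_RTPRIO (the "rtprio" setting that files in /etc/security/limits.d configure).

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical: move `thread` (e.g. a heartbeat thread) above threads
 * running under SCHED_OTHER by giving it a real-time policy.
 * Returns 0 on success or an error number, typically EPERM when neither
 * RLIMIT_RTPRIO nor privileges (CAP_SYS_NICE) allow the change. */
int raise_heartbeat_priority(pthread_t thread)
{
    struct sched_param param = {0};
    int min = sched_get_priority_min(SCHED_FIFO);
    int max = sched_get_priority_max(SCHED_FIFO);

    if (min == -1 || max == -1)
        return errno;                 /* SCHED_FIFO not supported */

    /* One step above the lowest real-time priority (an assumption);
     * any SCHED_FIFO priority outranks SCHED_OTHER threads. */
    param.sched_priority = (min < max) ? min + 1 : min;
    assert(param.sched_priority >= min && param.sched_priority <= max);

    /* pthread_setschedparam() returns the error directly; errno is untouched. */
    return pthread_setschedparam(thread, SCHED_FIFO, &param);
}
```

Called on the heartbeat thread's pthread_t, a nonzero result is exactly the situation discussed in this issue, and its strerror() text is what the VM prints after "pthread_setschedparam failed:".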
smalltalking commented 3 years ago

The effect of this is that if the main Smalltalk thread runs continuously, for example is in some compute-intensive loop, then the heartbeat thread will be prevented from running, and hence events such as Delay expiry will not be checked for, and the main thread will never be interrupted.

I doubt any modern OS does cooperative scheduling at any priority level. Without the increased priority, the heartbeat will still be scheduled but it may not interrupt the main thread at the intended time. Also, today's machines are almost all multi-core, so even if you keep a core busy with the main thread, another core may schedule the heartbeat process.

OpenSmalltalk-Bot commented 3 years ago

On 2020-12-21, at 3:30 PM, smalltalking notifications@github.com wrote:

The effect of this is that if the main Smalltalk thread runs continuously, for example is in some compute-intensive loop, then the heartbeat thread will be prevented from running, and hence events such as Delay expiry will not be checked for, and the main thread will never be interrupted.

I doubt any modern OS does cooperative scheduling at any priority level. Without the increased priority, the heartbeat will still be scheduled but it may not interrupt the main thread at the intended time. Also, today's machines are almost all multi-core, so even if you keep a core busy with the main thread, another core may schedule the heartbeat process.

Well, yeah, that's really the point: you can't rely upon it. On some machines, with some versions of some OS, it may work perfectly. On another day it might not.

The bit that has been bothering me recently with this is that a simple test program to try the priority raise seems to work but the nominally-the-same code in the actual VM tells me the raise failed. I'm not a fan of inconsistency.

tim

tim Rowledge; tim@rowledge.org; http://www.rowledge.org/tim
There can never be a computer language in which you cannot write a bad program.

dcstes commented 3 years ago


Well, I can understand that Eliot has good reasons to raise the priority of the heartbeat thread.

Also, now that I'm running a nice package, "AIDA" (a web framework on top of Swazoo), I have noticed that its WebScheduler is, of course, sensitive to scheduling.

My feeling, though, is that the error message is still "too specific".

Perhaps the "configure" script could define something like

HAVE_LINUX_PAM_LIMITS

because I believe the /etc/security/limits.d directory on Linux belongs to the pam_limits module.

platforms/vm/unix/sqUnixHeartbeat.c could then print the message only

#ifdef HAVE_LINUX_PAM_LIMITS

Some Linux distros may not even have the /etc/security/limits.d directory.
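A minimal sketch of this proposal, assuming configure defines the hypothetical HAVE_LINUX_PAM_LIMITS macro only where pam_limits is in use. The problem itself is always reported; the Linux-PAM recipe is compiled in only on those systems. The function names and hint wording are illustrative and not taken from sqUnixHeartbeat.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* HAVE_LINUX_PAM_LIMITS is the hypothetical configure macro proposed
 * above; it would be defined only where pam_limits reads
 * /etc/security/limits.d. */
static const char *remedy_hint(void)
{
#ifdef HAVE_LINUX_PAM_LIMITS
    return "To allow a higher priority, add an rtprio limit in "
           "/etc/security/limits.d/ and log in again.\n";
#else
    return "";   /* other platforms: report the problem only */
#endif
}

/* Format the failure report: the problem always, the recipe only
 * where it applies. `err` is pthread_setschedparam()'s return value. */
int format_priority_failure(int err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "pthread_setschedparam failed: %s\n%s",
                    strerror(err), remedy_hint());
}
```

On Solaris, where the macro would stay undefined, the output reduces to the single line David suggests.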

Solaris, and other Unix systems that are not Linux, can have their own system of privileges to allow or disallow a thread or a process to set or raise its priority.

See https://en.wikipedia.org/wiki/Pluggable_authentication_module

So, although Solaris uses PAM, I could leave HAVE_LINUX_PAM_LIMITS undefined there, and then at least the error message would be more suitable.

Although the action to remediate the issue would be different, the heartbeat thread could still run at a different priority than the main thread.

David Stes


dcstes commented 3 years ago


On some Linux systems without PAM, it might be possible to use "prlimit" to set rtprio:

https://serverfault.com/questions/889635/seting-rtprio-limit-in-system-without-pam

For example, run

    prlimit --rtprio=99 --pid=

with the PID of the current shell, and then start squeak from that shell.

If the Linux system has no /etc/security directory, that's one possibility.
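The limit that prlimit --rtprio adjusts (and that pam_limits' "rtprio" sets) is RLIMIT_RTPRIO, so a VM or a small test program could check it before attempting the priority raise. A sketch, assuming Linux; the function name is hypothetical, and on systems that do not define RLIMIT_RTPRIO it simply returns -1.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical check: does the current RLIMIT_RTPRIO soft limit let an
 * unprivileged thread use real-time priority `prio`?
 * Returns 1 if allowed, 0 if not, -1 if unknown (query failed or no
 * RLIMIT_RTPRIO on this platform). Threads with CAP_SYS_NICE are not
 * bound by this limit, so 0 is not conclusive for privileged processes. */
int rtprio_limit_allows(int prio)
{
#ifdef RLIMIT_RTPRIO
    struct rlimit rl;

    assert(prio >= 0);
    if (getrlimit(RLIMIT_RTPRIO, &rl) != 0)
        return -1;
    return rl.rlim_cur == RLIM_INFINITY || (rlim_t)prio <= rl.rlim_cur;
#else
    (void)prio;
    return -1;
#endif
}
```

Run from a shell whose limit was raised with prlimit, rtprio_limit_allows(2) should report 1; with the usual default soft limit of 0 it reports 0.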

Basically, I think there are probably a great many different ways to achieve the goal of suppressing the "pthread_setschedparam" message.

The current error message prints (1) a problem and (2) a Linux-PAM-specific, cookbook-style recipe to solve it.

However, the error message could be limited to (1), just printing the problem, that's all... The user then has to figure out how to solve it.

David Stes
