Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

Why are threads discouraged? #14691

Open p5pRT opened 9 years ago

p5pRT commented 9 years ago

Migrated from rt.perl.org#125106 (status was 'open')

Searchable as RT125106

p5pRT commented 6 years ago

From @dur-randir

On Tue, 10 Apr 2018 04:04:06 -0700, davem wrote:

The problem is in people's expectations. When most of something is thread-safe, they'll just assume "everything is thread-safe" and then they'll be hit very hard. One of the examples I like is Mouse, which was thread-unsafe until 2015. While it's not as popular as Moose, it's still a module with a large user base. And your chances of encountering bugs in less popular XS modules are much, much higher. What's even worse, you're likely to hit them only under some unlucky circumstances (see example below).

As for the core, I agree that it's now much more thread-stable than before. Somewhere around 5.12 or 5.14 I prepared a talk named "no threads" with some nice crash examples in it - they crash no more. But still, locale handling was thread-unsafe until 5.24 (or 5.26?) - just because no one had discovered that. And the following still fails loudly (though it doesn't dump core):

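```perl
use threads;

my @foo;
# Keep spawning threads that each load IO::Handle, detaching them immediately.
while (1) {
    push @foo, threads->create(sub { require IO::Handle; });
    $_->detach for (splice @foo);
}
```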

So while, yes, perl is inherently much more thread-safe than it used to be, I wouldn't recommend using threads in it to anyone.

p5pRT commented 6 years ago

From @iabyn

On Tue, Apr 10, 2018 at 02:59:50PM +0200, Christian Walde wrote:

On Tue, 10 Apr 2018 13:03:45 +0200, Dave Mitchell <davem@iabyn.com> wrote:

I really fail to see how CPAN is different in this regard.

Because this is Perl, where not using CPAN is not an option, and CPAN library consumers rely on CPAN libraries being almost entirely rock-solid, or at least on the author having easy and quick ways to fix bugs.

You still haven't differentiated perl+CPAN from some_other_language + 3rd_party_libraries_needed_to_get_the_job_done.

If using a programming language other than perl, then it is likely that:

1) not using 3rd-party libraries is not an option;
2) consumers of 3rd-party libraries rely on those libraries being almost entirely rock-solid, or at least on the author having easy and quick ways to fix bugs.

If you really look at all 3 points I made, in aggregate, and don't see how this is a problem and a danger, then I don't think I can come up with other word combinations to make you see it.

Can you come up with a hypothetical scenario, e.g. a multi-threaded program that uses libraries to connect to a database and retrieve and parse some XML data? Then go through it step by step so that I can see why using perl and CPAN is dangerous, but using (e.g.) java and a DB and XML library is safe. What is the crucial difference between the two that flips it from being safe to unsafe?

-- If life gives you lemons, you'll probably develop a citric acid allergy.

p5pRT commented 6 years ago

From tom@binary.com

On 10 April 2018 at 21:59, Dave Mitchell <davem@iabyn.com> wrote:

On Tue, Apr 10, 2018 at 02:59:50PM +0200, Christian Walde wrote:

If you really look at all 3 points I made, in aggregate, and don't see how this is a problem and a danger, then I don't think I can come up with other word combinations to make you see it.

Can you come up with a hypothetical scenario, e.g. a multi-threaded program that uses libraries to connect to a database and retrieve and parse some XML data? Then go through it step by step so that I can see why using perl and CPAN is dangerous, but using (e.g.) java and a DB and XML library is safe. What is the crucial difference between the two that flips it from being safe to unsafe?

Here's one trivial hacked-together example of code which I would argue leads to "unexpected" results, at least from the perspective of a C or Java programmer experienced in the ways of threads:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use threads;
use threads::shared;

my %items : shared;
my $lockvar : shared;

# Child thread: wait for a signal, then store a shared array keyed by its address.
my $t = threads->create(sub {
    lock $lockvar;
    cond_wait($lockvar);
    my @data : shared = qw(example content here);
    $items{0 + \@data} = \@data;
});

# Main thread: keep signalling until the child has stored an item, then
# compare the stored key with the address of the ref read back from %items.
my $seen_item;
while (1) {
    do {
        lock $lockvar;
        cond_broadcast($lockvar);
    } until keys %items;
    print "Item with address " . $_ . " actually refers to " . (0 + $items{0 + $_}) . "\n" for keys %items;
    last if defined $seen_item and $seen_item ne (values %items)[0];
    ($seen_item) = values %items;
}
$t->join;

=pod

Sample output (5.26.1):

  Item with address 31700632 actually refers to 31360376
  Item with address 31700632 actually refers to 30235168

=cut
```

p5pRT commented 6 years ago

From @iabyn

On Tue, Apr 10, 2018 at 08:40:45PM +0800, Tom Molesworth via perl5-porters wrote:

The official threads.pm documentation claims that the current code has what I'd class as "bugs":

    Even with the latest version of Perl, it is known that certain
    constructs with threads may result in warning messages concerning
    leaked scalars or unreferenced scalars. However, such warnings are
    harmless, and may safely be ignored.

If they are leaking, this has bad implications for long-running code. If they're not leaking, why the warning? If this information is out of date, then the bugs are in the documentation!

That text is 11 years old and should be removed.

This has dangerous assumptions in it:

- Thread safety is not a binary on/off switch. Modules can appear to be thread-safe under some conditions and not be so under others. That's why I called it a heisenbug in my previous email.

Indeed, this is arguably worse than clear "the code crashed so it doesn't work" cases.

Any threaded program in any programming language is susceptible to subtle heisenbugs. I'd argue that perl is less susceptible than many languages due to its 'not shared by default' nature.

We also don't have good documentation on how to make modules thread-safe: I've encountered quite a few Perl developers who are confident that thread safety is only an issue with XS.

I'd be one of those, in the sense that the perl language itself is thread-safe, and doesn't normally need locks or special handling; but in any environment where there are multiple concurrently running threads, there will be some extra considerations required.

As an example: as far as I recall, the threads.pm documentation and the perlmod page it links to never explicitly state that ref addresses will be different in each thread:

Since the whole basic principle of ithreads is that data isn't shared by default, it would be astonishing if two (non-shared) refs in two different threads had the same address.
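To make the point concrete, here is a minimal sketch, assuming a perl built with ithreads: an ordinary (non-shared) array ref captured by a thread's closure is cloned into the new thread, so `refaddr` from the core Scalar::Util module reports a different address there. The variable names are illustrative only.

```perl
use strict;
use warnings;
use threads;
use Scalar::Util qw(refaddr);

# An ordinary, non-shared array ref owned by the main thread.
my $data        = [1, 2, 3];
my $parent_addr = refaddr($data);

# The closure is cloned into the new thread together with $data, so the
# child sees the address of its own copy, not of the parent's array.
my $child_addr = threads->create(sub { refaddr($data) })->join;

print "parent: $parent_addr, child: $child_addr\n";
print $parent_addr == $child_addr ? "same\n" : "different\n";   # prints "different"
```

Both arrays are alive at the moment the child takes its address, so the two numbers cannot coincide.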

I know much of my CPAN code will be susceptible to this type of issue.

Do you think there are many such issues, or are ref addresses the main one?

Any time someone asks about thread safety, I point to the "discouraged" line in the official documentation and explain that I don't expect my code to work when multiple threads are active - "maybe try again if we have a new threads implementation in the future".

Fine - add to the pod in your modules that they're not thread-safe.

- Windows support: not a mistake.

Yet I suspect many complaints about bad threading behaviour in perl (such as ref addresses changing in a sub-thread) apply just as much to the Windows fork emulation - e.g. do a pseudo-fork on Windows and all the ref addresses change. I don't see how we can praise one and condemn the other.

Exposing the API to end users without warnings and clear information about how threads differ from other languages, on the other hand - maybe not ideal.

I am all in favour of having, at the same location as the current 'discouraged' text, but instead of it, a big flashing neon sign saying that perl threads are a bit different from what you might expect and only use them if you understand that (e.g. each thread is a non-shared clone of the parent, with memory and start-up-cost implications).
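The "non-shared clone of the parent" model can be sketched in a few lines, assuming an ithreads-enabled perl: an ordinary variable modified in a child thread is unchanged in the parent, while a variable declared `: shared` (via threads::shared) is seen by both. The variable names are illustrative.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $private = 0;           # ordinary variable: each thread works on its own copy
my $common : shared = 0;   # shared variable: one value visible to all threads

threads->create(sub {
    $private++;                      # changes only the child's clone
    { lock($common); $common++; }    # changes the single shared value
})->join;

print "private=$private common=$common\n";   # prints "private=0 common=1"
```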

Discouraged features aren't currently candidates for removal, but we may later deprecate them if they're found to stand in the way of a significant improvement to the Perl core.

- surely this is the case? If we come up with a better way to implement threads while retaining the ability for Windows to support some form of process emulation and related features, wouldn't this be something that's welcomed? I don't get the impression that there's widespread satisfaction with the current state of affairs on either threads.pm or threads::shared.

Even if we ended up doing Windows fork() differently, I wouldn't (at the moment) advocate deprecating or removing threads.pm, threads::shared, or the underlying core implementation.

-- "I do not resent criticism, even when, for the sake of emphasis, it parts for the time with reality".   -- Winston Churchill, House of Commons, 22nd Jan 1941.

p5pRT commented 6 years ago

From Eirik-Berg.Hanssen@allverden.no

On Tue, Apr 10, 2018 at 4:27 PM, Dave Mitchell <davem@iabyn.com> wrote:

On Tue, Apr 10, 2018 at 08:40:45PM +0800, Tom Molesworth via perl5-porters wrote:

We also don't have good documentation on how to make modules thread-safe: I've encountered quite a few Perl developers who are confident that thread safety is only an issue with XS.

I'd be one of those, in the sense that the perl language itself is thread-safe, and doesn't normally need locks or special handling; but in any environment where there are multiple concurrently running threads, there will be some extra considerations required.

  And yet, documented behaviour of pure-Perl modules breaks under threads.

  Because of this:

As an example: as far as I recall, the threads.pm documentation and the perlmod page it links to never explicitly state that ref addresses will be different in each thread:

Since the whole basic principle of ithreads is that data isn't shared by default, it would be astonishing if two (non-shared) refs in two different threads had the same address.

  (And quite a few pure-Perl modules do not expect this. I don't have a count of how common it is on CPAN, though.)

Any time someone asks about thread safety, I point to the "discouraged" line in the official documentation and explain that I don't expect my code to work when multiple threads are active - "maybe try again if we have a new threads implementation in the future".

Fine - add to the pod in your modules that they're not thread-safe.

  How about we make thread support opt-in instead of opt-out?

Exposing the API to end users without warnings and clear information about how threads differ from other languages, on the other hand - maybe not ideal.

I am all in favour of having, at the same location as the current 'discouraged' text, but instead of it, a big flashing neon sign saying that perl threads are a bit different from what you might expect and only use them if you understand that (e.g. each thread is a non-shared clone of the parent, with memory and start-up-cost implications).

  That is, add a warning that threads are unsupported by CPAN modules unless such support is advertised?

Eirik

p5pRT commented 6 years ago

From @iabyn

On Tue, Apr 10, 2018 at 06:42:44AM -0700, Sergey Aleynikov via RT wrote:

As for the core, I agree that it's now much more thread-stable than before. Somewhere around 5.12 or 5.14 I prepared a talk named "no threads" with some nice crash examples in it - they crash no more. But still, locale handling was thread-unsafe until 5.24 (or 5.26?) - just because no one had discovered that. And the following still fails loudly (though it doesn't dump core):

```perl
while (1) {
    push @foo, threads->create(sub { require IO::Handle; });
    $_->detach for (splice @foo);
}
```

Oh, that's fun. Looks like _create_getline_subs in IO.xs is directly modifying the global PL_check[] rather than going through the official API, which does the necessary locking.

So while, yes, perl is inherently much more thread-safe than it used to be, I wouldn't recommend using threads in it to anyone.

Perl has bugs. Perl's threading has bugs. I haven't seen any particular evidence yet that the number of threaded bugs is disproportionately large on recent perls.

-- The Enterprise successfully ferries an alien VIP from one place to another without serious incident.   -- Things That Never Happen in "Star Trek" #7
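Given the diagnosis above, one possible mitigation (an assumption, not something proposed in this thread, and not a fix for IO.xs itself) is to load IO::Handle in the main thread before any other thread exists, so its XS setup runs while there is only one interpreter. Later `require` calls inside the threads then find the module already in the cloned `%INC` and load nothing. A minimal sketch, assuming an ithreads-enabled perl:

```perl
use strict;
use warnings;
use threads;
use IO::Handle;   # loaded once, in the main thread, before any thread exists

my @workers;
for my $n (1 .. 4) {
    # IO::Handle is already in %INC (cloned into each thread),
    # so this require is a no-op inside the thread.
    my $thr = threads->create(sub { require IO::Handle; return $n * 10 });
    push @workers, $thr;
}

my @results = map { $_->join } @workers;
print "@results\n";   # prints "10 20 30 40"
```

This only sidesteps the race for modules loaded up front; code that loads such modules lazily inside threads is still exposed.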

p5pRT commented 6 years ago

From @rcaputo

On Apr 10, 2018, at 10:27, Dave Mitchell <davem@iabyn.com> wrote:

I am all in favour of having, at the same location as the current 'discouraged' text, but instead of it, a big flashing neon sign saying that perl threads are a bit different from what you might expect and only use them if you understand that (e.g. each thread is a non-shared clone of the parent, with memory and start-up-cost implications).

Also that you will mostly be on your own navigating the differences, as Perl's users on average are reluctant at best to help you with them and will often "actively encourage" you to seek other implementations.

-- Rocco Caputo <rcaputo@pobox.com>

p5pRT commented 6 years ago

From @khwilliamson

On 04/10/2018 07:42 AM, Sergey Aleynikov via RT wrote:

On Tue, 10 Apr 2018 04:04:06 -0700, davem wrote:

The problem is in people's expectations. When most of something is thread-safe, they'll just assume "everything is thread-safe" and then they'll be hit very hard. One of the examples I like is Mouse, which was thread-unsafe until 2015. While it's not as popular as Moose, it's still a module with a large user base. And your chances of encountering bugs in less popular XS modules are much, much higher. What's even worse, you're likely to hit them only under some unlucky circumstances (see example below).

As for the core, I agree that it's now much more thread-stable than before. Somewhere around 5.12 or 5.14 I prepared a talk named "no threads" with some nice crash examples in it - they crash no more. But still, locale handling was thread-unsafe until 5.24 (or 5.26?) - just because no one had discovered that.

Actually, threaded pure-perl programs are still unsafe in 5.26. perl switches the locale behind your back, even if you follow our admonitions not to explicitly use locales. I'm to blame for some of these, from my earlier naivete, and some have been there for a long time. Sergey found one case that I added in 5.24, I believe, and contributed a test file to verify it's still fixed.

5.28 uses thread-safe locales if they are available on the system. On other systems, I avoid switching locales, and added a mutex for those cases where switching is still done.

When perl starts up, it reads the environment to see what each locale category should be set to. If not all categories are set to the same thing, perl would potentially have to switch locales to gather information about the outliers. This is potentially problematic on unsafe threaded builds. I solved this by gathering the information at start-up and caching it. This is one of the ways 5.28 avoids switching locales.

In researching this, I looked in the POSIX standard for functions it allows to be non-thread-safe. I have a WIP to add cautions about these for XS writers. I also noticed that the Linux man pages indicate that Linux has failed to correctly implement some functions that are supposed to be thread-safe. Other systems may or may not implement these safely.

I searched the perl source code for instances of these calls, and then manually started to examine them to see if there was a problem. I have not finished (and of course I may make mistakes in my analysis). The glaring problem case is accessing the environment (getenv() et al.). These calls need to be protected by a mutex, but it's only a problem if another thread is changing the environment at the same time, a much less common occurrence. My guess is that these aren't crashing things because most environment changes tend to be done at start-up, even before thread creation. Functions you might not expect to, like tzset(), do access the environment, and there is a potential race if the environment is changed by another thread during tzset()'s execution. And tzset() is called from places where, at first glance, you wouldn't expect it.

And the following still fails loudly (though it doesn't dump core):

```perl
while (1) {
    push @foo, threads->create(sub { require IO::Handle; });
    $_->detach for (splice @foo);
}
```

So while, yes, perl is inherently much more thread-safe than it used to be, I wouldn't recommend using threads in it to anyone.

--- via perlbug: queue: perl5 status: open https://rt-archive.perl.org/perl5/Ticket/Display.html?id=125106

p5pRT commented 6 years ago

From @iabyn

On Tue, Apr 10, 2018 at 10:26:21PM +0800, Tom Molesworth via perl5-porters wrote:

On 10 April 2018 at 21:59, Dave Mitchell <davem@iabyn.com> wrote:

Can you come up with a hypothetical scenario, e.g. a multi-threaded program that uses libraries to connect to a database and retrieve and parse some XML data? Then go through it step by step so that I can see why using perl and CPAN is dangerous, but using (e.g.) java and a DB and XML library is safe. What is the crucial difference between the two that flips it from being safe to unsafe?

Here's one trivial hacked-together example of code which I would argue leads to "unexpected" results, at least from the perspective of a C or Java programmer experienced in the ways of threads:

That isn't a reply to the question asked, which was trying to eke out an understanding of why Christian believes the CPAN ecosystem in some way makes threading more hazardous in perl than in other languages (which it may, but I can't grasp the point he's trying to make).

The example code you've given seems to be (at a quick glance) just the fact, again, that refs in different threads have different addresses in perl.

Yes of course this will be confusing to anyone who expects the perl threading model to be the same as C or Java's. And yes I agree that the docs should explain the model early on to manage expectations.

-- This is a great day for France!   -- Nixon at Charles De Gaulle's funeral

p5pRT commented 6 years ago

From @iabyn

On Tue, Apr 10, 2018 at 04:46:13PM +0200, Eirik Berg Hanssen wrote:

That is, add a warning that threads are unsupported by CPAN modules unless such support is advertised?

I'd be happy with that.

-- Lear: Dost thou call me fool, boy? Fool: All thy other titles thou hast given away; that thou wast born with.

p5pRT commented 6 years ago

From @wchristian

On Tue, 10 Apr 2018 15:59:53 +0200, Dave Mitchell <davem@iabyn.com> wrote:

why using perl and CPAN is dangerous, but using (e.g.) java and a DB and XML library is safe? What is the crucial difference between the two that flips it from being safe to unsafe?

Very short version: Java has a much bigger userbase and more money invested, and things like a database module or an XML parser under Java will be written under the assumption of being required to work with threads, as Java uses threads like skittles.

Also in Java both of these are in the standard library, not 3rd party.

Additionally, to give you a more concrete example of a simple Perl threads fuck-up:

In 2011 I attempted to write a simple parallel HTTP downloader, using LWP and some thread queue manager module. It worked 100% fine under Windows, no matter what I did. On Fedora it seemed to work fine, but crashed randomly when attempting to handle too many downloads.

This would be completely inconceivable in Java.

-- With regards, Christian Walde

p5pRT commented 6 years ago

From tom@binary.com

On 10 April 2018 at 23:11, Dave Mitchell <davem@iabyn.com> wrote:

On Tue, Apr 10, 2018 at 10:26:21PM +0800, Tom Molesworth via perl5-porters wrote:

On 10 April 2018 at 21:59, Dave Mitchell <davem@iabyn.com> wrote:

Can you come up with a hypothetical scenario, e.g. a multi-threaded program that uses libraries to connect to a database and retrieve and parse some XML data? Then go through it step by step so that I can see why using perl and CPAN is dangerous, but using (e.g.) java and a DB and XML library is safe. What is the crucial difference between the two that flips it from being safe to unsafe?

Here's one trivial hacked-together example of code which I would argue leads to "unexpected" results, at least from the perspective of a C or Java programmer experienced in the ways of threads:

That isn't a reply to the question asked, which was trying to eke out an understanding of why Christian believes the CPAN ecosystem in some way makes threading more hazardous in perl than in other languages (which it may, but I can't grasp the point he's trying to make).

The example code you've given seems to be (at a quick glance) just the fact, again, that refs in different threads have different addresses in perl.

Related but slightly different: this time it's 2 different addresses *within the same thread*, neither of which matches the refaddr in the other thread. For something that's supposedly "shared", I'd call having 3 addresses surprising at best!

What I'm trying to demonstrate is that it's easy to hit implementation details in Perl, ones that go against the common understanding of "threads". I don't think people should have to read Shared.xs before they can implement a multithreaded XML+DB application. Documentation might help, but there's an uphill struggle at the first word: what we call "threads", other languages might call a nearly unrecognisable special case!

(Eirik and Christian have already addressed the other comments I was going to make, so I'll stop here - sorry for derailing the thread)
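For the specific problem of comparing shared references, the threads::shared documentation recommends `is_shared()`, which returns the shared variable's internal ID, rather than `refaddr()`. A minimal sketch, assuming an ithreads-enabled perl; the helper name `ref_id` is hypothetical:

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Scalar::Util qw(refaddr);

# Hypothetical helper: threads::shared's internal ID for shared data,
# refaddr() for everything else.
sub ref_id {
    my ($ref) = @_;
    return is_shared($ref) || refaddr($ref);
}

my %items : shared;
my @data : shared = qw(example content here);
$items{key} = \@data;

# Ask for the identity of the same shared array from two different threads.
my $id_in_parent = ref_id($items{key});
my $id_in_child  = threads->create(sub { ref_id($items{key}) })->join;

print $id_in_parent == $id_in_child ? "same shared array\n" : "different\n";
```

With plain `refaddr()` in place of `ref_id()`, the two values would not be expected to match, which is the surprise described above.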

p5pRT commented 6 years ago

From @Leont

On Tue, Apr 10, 2018 at 2:40 PM, Tom Molesworth via perl5-porters <perl5-porters@perl.org> wrote:

Discouraged features aren't currently candidates for removal, but we may later deprecate them if they're found to stand in the way of a significant improvement to the Perl core.

- surely this is the case? If we come up with a better way to implement threads while retaining the ability for Windows to support some form of process emulation and related features, wouldn't this be something that's welcomed? I don't get the impression that there's widespread satisfaction with the current state of affairs on either threads.pm or threads::shared.

A crucial thing to understand here is that ithreads and threads.pm are not the same thing. ithreads is a C-level feature of the implementation; threads.pm and pseudoforks are end-user-level features built upon ithreads.

One can build other abstractions on top of it. ithreads is not a good abstraction for anything resembling shared-memory architectures, but it does fit a number of other concurrency models. threads::lite proved that concept, I hope.

If that's not what one wants, it's probably possible to implement GIL threading instead (though I'm not sure we'd really want that either). I think that's unexplored territory, though.

Leon
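The idea that ithreads suits message-passing models better than shared memory can be illustrated with the core Thread::Queue module. This is not threads::lite, whose API is not shown here; it is a minimal sketch assuming an ithreads-enabled perl and Thread::Queue 3.01 or later (for `end()`), in which the worker threads communicate only through queues:

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $jobs    = Thread::Queue->new;   # main thread -> workers
my $results = Thread::Queue->new;   # workers -> main thread

# Workers share no variables of their own; they only exchange messages.
my @workers;
for (1 .. 3) {
    push @workers, threads->create(sub {
        while (defined(my $n = $jobs->dequeue)) {
            $results->enqueue($n * $n);
        }
    });
}

$jobs->enqueue(1 .. 10);
$jobs->end;              # dequeue returns undef once the queue is drained
$_->join for @workers;

$results->end;
my $total = 0;
while (defined(my $sq = $results->dequeue)) {
    $total += $sq;
}
print "sum of squares: $total\n";   # prints "sum of squares: 385"
```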

p5pRT commented 6 years ago

From @xenu

On Tue, 10 Apr 2018, at 16:46, Eirik Berg Hanssen wrote:

How about we make thread support opt-in instead of opt-out?

It's already the case. By default perl builds without ithreads.
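Whether a given perl was built with ithreads is recorded in the core Config module, so a program can check before loading threads (perlthrtut shows a similar `$Config{useithreads}` check). A minimal sketch:

```perl
use strict;
use warnings;
use Config;

# 'define' when perl was configured with ithreads (e.g. -Dusethreads);
# undef otherwise.
my $has_ithreads = ($Config{useithreads} // '') eq 'define';

print $has_ithreads ? "this perl has ithreads\n"
                    : "this perl was built without ithreads\n";
```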

p5pRT commented 6 years ago

From Eirik-Berg.Hanssen@allverden.no

On Wed, Apr 11, 2018 at 12:33 AM, Tomasz Konojacki <me@xenu.pl> wrote:

On Tue, 10 Apr 2018, at 16:46, Eirik Berg Hanssen wrote:

How about we make thread support opt-in instead of opt-out?

It's already the case. By default perl builds without ithreads.

  Context:

Fine - add to the pod in your modules that they're not thread-safe.

How about we make thread support opt-in instead of opt-out?

  That is, I was referring to thread support in CPAN modules. I'm suggesting we document, in big fat (blinking marquee etc ...):

  Unless a module explicitly states that it supports threads, if you use it with threads you're on your own and get to keep both pieces when it breaks.

Eirik

p5pRT commented 6 years ago

From @xenu

On Wed, 11 Apr 2018, at 00:44, Eirik Berg Hanssen wrote:

  Context:

Fine - add to the pod in your modules that they're not thread-safe.

  How about we make thread support opt-in instead of opt-out?

  That is, I was referring to thread support in CPAN modules. I'm suggesting we document, in big fat (blinking marquee etc ...):

  Unless a module explicitly states that it supports threads, if you use it with threads you're on your own and get to keep both pieces when it breaks.

Eirik

Oh, I see, I missed the last sentence of your message. Sorry for the noise.

p5pRT commented 6 years ago

From @bulk88

On Tue, 10 Apr 2018 15:25:00 -0700, LeonT wrote:

If that's not what one wants, it's probably possible to implement GIL threading instead (though I'm not sure we'd really want that either). I think that's unexplored territory, though.

Wouldn't one of the existing fibers/future/promises/Coro/async-but-not-async-with-explicit-yields modules that swap Perl stacks inside the same interp/same perl thread be perl's already-implemented GIL concept?

-- bulk88 ~ bulk88 at hotmail.com

p5pRT commented 6 years ago

From @Leont

On Wed, Apr 11, 2018 at 8:45 AM, bulk88 via RT <perlbug-followup@perl.org> wrote:

On Tue, 10 Apr 2018 15:25:00 -0700, LeonT wrote:

If that's not what one wants, it's probably possible to implement GIL threading instead (though I'm not sure we'd really want that either). I think that's unexplored territory, though.

Wouldn't one of the existing fibers/future/promises/Coro/async-but-not-async-with-explicit-yields modules that swap Perl stacks inside the same interp/same perl thread be perl's already-implemented GIL concept?

AFAIK those are all "one interpreter, one OS thread", whereas GIL threading is "one interpreter, many OS threads".

Leon

MaxPerl commented 2 years ago

Please remove the "discouraged" warning. The threads implementation has limitations, of course, but they are explained in depth, so everybody can decide whether or not to use threads. At the moment the warning frightens developers away from using and trying threads (it frightened me, and I only began studying threads after a long time). But the threads implementation can only get better in Perl if it is used and bugs are found.

A real problem is that many modules don't support threads. The perlmod docs explain a simple mechanism for this: a package can define a CLONE_SKIP method, and objects blessed into that package are then not cloned into new threads. I found it helpful to apply this at runtime, with the following BEGIN block, to modules that are not used in the worker threads:

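```perl
BEGIN {
    # Returning true tells perl not to clone objects blessed into this package
    # when a new thread is created.
    *PackageNameNotUsedInThreads::CLONE_SKIP = sub { print "no cloning \n"; 1; };
}
```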

Perhaps this is a useful trick for others and could be documented, too?

Thanks in advance and best wishes, Max
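The CLONE_SKIP mechanism mentioned above can also be defined inside the class itself. Per perlmod, when CLONE_SKIP returns true, objects blessed into that class are copied into a new thread as unblessed undef values; the package's code and globals are still cloned. A minimal sketch, assuming an ithreads-enabled perl; the class `My::Connection` is hypothetical:

```perl
use strict;
use warnings;
use threads;
use Scalar::Util qw(blessed);

# Hypothetical class whose objects should never be cloned into new threads.
package My::Connection;
sub new        { my ($class, %args) = @_; return bless {%args}, $class }
sub CLONE_SKIP { 1 }   # true: objects of this class are not cloned

package main;

my $conn = My::Connection->new(name => 'db');

# In the child thread the object has been replaced by an unblessed undef
# value, so blessed() finds no class there.
my $class_in_child = threads->create(sub { blessed($conn) })->join;

print 'parent: ', blessed($conn), "\n";
print 'child:  ', (defined $class_in_child ? $class_in_child : 'undef'), "\n";
```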