Raku / problem-solving

🦋 Problem Solving, a repo for handling problems that require review, deliberation and possibly debate
Artistic License 2.0
70 stars 16 forks source link

Hashes are not thread safe and there are no means of dealing with this #387

Open vrurg opened 1 year ago

vrurg commented 1 year ago

The known fact: hashes are not concurrent safe. And this is OK from the performance side of it. What is not OK is that debugging concurrency problems related to conflicting read/writes is tedious; mostly it's a game of trial and error and involves a lot of guessing.

Most advises related to the problem are winding down either to using atomic ops or locks. These are good when hash access can be encapsulated into methods. But there is at least one case in Rakudo itself where none of the advises is applicable: stashes. I have also very strong reasons to believe that there is similar problem with Unicode properties table too.

vrurg commented 1 year ago

What I would like to be considered in first place is a feature that would allow to temporarily turn on lock protection for hash representation object. The protection must not prevent concurrent access to the structure but instead must cause the backend VM die and dump stack traces of all involved threads if a read/write access requested while another write operation is in progress.

This would provide a great deal of information necessary to track down the most "popular" race conditions in code. Any code, not only the core.

Unfortunately, this might not be the best solution when it comes down to stashes. First of all, not all of them are instances of the Stash class. Some of them are coming from NQP where they are bare BOOTHash.

Aside of that, some core code like symbol merging operate directly on Stash underlying storage object – which is a BOOTHash again. That code may not be aware of any locks imposed by higher level code.

Another solution is required for this and, perhaps, similar cases. I think it would make sense to be able to lock/unlock the hash representation in some way so any concurrent access to it is blocked without the backend dying. For example, it can be something like nqp::dispatch('boot-syscall', 'boothash-lock', $stash), nqp::dispatch('boot-syscall', 'boothash-unlock', $stash).

Apparently, there is a performance penalty attached. But stashes are not that often accessed, already read symbols can be cached (that's a good practice anyway), and other measures can be takes for the safety sake. Also, though it's rather intuitive guess on my side, reading from an unlocked hash shouldn't get significantly slower and this is the most common mode of operation for a stash.

vrurg commented 1 year ago

Rakudo issues that can (could) be resolved with the above proposal:

Perhaps (and even likely) that there more related bug reports, but that's what I've found so far.

duncand commented 1 year ago

I want to throw an idea out there, and I recognize in saying this that I am quite poorly informed in how Raku is implemented, so I may be making unfounded assumptions.

I have the impression that, like Perl before it, Raku essentially uses a common Hash implementation for what I see as 2 very different use cases, those being low-cardinality cases representing symbol tables or similar things, and high-cardinality cases representing regular user data.

I feel that a lot of fundamental efficiencies could be gained if these 2 use cases were treated at arms length and be handled fundamentally differently by specialized types that don't try to share an implementation. How concurrency is implemented can also differ between the two.

Maybe the implementation is already separate for the two cases, but some things I read suggested otherwise.

Just a thought.

raiph commented 1 year ago

FTR, two related SO posts with answers by jnthn: