SpectralSequences / sseq

The root repository for the SpectralSequences project.
Apache License 2.0

CI sometimes hangs #142

Closed by JoeyBF 5 months ago

JoeyBF commented 8 months ago

See, for example, the CI run for 3299940, which timed out. It looks like a deadlock related to iter_s_t, but we haven't changed that code in a while. Maybe a dependency introduced it recently.

Edit: It looks like that one was on my fork, but the CI for #141 is currently hanging too.

JoeyBF commented 8 months ago

I found the bug. I'm not sure why the issue never popped up before, but I guess it might be a recent update to rayon. Here's how the deadlock happens.

I think the way out is to use reentrant mutexes for OnceVec. I'll experiment with that.

JoeyBF commented 8 months ago

Turns out that reentrant mutexes won't work. By design they can't give us a mutable reference, because then a thread locking the same lock twice would have two mutable references.
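To make the aliasing problem concrete, here is a minimal sketch using only the standard library. It assumes RefCell as a stand-in for data behind a reentrant lock: such a lock can only hand out a shared reference, so mutation has to go through runtime borrow checking, and the second acquisition on the same thread is refused. The function name is hypothetical and nothing here is taken from the sseq code.

```rust
use std::cell::RefCell;

/// Hypothetical illustration: `RefCell` stands in for data guarded by a
/// reentrant lock, which can only ever hand out `&T`.
pub fn nested_mutable_borrow_fails() -> bool {
    let data = RefCell::new(vec![1, 2, 3]);

    // First "acquisition": take a mutable borrow and use it.
    let mut outer = data.borrow_mut();
    outer.push(4);

    // Re-entrant "acquisition" on the same thread while `outer` is alive.
    // Granting it would create two live mutable references to the same
    // vector, so the runtime borrow check refuses.
    let inner_failed = data.try_borrow_mut().is_err();

    drop(outer);
    inner_failed
}
```

Calling `borrow_mut` instead of `try_borrow_mut` at the inner step would panic, so a reentrant lock wrapped around a RefCell would replace the deadlock with a panic rather than solve the underlying problem.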

The only option that I see, short of implementing some sort of prioritization of tasks that would be internal to rayon (which would also take care of #105), would be revising the implementation of OnceVec so that it becomes lock-free.
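As a rough sketch of what "lock-free" could mean here, the following shows a fixed-capacity, write-once container built on atomic pointers. It is only an illustration under strong assumptions: FixedOnceVec, with_capacity, try_set and get are hypothetical names, the capacity cannot grow, and none of this reflects the actual OnceVec API or any design adopted by the project.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical fixed-capacity, write-once vector. Not the sseq `OnceVec`.
/// The `Send + Sync` bound keeps cross-thread sharing sound.
pub struct FixedOnceVec<T: Send + Sync> {
    slots: Vec<AtomicPtr<T>>,
}

impl<T: Send + Sync> FixedOnceVec<T> {
    /// Create `cap` empty slots, each holding a null pointer.
    pub fn with_capacity(cap: usize) -> Self {
        Self {
            slots: (0..cap).map(|_| AtomicPtr::new(ptr::null_mut())).collect(),
        }
    }

    /// Publish `value` at `index` without taking a lock.
    /// Returns the value back if the slot was already filled.
    pub fn try_set(&self, index: usize, value: T) -> Result<(), T> {
        let new = Box::into_raw(Box::new(value));
        match self.slots[index].compare_exchange(
            ptr::null_mut(),
            new,
            Ordering::AcqRel,
            Ordering::Acquire,
        ) {
            Ok(_) => Ok(()),
            Err(_) => {
                // SAFETY: `new` was never published, so we still own it.
                let boxed = unsafe { Box::from_raw(new) };
                Err(*boxed)
            }
        }
    }

    /// Read a slot; `None` if nothing has been published there yet.
    pub fn get(&self, index: usize) -> Option<&T> {
        let p = self.slots[index].load(Ordering::Acquire);
        // SAFETY: non-null pointers come from `Box::into_raw` and are freed
        // only in `Drop`, which requires exclusive access to `self`.
        unsafe { p.as_ref() }
    }
}

impl<T: Send + Sync> Drop for FixedOnceVec<T> {
    fn drop(&mut self) {
        for slot in &mut self.slots {
            let p = *slot.get_mut();
            if !p.is_null() {
                // SAFETY: each published pointer is owned by exactly one slot.
                unsafe { drop(Box::from_raw(p)) };
            }
        }
    }
}
```

Because no thread ever waits on another, a task stolen onto a thread that is in the middle of a write cannot block on that write. The hard part this sketch leaves out is growth: a real replacement would need some way to extend the storage without invalidating references that have already been handed out.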