fsprojects / Paket

A dependency manager for .NET with support for NuGet packages and Git repositories.
https://fsprojects.github.io/Paket/
MIT License
2.02k stars 520 forks source link

paket.processlock-ed paket sometimes crashes with StackOverflowException #3760

Open vivainio opened 4 years ago

vivainio commented 4 years ago

Description

I am using Paket 5.241.2

Our concurrent build sometimes crashes with StackOverflowException.

From logs:

 Could not acquire lock to C:\p2p\paket-files\paket.processlock.
  Access to the path 'C:\p2p\paket-files\paket.processlock' is denied.
  Trials left: 13.

  Process is terminated due to StackOverflowException.
  Performance:
   - Disk IO: 5 milliseconds
   - Runtime: 2 seconds

Also from logs:

C:\p2p.paket\paket.targets(65,5): error MSB3073: The command ""C:\p2p.paket\paket.exe" restore --references-file "C:\p2p\Src\XXXXXXXXXX\paket.references"" exited with code -1073741571. [C:\p2p\Src\XXXXXX.csproj]

(this is an old non-sdx-style csproj file)

This started happening when I upgraded from 5.176.6 to 5.241.2

forki commented 4 years ago

do you have a repro for me?

vivainio commented 4 years ago

No I don't. It's a complicated concurrent build of our proprietary application, and it only happens sometimes.

jbaehr commented 4 years ago

We're having this issue, too. It happens sporadically, especially when building large solutions in parallel. I observed it with 5.181.1, 5.219.0 and the most recent 5.245.1.

When I attach a debugger when the exception gets thrown, I'm pointed to paket.exe!Paket.Utils.waitForUnlocked@465, e.g. here https://github.com/fsprojects/Paket/blob/master/src/Paket.Core/Common/Utils.fs#L465

I'm no F#-expert, but from what I understand, the recursion blows the stack before the timeout kicks in.

Replacing this with a loop is maybe not that elegant, but more robust. Apparently there is no tail-call optimization.

vivainio commented 4 years ago

@jbaehr interestingly, that function looks perfectly tail recursive.

The real problem is the fact that it gets locked here for extended periods of time. The old Paket (5.176.6) we are still using because of this bug doesn't get locked at all.

forki commented 4 years ago

Please do send us a PR with a loop. Thanks

Ville Vainio notifications@github.com schrieb am Mi., 20. Mai 2020, 20:35:

@jbaehr https://github.com/jbaehr interestingly, that function looks perfectly tail recursive.

The real problem is the fact that it gets locked here for extended periods of time. The old Paket (5.176.6) we are still using because of this bug doesn't get locked at all.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/fsprojects/Paket/issues/3760#issuecomment-631650609, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAOANELHM4D7BXJ6HDA4N3RSQPHPANCNFSM4J5CYH3A .

jbaehr commented 4 years ago

@forki thanks for the loop-unroll-experiment (cf. version 5.247.3)! Indeed, I don't see a crash any more, now it fails with the expected error message:

... 82> Paket version 5.247.4 82> Performance: 82> - Runtime: 1 second 82> Paket failed with 82> -> Could not acquire lock to '.....\paket-files\paket.processlock'. No more trials left. Message.: The process cannot access the file '.....\paket-files\paket.processlock' because it is being used by another process. 82>......paket\paket.targets(65,5): error MSB3073: The command ""......paket\paket.exe" restore --references-file ".....\SomeProject\paket.references"" exited with code 1. ...

What seems strange though, is the runtime of one second. As I understand the code, it retry 100 times with a sleep of 100 ms in between. That should result in at least 10 seconds before there are no more trails left, right?