dhall-lang / dhall-haskell

Maintainable configuration files
https://dhall-lang.org/
BSD 3-Clause "New" or "Revised" License
912 stars 213 forks source link

Some dhall-lsp-server tests are timing out #2454

Closed sjakobi closed 2 years ago

sjakobi commented 2 years ago

This has recently started happening in CI, starting with https://github.com/dhall-lang/dhall-haskell/actions/runs/3093604714.

I have also reproduced it locally with cabal test dhall-lsp-server:test:tests.

Luckily, on Windows, we get some output on the failures:

``` Completion Dhall.Completion suggests user defined types: FAIL (120.07s) uncaught exception: SessionException Timed out waiting to receive a message from the server. Last message received: { "jsonrpc": "2.0", "params": { "uri": "file:///D:/a/dhall-haskell/dhall-haskell/dhall-lsp-server/tests/fixtures/completion/CustomTypes.dhall", "diagnostics": [ { "severity": 1, "range": { "start": { "line": 2, "character": 34 }, "end": { "line": 2, "character": 35 } }, "source": "Dhall.TypeCheck", "message": "Error: Unbound variable: C" }, { "severity": 4, "range": { "start": { "line": 0, "character": 13 }, "end": { "line": 0, "character": 43 } }, "source": "Dhall.Lint", "message": "Unused let binding 'Config'" } ] }, "method": "textDocument/publishDiagnostics" } Use -p '/suggests user defined types/' to rerun this test only. suggests user defined functions: FAIL (120.06s) uncaught exception: SessionException Timed out waiting to receive a message from the server. Last message received: { "jsonrpc": "2.0", "params": { "uri": "file:///D:/a/dhall-haskell/dhall-haskell/dhall-lsp-server/tests/fixtures/completion/CustomFunctions.dhall", "diagnostics": [ { "severity": 1, "range": { "start": { "line": 7, "character": 0 }, "end": { "line": 7, "character": 0 } }, "source": "Dhall.Parser", "message": "unexpected end of input\nexpecting \"→\", ',', ->, :, ], or whitespace\n" } ] }, "method": "textDocument/publishDiagnostics" } Use -p '/suggests user defined functions/' to rerun this test only. suggests user defined bindings: FAIL (120.06s) uncaught exception: SessionException Timed out waiting to receive a message from the server. Last message received: { "jsonrpc": "2.0", "params": { "uri": "file:///D:/a/dhall-haskell/dhall-haskell/dhall-lsp-server/tests/fixtures/completion/Bindings.dhall", "diagnostics": [ { "severity": 1, "range": { "start": { "line": 0, "character": 57 }, "end": { "line": 0, "character": 59 } }, "source": "Dhall.TypeCheck", "message": "Error: Unbound variable: al" }, { "severity": 4, "range": { "start": { "line": 0, "character": 12 }, "end": { "line": 0, "character": 19 } }, "source": "Dhall.Lint", "message": "Unused let binding 'alice'" } ] }, "method": "textDocument/publishDiagnostics" } Use -p '/suggests user defined bindings/' to rerun this test only. ```

Here's the source for the failing tests:

https://github.com/dhall-lang/dhall-haskell/blob/48ada4ccf445bacde7ee9e5793261c09388f46e4/dhall-lsp-server/tests/Main.hs#L107-L133

sjakobi commented 2 years ago

It seems that this issue is somehow caused by https://github.com/dhall-lang/dhall-haskell/pull/2449 – at least, when I revert the commit, the tests start passing again.

So far, I don't see anything in that refactoring, that could have caused this issue! :thinking:

Any ideas, @mmhat?

sjakobi commented 2 years ago

Judging by the list of hs-boot-related GHC bugs, I wouldn't be too surprised if this issue is a manifestation of one of them.

sjakobi commented 2 years ago

So I've tried to used perf (with sudo perf record -p <pid> -g) to figure out what the hanging dhall-lsp-server process is doing. The top of the perf report looks like this:

Samples: 23K of event 'cycles', Event count (approx.): 25911866530
  Children      Self  Command          Shared Object      Symbol
-  100,00%     0,00%  dhall-lsp-serve  [unknown]          [k] 0000000000000000                                      ◆
     0                                                                                                              ▒
     0x42000fa940                                                                                                   ▒
     dhallzm1zi41zi2zminplace_DhallziSyntaxziInstancesziMonad_zdwzdczgzgze_info                                     ▒
-  100,00%     0,00%  dhall-lsp-serve  [unknown]          [k] 0x00000042000fa940                                    ▒
     0x42000fa940                                                                                                   ▒
     dhallzm1zi41zi2zminplace_DhallziSyntaxziInstancesziMonad_zdwzdczgzgze_info                                     ▒
-  100,00%    99,93%  dhall-lsp-serve  dhall-lsp-server   [.] dhallzm1zi41zi2zminplace_DhallziSyntaxziInstancesziMon▒
     99,93% 0                                                                                                       ▒
        0x42000fa940                                                                                                ▒
        dhallzm1zi41zi2zminplace_DhallziSyntaxziInstancesziMonad_zdwzdczgzgze_info                                  ▒
     0,05%     0,00%  dhall-lsp-serve  [kernel.kallsyms]  [k] asm_sysvec_apic_timer_interrupt

I'm not quite sure how to interpret this, but maybe it's some kind of loop within Dhall.Syntax.Instances.Monad?! I see that the instance was changed in #2449 – is it possible that it introduced a bug?!

sjakobi commented 2 years ago

Yes, unless I'm missing something the problem is with the use of join here:

https://github.com/dhall-lang/dhall-haskell/blob/e9efecbc1a94ccba56576dbcdd71de1a17caf55d/dhall/src/Dhall/Syntax/Instances/Monad.hs#L10-L13

join itself is defined via >>=, so it results in an infinite loop.

sjakobi commented 2 years ago

Fix is up in #2457.

I'm relieved that it's not a bug in GHC's hs-boot handling and we don't have to revert #2449! :sweat_smile: