enso-org / enso

Enso Analytics is a self-service data prep and analysis platform designed for data teams.
https://ensoanalytics.com
Apache License 2.0
7.38k stars 323 forks source link

Disappearing stack trace frames #9800

Open radeusgd opened 6 months ago

radeusgd commented 6 months ago

Under not-very-clear circumstances (but I've found a reliable repro, I just don't know WHY it's happening), when we get a panic, it is lacking stack frames that should be there. In at least one case this led to a bug that was hard to find because the place where it was being triggered was buried in the missing stack frames.

Repro

This repro is a bit invovled but as for now I could not find a simpler one. Most likely it will only work on Windows, because it relies on the error deleting a file open in another program.

  1. First apply the following patch:

    Index: test/Table_Tests/src/IO/Excel_Spec.enso
    ===================================================================
    diff --git a/test/Table_Tests/src/IO/Excel_Spec.enso b/test/Table_Tests/src/IO/Excel_Spec.enso
    --- a/test/Table_Tests/src/IO/Excel_Spec.enso   (revision c5bf2384e42058e7a60ffaef77f9284487030ddb)
    +++ b/test/Table_Tests/src/IO/Excel_Spec.enso   (date 1714145234362)
    @@ -120,12 +120,11 @@
         clothes = enso_project.data/'clothes.csv' . read
         sub_clothes = clothes.select_columns [0, 1]
         counter = Ref.new 0
    -        prefix = "out_" + (Random.integer 1 10000).to_text + "_"
    +        prefix = "out_"
         Spec_Write_Data.Value [table, clothes, sub_clothes] counter suffix prefix
    
     teardown self =
    -        enso_project.data/"transient" . list (self.prefix+"*") . each f->
    -            f.delete . catch Any error-> IO.println "[CLEANUP] Failed to delete "+f.to_text+": "+error.to_display_text
    +        enso_project.data/"transient" . list (self.prefix+"*") . each f-> f.delete
    
     create_out self suffix=self.suffix =
         i = self.counter.get + 1
  2. Now create and run the following Enso script:

    
    from Standard.Base import all
    from Standard.Table import all

main = f = File.new "test/Table_Tests/data/transient/out_123.xls" IO.println <| (Table.new [["X", [1,2,3]]]).write f

opened_workbook = f.read
IO.println opened_workbook.sheet_names

# Now pause execution by opening the REPL to hold the file locked:
Standard.Base.Runtime.Debug.breakpoint

3. The script should print a path and `[EnsoSheet]` and open a REPL. Keep that REPL open - we need to keep a file locked.
4. Now, run `.\test\Table_Tests\src\IO\Excel_Spec.enso`

The last command should fail with the following stack trace:
```diff
Execution finished with an error: An IO error has occurred: java.nio.file.FileSystemException: X:\NBO\enso\test\Table_Tests\data\transient\out_123.xls: Proces nie może uzyskać dostępu do pliku, ponieważ jest on używany przez inny proces (X:\NBO\enso\test\Table_Tests\data\transient\out_123.xls).
        at <enso> Panic.throw(Internal)
        at <enso> case_branch(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Error.enso:138:25-43)
        at <enso> <anonymous>(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Error.enso:137-142)
        at <enso> Error.catch_primitive(Internal)
        at <enso> Error.catch(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Error.enso:136-142)
+       at <enso> Panic.type.rethrow(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Panic.enso:88:21-47)
        at <enso> go<arg-2>(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Data\Range.enso:259:21-52)
-       at <enso> go(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Data\Range.enso:258-260)
-       at <enso> null(Internal)
        at <enso> <anonymous>(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Test\0.0.0-dev\src\Group.enso:39:37-40)
        at <enso> case_branch(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Test\0.0.0-dev\src\Helpers.enso:36:13-34)
        at <enso> Helpers.run_specs_from_group(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Test\0.0.0-dev\src\Helpers.enso:26-37)
(...) truncated

I have highlighted red two lines between which we are missing some stack frames. We know that the Panic is encountered in our Spec_Write_Data.teardown method, and the panic is caused by rethrowing an error returned by a callback within Vector.each. We can see the Panic.rethrow (highlighted green), but we don't see Spec_Write_Data.teardown nor Vector.each frames.

Now, to prove that these frames should be there, I will modify the code again. Apply the following patch now:

Index: test/Table_Tests/src/IO/Excel_Spec.enso
===================================================================
diff --git a/test/Table_Tests/src/IO/Excel_Spec.enso b/test/Table_Tests/src/IO/Excel_Spec.enso
--- a/test/Table_Tests/src/IO/Excel_Spec.enso   (revision 36be3ec7b0f4bd45b4f20e6584a9d6bf46636351)
+++ b/test/Table_Tests/src/IO/Excel_Spec.enso   (date 1714146286800)
@@ -124,7 +124,8 @@
         Spec_Write_Data.Value [table, clothes, sub_clothes] counter suffix prefix

     teardown self =
-        enso_project.data/"transient" . list (self.prefix+"*") . each f-> f.delete
+        enso_project.data/"transient" . list (self.prefix+"*") . each f->
+            Error.throw (Illegal_Argument.Error "hi there!")

     create_out self suffix=self.suffix =
         i = self.counter.get + 1

We essentially replace the File error with our own thrown dataflow error. Now run the test/Table_Tests/src/IO/Excel_Spec.enso again. This time the stack trace is following:

Execution finished with an error: Illegal Argument: hi there!
        at <enso> Panic.throw(Internal)
        at <enso> case_branch(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Error.enso:138:25-43)
        at <enso> <anonymous>(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Error.enso:137-142)
        at <enso> Error.catch_primitive(Internal)
        at <enso> Error.catch(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Error.enso:136-142)
        at <enso> Panic.type.rethrow(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Panic.enso:88:21-47)
        at <enso> go<arg-2>(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Data\Range.enso:259:21-52)
-       at <enso> go(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Data\Range.enso:258-260)
+       at <enso> Range.each<arg-2>(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Data\Range.enso:261:13-25)
+       at <enso> Range.each(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Data\Range.enso:255-261)
+       at <enso> Array_Like_Helpers.each(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Internal\Array_Like_Helpers.enso:282-283)
+       at <enso> Vector.each(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Base\0.0.0-dev\src\Data\Vector.enso:785:9-38)
+       at <enso> Spec_Write_Data.teardown(Excel_Spec.enso:127-128)
+       at <enso> <anonymous><arg-1>(Excel_Spec.enso:149:13-25)
-       at <enso> null(Internal)
        at <enso> <anonymous>(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Test\0.0.0-dev\src\Group.enso:39:37-40)
        at <enso> case_branch(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Test\0.0.0-dev\src\Helpers.enso:36:13-34)
        at <enso> Helpers.run_specs_from_group(X:\NBO\enso\built-distribution\enso-engine-0.0.0-dev-windows-amd64\enso-0.0.0-dev\lib\Standard\Test\0.0.0-dev\src\Helpers.enso:26-37)
(...) truncated

I have highlighted green frames that are now appearing in the stack trace (and red the same frames as in the previous trace). In both scenarios the panic is caused by Panic.rethrow rethrowing a dataflow error returned from a callback in Vector.each. So knowing how that works, we know that this part of the stack trace should be the same in both scenarios. Yet, in this scenario we see these frames, but in the one above - we do not.

What makes this worst is that the <enso> Spec_Write_Data.teardown(Excel_Spec.enso:127-128) frame got dropped making the bug above pretty hard to debug.

radeusgd commented 6 months ago

The null(Internal) frame also looks a little bit suspicious.

Akirathan commented 6 months ago

Seems like this may be related to tail call optimizations throwing away some stack frames, tracked in https://github.com/enso-org/enso/issues/9251.

JaroslavTulach commented 6 months ago

Another related issue might be #8153