UKGovernmentBEIS / inspect_ai

Inspect: A framework for large language model evaluations
https://UKGovernmentBEIS.github.io/inspect_ai/
MIT License

Better error messages #43

Closed lukasberglund closed 3 days ago

lukasberglund commented 3 weeks ago

Thanks for building this tool! It's been great to use - I just have one small gripe.

When there's an error in Inspect, I wish it were possible to see the full stack trace. In particular, it would be nice to (a) see the line of each file that called a line in another file and (b) be able to see/click on the file name for each file in the stack trace. Right now neither of these is possible, and I don't know how to change this in Inspect. Let me know if I'm missing something.

image

aisi-inspect commented 3 weeks ago

A couple of things here:

Are you primarily concerned with the stack frames of your own code or do you actually want to see the surrounding inspect frames?

Finally, you can usually click on the stack frames within VS Code to navigate to them. I think the problem here is that the names are so long that the path is truncated (and VS Code sadly then can't resolve the full path). Again for user code this typically wouldn't be a problem. I'll see if there is something we can do to change this (I noted there is a word_wrap option, perhaps that would do it).

lukasberglund commented 3 weeks ago

Thanks for responding!

> Those look like Inspect internal stack traces, we are actually attempting to hide those (as they obscure user stack frames). I'm curious how you are getting those in your display, is this a version of Inspect embedded inside another package? Or alternatively perhaps our mechanism isn't working as expected (I'll check into that).

I don't think I have inspect embedded in another package. I do use poetry, I'm not sure if that counts. FWIW, these are the only stack traces I see when inspect runs into an error. I don't see any user errors.

> We currently limit the stack frames shown to 10

Makes sense! Personally, I'd prefer to see all stack frames and the line that caused the error, and be able to click on the file for each stack frame, but of course others may have different preferences.
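
As a point of comparison, here is a minimal sketch of the behaviour being asked for, using only Python's standard `traceback` module rather than anything from Inspect. The function names (`run_task`, `build_model`, `load_config`) are hypothetical. It shows that an uncapped traceback lists every frame as a `File "...", line N, in name` entry, which many editor terminals can turn into clickable links, while a positive `limit` keeps only the outermost frames and can drop the frame that actually raised.

```python
import traceback


def load_config():
    # Innermost frame: the line that actually fails.
    raise KeyError("model")


def build_model():
    load_config()


def run_task():
    build_model()


def render(limit=None) -> str:
    """Return the formatted traceback, optionally capped at `limit` frames."""
    try:
        run_task()
    except KeyError as exc:
        lines = traceback.format_exception(
            type(exc), exc, exc.__traceback__, limit=limit
        )
        return "".join(lines)
    return ""


full = render()           # every frame, outermost to innermost
capped = render(limit=2)  # only the two outermost frames survive

print(full)
print(capped)
```

With a positive `limit`, the standard library counts frames from the caller's side, so the capped output here ends before `load_config`. A negative `limit` keeps the innermost frames instead.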

tekumara commented 2 weeks ago

Thanks for looking into this! I'm also having trouble clicking on the log lines because they are truncated.

I too would prefer to see the standard traceback with all frames. In my case I'm using a custom model and I'd like to see the frames from that, but they are hidden. All I see are third-party frames.

eg:

╭─ beta_launch (36 samples) ─────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ╭─────────────────────────── Traceback (most recent call last) ───────────────────────────╮                                   evals/XX │
│ │ /Users/tekumara/code/my-evals/.venv/lib/python3.11/site-packages/insp… │   dataset: blue │
│ │ in task_run                                                                             │                       scorer: intent_match │
│ │                                                                                         │                                            │
│ │ /Users/tekumara/code/my-evals/.venv/lib/python3.11/site-packages/insp… │                                            │
│ │ in task_run_sample                                                                      │                                            │
│ │                                                                                         │                                            │
│ │ /Users/tekumara/code/my-evals/.venv/lib/python3.11/site-packages/insp… │                                            │
│ │ in solve                                                                                │                                            │
│ │                                                                                         │                                            │
│ │ /Users/tekumara/code/my-evals/.venv/lib/python3.11/site-packages/insp… │                                            │
│ │ in generate                                                                             │                                            │
│ │                                                                                         │                                            │
│ │ /Users/tekumara/code/my-evals/.venv/lib/python3.11/site-packages/insp… │                                            │
│ │ in task_generate                                                                        │                                            │
│ │                                                                                         │                                            │
│ │                                ... 30 frames hidden ...                                 │                                            │
│ │                                                                                         │                                            │
│ │ /Users/tekumara/code/my-evals/.venv/lib/python3.11/site-packages/boto… │                                            │
│ │ in sign                                                                                 │                                            │
│ │                                                                                         │                                            │
│ │   187 │   │   │   │   │   signing_context['cache_key'],                                 │                                            │
│ │   188 │   │   │   │   )                                                                 │                                            │
│ │   189 │   │   │   try:                                                                  │                                            │
│ │ ❱ 190 │   │   │   │   auth = self.get_auth_instance(**kwargs)                           │                                            │
│ │   191 │   │   │   except UnknownSignatureVersionError as e:                             │                                            │
│ │   192 │   │   │   │   if signing_type != 'standard':                                    │                                            │
│ │   193 │   │   │   │   │   raise UnsupportedSignatureVersionError(                       │                                            │
│ │                                                                                         │                                            │
│ │ ╭─────────────────────────────────── locals ────────────────────────────────────╮       │                                            │
│ │ │           expires_in = None                                                   │       │                                            │
│ │ │ explicit_region_name = None                                                   │       │                                            │
│ │ │               kwargs = {                                                      │       │                                            │
│ │ │                        │   'signing_name': 'bedrock',                         │       │                                            │
│ │ │                        │   'region_name': 'us-east-1',                        │       │                                            │
│ │ │                        │   'signature_version': 'v4'                          │       │                                            │
│ │ │                        }                                                      │       │                                            │
│ │ │       operation_name = 'InvokeModel'                                          │       │                                            │
│ │ │          region_name = 'us-east-1'                                            │       │                                            │
│ │ │              request = <botocore.awsrequest.AWSRequest object at 0x31933dcd0> │       │                                            │
│ │ │                 self = <botocore.signers.RequestSigner object at 0x31aa193d0> │       │                                            │
│ │ │    signature_version = 'v4'                                                   │       │                                            │
│ │ │      signing_context = {}                                                     │       │                                            │
│ │ │         signing_name = 'bedrock'                                              │       │                                            │
│ │ │         signing_type = 'standard'                                             │       │                                            │
│ │ ╰───────────────────────────────────────────────────────────────────────────────╯       │                                            │
│ │                                                                                         │                                            │
│ │ /Users/tekumara/code/my-evals/.venv/lib/python3.11/site-packages/boto… │                                            │

tekumara commented 2 weeks ago

The other thing I noticed with the rich traceback is it can reveal secrets.


Or in some other cases the list of locals displayed is so large that the scrollback buffer is completely filled and I can't see any other frame.
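
The leak is easy to reproduce outside Inspect. The sketch below uses the standard library's `traceback.TracebackException` as a stand-in for the Rich traceback in this thread. The `sign_request` function and the fake key are hypothetical. When locals are captured, the value of every local variable in every frame is printed, including credentials. When they are not, only the source lines and the exception message appear.

```python
import traceback


def sign_request(api_key: str) -> None:
    # Hypothetical stand-in for a third-party call that fails mid-request.
    raise RuntimeError("signing failed")


def render(capture_locals: bool) -> str:
    """Format a traceback with or without each frame's local variables."""
    fake_key = "sk-" + "not-a-real-key"  # assembled at runtime, never a literal
    try:
        sign_request(fake_key)
    except RuntimeError as exc:
        summary = traceback.TracebackException.from_exception(
            exc, capture_locals=capture_locals
        )
        return "".join(summary.format())
    return ""


with_locals = render(capture_locals=True)
without_locals = render(capture_locals=False)

print(with_locals)     # includes a line such as: api_key = 'sk-not-a-real-key'
print(without_locals)  # shows only frames, source lines, and the message
```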

jjallaire commented 3 days ago

Two changes that should address this:

1) We no longer show locals in the stack trace

2) We no longer place a limit on the number of stack frames displayed.
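
For readers who render Rich tracebacks in their own tools, the sketch below shows one way to get the same two behaviours. It is an illustration under assumptions, not Inspect's actual implementation: `sign` and `generate` are hypothetical functions, and the console width is arbitrary. It relies on two documented parameters of `rich.traceback.Traceback.from_exception`: `show_locals=False` omits the locals panels, and `max_frames=0` means no maximum.

```python
import io

from rich.console import Console
from rich.traceback import Traceback


def sign() -> None:
    # Hypothetical failing call deep in the stack.
    raise RuntimeError("signing failed")


def generate() -> None:
    sign()


def render() -> str:
    """Render a Rich traceback to a string with no locals and no frame cap."""
    buffer = io.StringIO()
    console = Console(file=buffer, width=120)
    try:
        generate()
    except RuntimeError as exc:
        tb = Traceback.from_exception(
            type(exc),
            exc,
            exc.__traceback__,
            show_locals=False,  # do not print local variables
            max_frames=0,       # 0 disables the frame limit
        )
        console.print(tb)
    return buffer.getvalue()


output = render()
print(output)
```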