MatrixAI / Polykey-CLI

Polykey CLI - Open Source Decentralized Secret Sharing System for Zero Trust Workflows
https://polykey.com
GNU General Public License v3.0
6 stars 3 forks source link

Unwrap `ErrorPolykeyRemote` when reporting errors #107

Open tegefaulkes opened 9 months ago

tegefaulkes commented 9 months ago

Specification

When an RPC call fails it throws ErrorPolykeyRemote with the true error as the cause. This is very noisy when printed by the CLI command and is frankly confusing at times. When reporting these errors we want to unwrap the ErrorPolykeyRemote and just report the actual cause of the error.

Additional context

Tasks

  1. Unwrap the ErrorPolykeyRemote error and just report the cause when reporting the RPC errors.
CMCDragonkai commented 2 months ago

Only the commands that are performing remote calls should unwrap it. This is because they know the reason why, the reason is due to the remote side failing. However there's a caveat.

This is what it looks like when there's a remote side error:

ErrorPolykeyRemote: Remote error from RPC call
  localHost    ::1
  localPort    45402
  remoteHost    ::1
  remotePort    33107
  command    vaultsSecretsGet
  timestamp    Thu Aug 22 2024 20:37:34 GMT+1000 (Australian Eastern Standard Time)
  cause: ErrorSecretsSecretUndefined: Secret does not exist - Secret with name: customer_nubmer does not exist
    cause: {"type":"ErrorEncryptedFSError","data":{"message":"ENOENT: no such file or directory, customer_nubmer","timestamp":"2024-08-22T10:37:34.634Z","data":{},"stack":"ErrorEncryptedFSError: ENOENT: no such file or directory, customer_nubmer\n    at f._open (/home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2158:117625)\n    at async /home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2158:115345\n    at async Object.maybeCallback (/home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2155:21076)\n    at async /home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2158:120774\n    at async Object.maybeCallback (/home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2155:21076)\n    at async /home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2283:135663\n    at async /home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2246:8712\n    at async withF (/home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:7:9819)\n    at async Object.getSecret (/home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2283:135640)\n    at async /home/cmcdragonkai/Projects/Polykey-CLI/dist/polykey.js:2283:142996","_errno":34,"_code":"ENOENT","_description":"no such file or directory","_syscall":"open"}}

You can see here that it's a bit verbose firstly with alot of metadata. This is being discussed in ENG-119 or #17. Additionally the error formatting isn't properly being done recursively. There's another cause after the first cause, and that's just JSON now.

Now the main problem here is how do we properly communicate that the error is coming from the remote side, and it's not an error of the local side. This is where the command that's receiving this error needs to catch and interrogate it. I can imagine that if we can reduce verbosity, it would look like:

ErrorPolykeyRemote: Remote error from RPC call
  cause: ErrorSecretsSecretUndefined: Secret does not exist - Secret with name: customer_nubmer does not exist
    cause: ErrorEncryptedFSError: ENOENT: no such file or directory, customer_nubmer

Yet I still don't like this. There's an old issue I cannot find anymore that talked flipping this chain order. Because the error that I do care about is actually the last error.

If the errors are really long, it obviously favours the last message, but if they are short, it's clearer to write down the actual error, and the wrapper.

ErrorPolykeyRemote: Remote error from RPC call
  cause: ErrorSecretsSecretUndefined: Secret does not exist - Secret with name: customer_nubmer does not exist
    cause: ErrorEncryptedFSError: ENOENT: no such file or directory, customer_nubmer

But then there's also the fact that one has to be discerning about the level of errors that's relevant.

Only the command understands the context. Which is to say, it knows that ErrorSecretsSecretUndefined is what really matters. In which case, it should drop irrelevant information assuming low verbosity error reporting and say:

ErrorEncryptedFSError: ENOENT: no such file or directory, customer_nubmer
  caused: ErrorSecretsSecretUndefined: Secret does not exist - Secret with name: customer_nubmer does not exist
    caused: ErrorPolykeyRemote: Remote error from RPC call 

This would be an interesting way of reporting it because it's not exactly how the structure of the error objects themselves are setup. You have to invert the chain to report it. It could be confusing if later we increase error verbosity and we end up reversing it again back to the normal object chain in order to show all the error object properties properly. It also wouldn't match the JSON version of errors too.

Finally the command also has enough context to to specifically understand that ErrorSecretsSecretUndefined is sufficient to explain the problem. And thus drop the ErrorEncryptedFSError entirely when reporting. However that error object still exists. So this means the command itself when returning the error object, should provide a "index" tag back to the reporter for it to know what error message to focus on. So I would like to see something like:

ErrorPolykeyRemote: Remote error from RPC call
  cause: ErrorSecretsSecretUndefined: Secret does not exist - Secret with name: customer_nubmer does not exist

Additionally we may drop descriptions in favour of messages if the message is sufficient. We may keep the description in case the message doesn't exist. This makes the description a sort of "default message". Thus giving us:

ErrorPolykeyRemote: Remote error from RPC call
  cause: ErrorSecretsSecretUndefined: Secret with name: customer_nubmer does not exist

Now why do we keep the Error… class name? That's because the class name itself is in fact the error code that can be referenced as they are unique global names in the Polykey ecosystem. Kind of like how in Windows they do a thing like error code 14598374, and you can give that to the devs to debug. However since this is progammatic information, it should not necessarily be featured first.

Remote error from RPC call
  name: ErrorPolykeyRemote
  cause: Secret with name: customer_nubmer does not exist
    name:  ErrorSecretsSecretUndefined

I still don't quite like this. I think instead the "code" in this case being the unique error class name, should actually be on the same line. All of our errors should be easily explained in one line (or can they…?)

Remote error from RPC call : ErrorPolykeyRemote
  cause: Secret with name: customer_nubmer does not exist : ErrorSecretsSecretUndefined

The : here is space separated unlike before. And we could also use | to be clearer.

While the inverted chain may be better in just understanding, the fact that it ends up being flipped when it is actually verbose I think is a bad trade off.

CMCDragonkai commented 2 months ago
Remote error from RPC call | ErrorPolykeyRemote
  cause: Secret with name: customer_nubmer does not exist | ErrorSecretsSecretUndefined

Remote error from RPC call - ErrorPolykeyRemote
  cause: Secret with name: customer_nubmer does not exist - ErrorSecretsSecretUndefined

I do prefer the dash.

CMCDragonkai commented 2 months ago

So that would mean we don't technically unwrap it, because we do still need to indicate whether it is a local error or a remote error.

Inversion is too confusing when the high verbosity output will be in the regular format.

So the only thing really here is verbosity itself, and changing it so that the error class name is at the very end of the message. This assumes all error messages can be explained in one short line, and that messages override descriptions when reported, so that descriptions is acting as the default error message.

What if the error message is far longer, and has to take up multiple lines?

Some error and error and error, sdfds.
And this is because of this reason and that reason.

And this paragraph exists too.
- ErrorPolykeySomethingMultiLineErrorMessage
  cause: Some other reason blah blah blah
         indented multiline message of course!
         - ErrorPolykeySomething
aryanjassal commented 2 months ago

I personally really like the python way of handling errors, and I believe it would be easier to read for non-technical users. Something like this might work:

Traceback (most recent call last):
  File "/home/aryanj/Downloads/test.py", line 14, in <module>
    fn4()
  File "/home/aryanj/Downloads/test.py", line 11, in fn4
    fn3()
  File "/home/aryanj/Downloads/test.py", line 8, in fn3
    fn2()
  File "/home/aryanj/Downloads/test.py", line 5, in fn2
    fn1()
  File "/home/aryanj/Downloads/test.py", line 2, in fn1
    raise AttributeError
AttributeError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/aryanj/Downloads/test.py", line 16, in <module>
    raise ValueError
ValueError

I just wrote this test script which raises two errors. Here, not much nesting or indentation is seen, but all the information is provided to the reader. Of course, we might not have the details like the file name or line number, but that can be replaced with the error that was called, etc.

For example, this is how I currently do error displaying in https://github.com/MatrixAI/Polykey-CLI/pull/255

[aryanj@matrix-34xx]$ pk secrets ls newvault:/doesntexist

ErrorPolykeyCLIFileRead: Failed to read from filesystem: Failed to read directory: /doesntexist

I did this by catching the thrown remote error, and throwing the unwrapped error with the same details without any additional metadata. I will remove this before merging the PR, but this made it much easier to read and understand what is going on and what went wrong during development.

CMCDragonkai commented 2 months ago

I'm very aware of how python does their errors. Make sure you read my post to the end! Our cause chain is a more streamlined version of python traces.

CMCDragonkai commented 2 months ago

My favourite is this atm.

Remote error from RPC call - ErrorPolykeyRemote
  cause: Secret with name: customer_nubmer does not exist - ErrorSecretsSecretUndefined

But read what I wrote earlier as it explains that iteration.

CMCDragonkai commented 2 months ago

Technically pythons report is actually the inverted version of ours.

In a way it's doing a stack trace which is inverted.

The object error chain goes outside in, while the stack trace goes inside out.