eclipse-cdt-cloud / cdt-gdb-adapter

CDT GDB Debug Adapter
Eclipse Public License 2.0

Add support for escaped unicode characters #276

Closed. jonahgraham closed this 1 year ago

jonahgraham commented 1 year ago

GDB escapes non-ASCII (Unicode) characters in its output, so the messages coming back from GDB are not interpreted correctly.

For example, hitting a breakpoint in issue-275-测试.c comes back from GDB as file="issue-275-\346\265\213\350\257\225.c", but by the time we send it to the DAP client we are sending "name":"issue-275-346265213350257225.c".
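To make the mismatch concrete, here is a minimal Python sketch (not the adapter's code) showing that the octal escapes above are the UTF-8 bytes of 测试. The helper name `to_mi_style` is hypothetical, and the last line assumes the broken name is what you get when the backslashes are dropped, which is consistent with the output quoted above.

```python
def to_mi_style(text: str, encoding: str = "utf-8") -> str:
    """Render non-ASCII bytes as three-digit octal escapes, roughly as GDB/MI does."""
    out = []
    for byte in text.encode(encoding):
        if byte < 0x80:
            out.append(chr(byte))          # plain ASCII passes through unchanged
        else:
            out.append("\\%03o" % byte)    # e.g. 0xE6 -> \346
    return "".join(out)


escaped = to_mi_style("issue-275-测试.c")
print(escaped)                    # issue-275-\346\265\213\350\257\225.c
print(escaped.replace("\\", ""))  # issue-275-346265213350257225.c
```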

This patchset adds support for properly unescaping these strings.

Fixes #275

TODO:

jonahgraham commented 1 year ago

I have added a fix that decodes all cstrings as UTF-8 and handles the escaped octal numbers. It doesn't work on Windows yet.

I am not sure what else may be needed here, or whether this change has any negative impacts. Interestingly, I see other encoding errors reported for vscode-cpptools and cortex-debug as unfixable in the adapters. See https://github.com/microsoft/vscode-cpptools/issues/1998, https://github.com/Marus/cortex-debug/issues/207 and https://github.com/Marus/cortex-debug/issues/297. Therefore, while I think we fix the specific issue reported in #275, I am convinced that other issues may still prevent proper support.

tromey commented 1 year ago

MI does a weird thing where all strings are just ASCII, so any non-ASCII characters are emitted byte-wise using C-like escapes. The MI client has to know which encoding to use. This is awful of course, but fixing it would require bumping the MI version and there hasn't been any real pressure to deal with it. Some other escape sequences are used here (e.g., "\n"). I am not sure if this is documented.
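As a rough illustration of what the client side has to do, the Python sketch below collects octal escapes into raw bytes and only then decodes them with an encoding chosen by the client. The function name `decode_mi_octal` and the UTF-8 default are assumptions for illustration; this is not the adapter's actual implementation, and escapes other than octal and `\\` are deliberately left untouched.

```python
OCTAL_DIGITS = "01234567"


def decode_mi_octal(value: str, encoding: str = "utf-8") -> str:
    """Turn \\NNN octal escapes into bytes, then decode them with `encoding`."""
    out = bytearray()
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            if nxt in OCTAL_DIGITS:
                # Consume up to three octal digits after the backslash.
                j = i + 1
                while j < len(value) and j < i + 4 and value[j] in OCTAL_DIGITS:
                    j += 1
                out.append(int(value[i + 1:j], 8) & 0xFF)
                i = j
                continue
            if nxt == "\\":
                # An escaped backslash must not start an octal sequence.
                out.append(0x5C)
                i += 2
                continue
        # MI payloads are assumed to be ASCII outside of escapes.
        out.extend(ch.encode("ascii", errors="replace"))
        i += 1
    # Decoding happens once, after all bytes of a multi-byte character are present.
    return out.decode(encoding, errors="replace")


print(decode_mi_octal("issue-275-\\346\\265\\213\\350\\257\\225.c"))  # issue-275-测试.c
```

Decoding byte by byte would break multi-byte characters, which is why the sketch accumulates a `bytearray` first. Passing a different `encoding` (for example `"latin-1"`) mirrors the point that the client has to know which encoding the target used.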

jonahgraham commented 1 year ago

Thanks for having a look. We're handling n, t, r and now octal 0-7. I'll add more as I have specific cases.

tromey commented 1 year ago

> Thanks for having a look. We're handling n, t, r and now octal 0-7. I'll add more as I have specific cases.

According to the source I think \b, \f, \e, and \a can be emitted as well.
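A table-driven unescaper makes it cheap to cover these additional sequences. The following Python sketch is only an illustration: the names `SIMPLE_ESCAPES` and `unescape_simple` are hypothetical, the table holds the escapes mentioned in this thread plus `\\` and `\"`, and unknown escapes are passed through unchanged rather than guessed at.

```python
# Single-character escapes mentioned in this thread, plus backslash and quote.
SIMPLE_ESCAPES = {
    "n": "\n",
    "t": "\t",
    "r": "\r",
    "b": "\b",
    "f": "\f",
    "e": "\x1b",  # ESC; Python has no "\e" literal
    "a": "\a",
    "\\": "\\",
    '"': '"',
}


def unescape_simple(value: str) -> str:
    """Replace known single-character escapes; leave anything else as-is."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value) and value[i + 1] in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[value[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(repr(unescape_simple("col1\\tcol2\\n")))  # 'col1\tcol2\n'
```

Octal escapes are intentionally out of scope here, since they have to be accumulated as bytes before decoding rather than mapped one character at a time.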