GDB output unescaping - GitHub issues

wkordalski commented 3 months ago

Hi!

Could we do a little more correct unescaping of GDB output?

The current implementation is probably about as simple as it can be and does not handle some common cases such as escaped quotation marks and backslashes (see: For help, type \"help\".). Newlines ('\n') are ignored, so multi-line text ends up on a single line (see: License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>This is free software: you are free to change and redistribute it.There is NO WARRANTY, to the extent permitted by law.).

Of course, these are only messages from GDB that hardly anybody reads. But when you p some_string, the answer can be \"\" (empty string literal, but escaped), \"\\\"string literal\\ncontaining literal\\\"\", or even ^[a-z]+\\\\\\\\[a-z]+\\\\s*=\\\\s*\\\"[^\\\"]*\\\"$, depending on the code you debug :)

I haven't analyzed the escaping mechanism in depth, but it seems that the messages are escaped using the rules known from the C language. I think we can greatly improve the unescaping of GDB output with very little effort.

What do you think? I can write a PR. Or I can leave it to you.

epasveer commented 3 months ago

You're right, it's one of the simplest log widgets. If you're up to it, go ahead and create a Pull Request.

epasveer commented 2 months ago

I created a branch for this task and fixed some things. Are you able to sync this branch and test it?

https://github.com/epasveer/seer/tree/238-gdb-output-unescaping

If not, I can merge it into "main".

wkordalski commented 2 months ago

The branch does not unescape at least the \\ sequence.

I would rather go for a solution like the one below, which replaces \n with a newline, and so on:

#include <QString>

QString unescape(const QString& input) {
  QString result;
  result.reserve(input.length());  // avoid reallocations during push_back()

  for (auto it = input.cbegin(); it != input.cend(); ++it) {
    if (*it == QLatin1Char('\\')) {
      ++it;
      if (it == input.cend()) {
        // lone trailing backslash: keep it as-is
        result.push_back(QLatin1Char('\\'));
        return result;
      }

      // QChar is not an integral type, so switch on its UTF-16 code unit
      switch (it->unicode()) {
        case u'n': {
            result.push_back(QLatin1Char('\n'));
            break;
        }
        case u'\'': {
            result.push_back(QLatin1Char('\''));
            break;
        }
        case u'"': {
            result.push_back(QLatin1Char('"'));
            break;
        }
        case u'\\': {
            result.push_back(QLatin1Char('\\'));
            break;
        }
        // and so on for \a, \b, \e, \f, \r, \t, \v
        // maybe handle octal, hexadecimal ASCII and Unicode forms, but probably in the more distant future
        // ...

        default: {
            // unknown escape sequence: leave it untouched, which I think is a reasonable default
            result.push_back(QLatin1Char('\\'));
            result.push_back(*it);
            break;
        }
      }
    } else {
      result.push_back(*it);
    }
  }

  return result;
}

I haven't written C++ for a few years, have never used Qt, and haven't tested the code above. So the code might not compile, and I might not have taken some magic features of C++ (implicit conversions?) into account. Nevertheless, I hope it expresses the general idea of the solution.

Sorry, I haven't created a fix for this myself yet. Higher-priority tasks came up. I will fix this in the future if you don't, but I cannot promise when.
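
For anyone who wants to try the idea without a Qt build, here is a minimal sketch of the same loop using only std::string. The name unescape_c is hypothetical and not part of Seer, and the set of handled escapes is an assumption based on the C rules mentioned earlier in the thread. Octal, hexadecimal and Unicode forms are deliberately left out, and \e is omitted because it is a compiler extension rather than standard C++.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not Seer code): replaces simple C-style escape
// sequences in `input` with the characters they stand for.
std::string unescape_c(const std::string& input) {
    std::string result;
    result.reserve(input.size());  // output is never longer than input

    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] != '\\') {
            result.push_back(input[i]);
            continue;
        }
        if (++i == input.size()) {
            // Lone trailing backslash: keep it unchanged.
            result.push_back('\\');
            break;
        }
        switch (input[i]) {
            case 'n':  result.push_back('\n'); break;
            case 't':  result.push_back('\t'); break;
            case 'r':  result.push_back('\r'); break;
            case 'a':  result.push_back('\a'); break;
            case 'b':  result.push_back('\b'); break;
            case 'f':  result.push_back('\f'); break;
            case 'v':  result.push_back('\v'); break;
            case '"':  result.push_back('"');  break;
            case '\'': result.push_back('\''); break;
            case '\\': result.push_back('\\'); break;
            default:
                // Unknown escape sequence: pass it through untouched.
                result.push_back('\\');
                result.push_back(input[i]);
                break;
        }
    }
    return result;
}
```

With this sketch, the input For help, type \"help\". comes back as For help, type "help". and a \\ pair collapses to a single backslash. Processing the string in one left-to-right pass matters here: a chain of independent search-and-replace calls could turn the tail of \\n into a newline, whereas the single pass consumes \\ first and leaves the n alone.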

epasveer / seer

GDB output unescaping #238