Open Llewyllen opened 3 months ago
I don't understand the issue. According to the standard, the following C program prints abcxyz:
#include <stdio.h>
int main() {
    char* str = "abc""xyz";
    printf("%s\n", str);
    return 0;
}
Can you clarify what pycparser is doing wrong, in your opinion?
For octal, "\07""7" is a 3-byte string composed of 0x07 (octal value 7), 0x37 (character '7') and 0x00 (string terminator), whereas "\077" is a 2-byte string composed of 0x3F (octal value 77) and 0x00.
For hexadecimal, "\x7""7" is a 3-byte string composed of 0x07, 0x37 and 0x00, whereas "\x77" is a 2-byte string composed of 0x77 and 0x00.
So if you simply remove the consecutive double quotes (which is what pycparser does), the escape sequence absorbs the following digit and you get the wrong value:
char test1()
{
    char* tmp = "\07""7";
    return tmp[0];
}
char test2()
{
    char* tmp = "\077";
    return tmp[0];
}
These two functions do not return the same value: the first returns 0x07, the second returns 0x3F.
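To make the difference concrete, here is a minimal Python sketch of the two behaviours. It is not pycparser code: `decode_literal`, `decode_adjacent` and `naive_merge` are hypothetical helper names, and the decoder only covers narrow string literals with simple, octal and hexadecimal escapes. It assumes, as the C standard's translation phases do, that escapes are resolved in each literal before adjacent literals are joined.

```python
import re

# Simple C escapes mapped to their byte values.
SIMPLE_ESCAPES = {
    "a": 7, "b": 8, "f": 12, "n": 10, "r": 13, "t": 9, "v": 11,
    "\\": 92, "'": 39, '"': 34, "?": 63,
}

# One token is an octal escape (1-3 digits), a hex escape (1+ digits),
# a simple escape, or a single ordinary character.
TOKEN = re.compile(r"\\([0-7]{1,3})|\\x([0-9A-Fa-f]+)|\\(.)|(.)", re.DOTALL)


def decode_literal(literal):
    """Decode one double-quoted C string literal to bytes (no trailing NUL)."""
    body = literal[1:-1]  # strip the surrounding double quotes
    out = bytearray()
    for octal, hexa, simple, plain in TOKEN.findall(body):
        if octal:
            out.append(int(octal, 8))   # raises ValueError above 0xFF
        elif hexa:
            out.append(int(hexa, 16))   # raises ValueError above 0xFF
        elif simple:
            out.append(SIMPLE_ESCAPES[simple])
        else:
            out.extend(plain.encode("utf-8"))
    return bytes(out)


def decode_adjacent(literals):
    """Decode each literal on its own, then concatenate the results."""
    return b"".join(decode_literal(lit) for lit in literals)


def naive_merge(literals):
    """Join literals by dropping the inner quotes (the behaviour criticised here)."""
    merged = literals[0]
    for nxt in literals[1:]:
        merged = merged[:-1] + nxt[1:]
    return merged


parts = [r'"\07"', r'"7"']
print(decode_adjacent(parts))              # b'\x077' (0x07 followed by '7')
print(decode_literal(naive_merge(parts)))  # b'?' (0x3F)
```

Decoding first and joining afterwards keeps the escape boundary intact, which is why the two results differ.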
Ah, so it's specific to octal and hex, then... A PR to fix this is welcome, though it has to handle all cases of string literal concatenation properly.
As I said, there are not that many solutions, and none of them is ideal, so I won't do a PR.
Well, I did create a PR. I'm not sure it will pass the tests, but it works for my needs.
From what I saw, it will not pass the test_unified_string_literals test, but then, this test is rather wrong because string concatenation is not as simple as removing consecutive double quotes.
I could add the test
d7 = self.get_decl_init(r'char* s = "\07" "7";')
self.assertNotEqual(d7, ['Constant', 'string', r'"\077"'])
and the current version would fail
I just saw that p_unified_wstring_literal has the same problem, but I won't put my hand in the widechar trap.
The following valid C99 code
is wrongly parsed and returns a c_ast.Constant object with value '\077', which is incorrect. The same goes for hexadecimal.
The easy solution is to modify CParser.p_unified_string_literal by replacing
p[1].value = p[1].value[:-1] + p[2][1:]
with
p[1].value = p[1].value + p[2]
as simply removing the double quotes is not a good idea. The modification would return a value of '\07""7', which is better but still needs to be parsed to get each character.
Another solution would be to have a list of strings for the value, but that would have far more impact on other parts of the code (like the generator).
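If the inner quotes are kept, a consumer has to split the value back into its individual literals before decoding anything. A rough Python sketch of that step follows. `split_adjacent_literals` is a hypothetical helper, not part of pycparser, and it assumes the value consists only of back-to-back double-quoted narrow literals with no whitespace or prefixes between them.

```python
def split_adjacent_literals(value):
    """Split '"ab""cd"' into ['"ab"', '"cd"'], honouring backslash escapes."""
    pieces = []
    start = None  # index of the opening quote of the piece being scanned
    i = 0
    while i < len(value):
        ch = value[i]
        if start is None:
            if ch != '"':
                raise ValueError("expected an opening quote at index %d" % i)
            start = i
        elif ch == "\\":
            i += 1  # skip the escaped character, e.g. the quote in \"
        elif ch == '"':
            pieces.append(value[start:i + 1])
            start = None
        i += 1
    if start is not None:
        raise ValueError("unterminated string literal")
    return pieces


print(split_adjacent_literals(r'"\07""7"'))  # ['"\\07"', '"7"']
```

Splitting on the two-character sequence `""` alone would not be enough, because an escaped quote at the end of a literal (as in `"a\"""b"`) produces three quotes in a row. This is why the sketch tracks backslashes explicitly.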