dgibson / dtc

Device Tree Compiler
215 stars 127 forks source link

Revert "dtc: Consider one-character strings as strings" #121

Open flto opened 9 months ago

flto commented 9 months ago

This reverts commit 9d7888cbf19c2930992844e69a097dc71e5a7354.

This commit broke decompilation in certain cases. For example: regulator-min-microvolt = <0x00324b00>; is now decompiled (incorrectly) as: regulator-min-microvolt = "\02K";

dgibson commented 9 months ago

Decompilation of dtbs, like most decompilations, is an approximate process. Although we can guarantee we get something that will recompile into the same dtb, we can't guarantee we'll have the same dts as the original: the type information that's implied in the dts is simply lost along the way. In the dts format "\02K" and <0x00324b00> are different ways of expressing exactly the same bytes (so are [00324b00], /bits/ 16 <0x0032 0x4b00> or "", "2K", or even /bits/ 16 <0x0032>, "K").

The function affected here, guess_value_type() is exactly that - a guess at a likely original type. There's no way for it to be correct 100% of the time.

So in order to make a change here, it's not sufficient to say it's "wrong" for some examples, you need to make the case that the cases it now gets "right" (i.e. more readable / less surprising) are more common or more important than the cases it now gets "wrong" (less readable / more surprising).

flto commented 9 months ago

the problem is it doesn't recompile to the same bytes - compiling the "\02K" becomes [02 4b 00], not <0x00324b00>

dgibson commented 9 months ago

the problem is it doesn't recompile to the same bytes - compiling the "\02K" becomes [02 4b 00], not <0x00324b00>

Oh, right, good point.

But, reverting this patch is only papering over the problem. I'm pretty sure there exist other examples that will trigger the same problem even with the revert in place. The real bug here is in write_propval_string. It's meaning for its output to be interpreted as '\0' then '2' then 'K' then '\0', which would come out to the correct bytes. However, because there happens to be a digit after the \0 it's instead interpreted as '\02'.

I guess we need to have write_propval_string look ahead to the next byte, and if it's a valid octal digit, we need to emit \x00 (or \000) instead of just \0. The lookahead will be kind of a pain, but it needs to be done. I supposed we could just emit \x00 all the time, but I think that ugliness in the output is worse than the internal ugliness of the lookahead.

flto commented 9 months ago

if you don't want the output to be ugly, you should split the string for \0 characters (this will improve readability and should be easier to implement than look ahead - but I did not look at the code), and for any other characters that would need an escape sequence, always use 2 digit hex (more readable than octal)

dgibson commented 9 months ago

if you don't want the output to be ugly, you should split the string for \0 characters (this will improve readability and should be easier to implement than look ahead - but I did not look at the code),

We still need to fix write_propval_string for the case where we have type information telling us to output a particular section as a string. Though, yes, usually breaking strings on \0 is a good idea.

and for any other characters that would need an escape sequence, always use 2 digit hex (more readable than octal)

Yes, we already use hex escapes for most characters.

dgibson commented 9 months ago

Oh dear. I had a closer look into this and it's a real can of worms.