UsernameFodder / pmdsky-debug

Debug info for reverse engineering PMD: Explorers of Sky
GNU General Public License v3.0
38 stars 20 forks source link

Add flag to mark aliases as deprecated to header augmentation script #259

Closed tech-ticks closed 4 months ago

tech-ticks commented 4 months ago

Adds a new flag --mark_aliases_deprecated to augment_headers.py that generates warnings such as:

c-of-time/src/main.c:17:3: warning: 'ProcessScriptParam' is deprecated: Renamed to 'ScriptParamToInt' [-Wdeprecated-declarations]
   17 |   int x = ProcessScriptParam(123);
      |   ^~~

I'm opening this PR as a draft because the formatter generates very weird looking code:

__attribute((deprecated("Renamed to 'ScriptParamToInt'"))) int16_t
// THIS DOCSTRING WAS GENERATED AUTOMATICALLY
/**
 * Converts the given opcode parameter to a signed integer.
 * 
 * The parameter will be returned unchanged unless one of its two most significant bits (0x8000 and 0x4000) are set, in which case both bits will be cleared and the original value will be modified according to the following two rules:
 * - If the 0x4000 bit is set (sign bit), the value will be set to -16384 + value.
 * - If the 0x8000 bit is set (fixed-point flag), the value will be set to value / 256, rounded down.
 * Both rules can be applied, in the same order as listed, if both conditions are met.
 * 
 * r0: Parameter to convert
 * return: The input parameter, as a signed integer
 */
ProcessScriptParam(uint16_t parameter);

This issue doesn't seem to occur if I set the formatter to None, but I don't have enough insight into how it works to fix it.

UsernameFodder commented 4 months ago

I'm opening this PR as a draft because the formatter generates very weird looking code:

Hmm, I see the problem. There's actually multiple interacting problems 😅.

The augmenter works in multiple passes. The first pass goes in and adds aliases by essentially copy-pasting aliased functions. The second pass prepends docstrings.

Both passes process files line-by-line, searching for symbol names with a regex. Whenever content is added, the pass will just inject extra lines into the file stream. After each pass, the formatter is run.

The function regex is:

    # Look for words preceding an open single parenthesis. Not perfect, but
    # good enough. This should work for everything except functions
    # with function pointer parameters, but function pointer parameters
    # should really be typedef'd for readability anyway.
    NAME_REGEX = re.compile(r"\b(\w+)\s*\([^(]")

So here's the problem(s): First, aliases are added. With your changes, an alias declaration might look something like this:

__attribute__((deprecated("Renamed to 'DataTransferInit'"))) void FileRom_InitDataTransfer(void);

Or if the line is too long, the formatter will break the line:

__attribute__((deprecated("Renamed to 'FileInitVeneer'"))) void
FileRom_Veneer_FileInit(struct file_stream* file);

Then docstrings are added. In the first case, the presence of the __attribute__( and deprecated( fool the regex, so it doesn't find the right function name, and no docstring is written at all. In the second case, the function name is found, but the docstring is inserted on the line before it, which ends up in between the attribute specifier and the actual function name. So it's broken in both cases.

So I think a correct implementation might take a tad more work. Maybe adding a third pass at the end? It seems sort of tricky to get right; I'd have to think about it a bit more.

UsernameFodder commented 4 months ago

So right now the script is (mostly) idempotent (you can run it multiple times repeatedly without breaking the headers horribly), which is a property I'd like to keep in case somebody tries to run the script again by accident. That makes things a bit hard, but I think I thought of a reasonable (somewhat hacky) approach that might work:

I think this approach should ensure that attribute lines are written in the write place, and that they don't get duplicated if you run the script multiple times.

tech-ticks commented 4 months ago

Sorry for the late reply. I tried to implement your workaround, but with the formatter enabled I now end up with:

int16_t
    // THIS DOCSTRING WAS GENERATED AUTOMATICALLY
    /**
     * Converts the given opcode parameter to a signed integer.
     *
     * The parameter will be returned unchanged unless one of its two most significant bits (0x8000
     * and 0x4000) are set, in which case both bits will be cleared and the original value will be
     * modified according to the following two rules:
     * - If the 0x4000 bit is set (sign bit), the value will be set to -16384 + value.
     * - If the 0x8000 bit is set (fixed-point flag), the value will be set to value / 256, rounded
     * down. Both rules can be applied, in the same order as listed, if both conditions are met.
     *
     * r0: Parameter to convert
     * return: The input parameter, as a signed integer
     */
    __attribute__((deprecated("Renamed to 'ScriptParamToInt'"))) // alias
    ProcessScriptParam(uint16_t parameter);

It only seems to separate the type like this if the formatter is enabled. Not sure why this happens in the first place, I'd have to wrap my head around how exactly the formatter works.

UsernameFodder commented 4 months ago

Sorry for the late reply. I tried to implement your workaround, but with the formatter enabled I now end up with:

int16_t
    // THIS DOCSTRING WAS GENERATED AUTOMATICALLY
    /**
     * Converts the given opcode parameter to a signed integer.
     *
     * The parameter will be returned unchanged unless one of its two most significant bits (0x8000
     * and 0x4000) are set, in which case both bits will be cleared and the original value will be
     * modified according to the following two rules:
     * - If the 0x4000 bit is set (sign bit), the value will be set to -16384 + value.
     * - If the 0x8000 bit is set (fixed-point flag), the value will be set to value / 256, rounded
     * down. Both rules can be applied, in the same order as listed, if both conditions are met.
     *
     * r0: Parameter to convert
     * return: The input parameter, as a signed integer
     */
    __attribute__((deprecated("Renamed to 'ScriptParamToInt'"))) // alias
    ProcessScriptParam(uint16_t parameter);

It only seems to separate the type like this if the formatter is enabled. Not sure why this happens in the first place, I'd have to wrap my head around how exactly the formatter works.

Ugh, it appears that this is a bug in clang-format (or GCC attributes are just unsupported). I found this SO post where someone was trying to do exactly what we are, and there's also some open GitHub issues for clang-format about it. One of the comments mentions a workaround where you can define a macro. I tested it and this seems to work with the formatter on my machine:

#ifndef DEPRECATED
#define DEPRECATED(name) __attribute__((deprecated("Renamed to '" name "'")))
#endif

DEPRECATED("ScriptParamToInt")
int16_t ProcessScriptParam(uint16_t parameter);

It's a little annoying, but I guess we can have the script insert the macro in every header file right after the include guard (I suggest matching with the regex #define HEADERS_[A-Z0-9_]+_H_? to be somewhat robust). And play the same game with it for idempotence: inject it, and then skip over any copies from previous runs (anything between #ifndef DEPRECATED and the next #endif).

tech-ticks commented 4 months ago

Alright thanks, that seems to fix the issue.