UsernameFodder / pmdsky-debug

Debug info for reverse engineering PMD: Explorers of Sky
GNU General Public License v3.0
38 stars 20 forks source link

Used original Pascal_SnakeCase for sound functions #241

Closed AnonymousRandomPerson closed 8 months ago

AnonymousRandomPerson commented 8 months ago

Rhokin originally named the DSE sound library functions with Pascal_SnakeCase, but I had to change them because the format wasn't supported at the time. Now that the format is supported, renaming the functions to match their original intention.

Frostbyte0x70 commented 8 months ago

As I said in the other PR, adding prefixes to function names (whether they are split by an underscore or not) causes problems with No$GBA. I'd rather have short names (function name only, no repeated prefixes unless required to distinguish them) as the main name and use the future alias system to include the prefixed names as well. Unless you plan to keep these as temporary names and then move them to aliases, but that implies renaming symbols, so it might be a better idea to use short names for now and then add these to the alias list once it's implemented.

AnonymousRandomPerson commented 8 months ago

Most of the No$ issues boil down to the function name being long, which implies imposing a character limit on functions or enforcing the "action-target" naming pattern instead of "target-action". My understanding is that pmdsky-debug allows a variety of naming styles to make it easier to contribute, and these restrictions would run counter to that idea. Seeing that these names were already there sans the underscore, I don't see an issue with keeping them for now. The upcoming alias feature would allow anyone unsatisfied with a given name to submit an alias for it.

UsernameFodder commented 8 months ago

As I said in the other PR, adding prefixes to function names (whether they are split by an underscore or not) causes problems with No$GBA. I'd rather have short names (function name only, no repeated prefixes unless required to distinguish them) as the main name and use the future alias system to include the prefixed names as well. Unless you plan to keep these as temporary names and then move them to aliases, but that implies renaming symbols, so it might be a better idea to use short names for now and then add these to the alias list once it's implemented.

As I mentioned elsewhere, we're at the intersection between two communities with slightly conflicting interests, but also with the potential to coexist and interoperate. I think that there are important tradeoffs to consider here, and for future precedent moving forward. In general I see two extremes:

  1. Use a shorter, PascalCase name that the PMD community is used to and works better with No$GBA, even if that means changing the name relative to what the pret community calls it.
  2. Use a longer, Pascal_SnakeCase name that's more consistent what the pret community calls it, even though it's worse for No$GBA and not what the PMD community is used to.

When we're dealing with functions that don't have already established names, I think it'd be better not to use the alias system to force option 1 for everything. These functions are very different from something like FileRead, which the SkyTemple community is already quite used to. I think there's a big advantage to standardization to an extent that's practical, which is that people working in both communities can more easily communicate with each other and find their way around both projects simultaneously. It's just easier for everyone if there's one agreed upon name for something, rather than two different ones. I think this advantage outweighs the slight convenience of having shorter names for No$GBA (note that longer names cause no issues with any other tooling like Ghidra, SkyTemple, c-of-time, etc.). IMO, the potential for faster progress from better collaboration is going to pay off more in the long-run.

In particular, I think with anything in arm9/libs.yml that doesn't already have some other name in pmdsky-debug, using the Pascal_SnakeCase names as the primary ones makes more sense if they come from decomp-land. Most of the Nintendo DS decomps (more than just Sky) have standard names for Nitro SDK functions and other such standard libraries. As I mentioned elsewhere, I expect potentially hundreds of new symbols in this file in the near future, all with Pascal_SnakeCase names. I think we would be better served following the standards instead of being different for a marginal benefit.

In this case especially, the tradeoff is adding a single underscore. That seems like a small price to pay for better standardization.

Frostbyte0x70 commented 8 months ago

I guess it makes a bit more sense for the lib functions. Although I still don't see the advantage of having the longer names as the main rame rather than an alias. Both options enable a link between both projects, but only one causes disadvantages with No$GBA.

In this case especially, the tradeoff is adding a single underscore. That seems like a small price to pay for better standardization.

I probably should have mentioned that I don't like the previous names either more explicitly, since they also contain the prefixes. Whether they have an underscore or not doesn't really matter.

UsernameFodder commented 8 months ago

I guess it makes a bit more sense for the lib functions. Although I still don't see the advantage of having the longer names as the main rame rather than an alias. Both options enable a link between both projects, but only one causes disadvantages with No$GBA.

Maybe I wasn't being clear. The alternatives I was considering are:

  1. Have a main name and an alias (whether the short name is the main one or the alias isn't relevant to my points).
  2. Have only a main name, with no aliases.

The advantage of option 2 is what I was trying to convey in my previous comment. Not having aliases at all is simpler, and leaves less room for confusion.

Let me try to spell out some issues I foresee with option 1 more concretely:

With established functions like FileRead, I'm willing to accept the above issues, because people already know these functions by different names anyway. But AFAIK, the only advantage of the short names on new functions is making No$ labels a bit nicer. If names are cut off in No$, there are at least obvious workarounds, like referring back to the symbol tables via the address, or manually changing symbol names in the .sym file to something shorter for personal use. This seems like a better outcome to me than promoting potential alias confusion with every single library function.

Frostbyte0x70 commented 8 months ago

I see. That does make it a bit more clear. Here's another option I just thought of: Using long names with prefixes for functions coming from the decomp, and optionally adding a short alias to them. Then add an option for the tool that generates the .sym files that, if enabled, will create it using the first name of each function that does not contain an underscore. So, for each function:

  1. If the main name doesn't have an underscore, use the main name
  2. If it does but the function has no aliases, use the main name
  3. If it does, go down the alias list and pick the first name that doesn't contain an underscore

This would allow defining No$GBA-friendly names for functions coming from the decomp without causing the issues you listed. Would that be feasible to implement?

UsernameFodder commented 8 months ago

I see. That does make it a bit more clear. Here's another option I just thought of: Using long names with prefixes for functions coming from the decomp, and optionally adding a short alias to them. Then add an option for the tool that generates the .sym files that, if enabled, will create it using the first name of each function that does not contain an underscore. So, for each function:

1. If the main name doesn't have an underscore, use the main name

2. If it does but the function has no aliases, use the main name

3. If it does, go down the alias list and pick the first name that doesn't contain an underscore

This would allow defining No$GBA-friendly names for functions coming from the decomp without causing the issues you listed. Would that be feasible to implement?

This is something I was also thinking about, but doesn't that mean we would still need to come up with manual short names for every decomp function? Which is still a big pain. It also still adds a bunch of aliases that are only really useful for No$ (not as bad since they aren't the primary name, but still).

I was considering some sort of dynamic rule-based renaming scheme during generation, e.g., via some regex specifier. So you could express via a regex to remove the prefix on any Pascal_SnakeCase symbol (e.g., [A-Z]\w+_([^_]*[a-z][^_]*) -> $1). I think that wouldn't be too hard to implement. The big problem with this is the same issue with trying to do dynamic mapping in the sync script, which is that removing the prefix isn't always a good way to do it. E.g., we don't want GX_Init and G3X_Init to both be replaced with Init. Maybe that idea could still work if we did something more clever, like truncating the prefix instead of removing it, or turning the prefix into a suffix instead.

The other question I have is whether it's really worth it to jump through all these hoops. Have you been having problems with any of the function names, besides the ones that were reverted in https://github.com/UsernameFodder/pmdsky-debug/pull/240? Are there any super long names that you've organically encountered while debugging? If we did this, I would rather keep the current .sym artifact unmodified, and introduce a new -short-names.sym artifact or something, otherwise there would be no way to download the "full" names as a .sym if the name-shortening was causing issues somehow. But introducing a new artifact seems somewhat excessive if we're only trying to solve a hypothetical issue preemptively.

Frostbyte0x70 commented 8 months ago

doesn't that mean we would still need to come up with manual short names for every decomp function?

Not necessarily. It would be optional, so anyone who needs it would be welcome to propose short aliases for functions that might need them.

I was considering some sort of dynamic rule-based renaming scheme during generation

Not sure if it would work. On top of the name collisions you brought up, there's also some function names that don't make much sense without the prefix. (e.g. SoundEnvelope_Release -> Release). Those will require some kind of manual work, which is why I prefer the manual approach.

Have you been having problems with any of the function names, besides the ones that were reverted in #240?

No, not really. But what if I (or anyone else) does in the future? What if more functions with long names are added? It's not a problem until someone happens to research an area that has these kind of names and suddenly we need to do something about them. I'd rather deal with the problem now, specially if we can find a good compromise solution.

I would rather keep the current .sym artifact unmodified, and introduce a new -short-names.sym artifact or something

Yeah, I imagined something like that. That's a good approach, I think.

UsernameFodder commented 8 months ago

No, not really. But what if I (or anyone else) does in the future? What if more functions with long names are added? It's not a problem until someone happens to research an area that has these kind of names and suddenly we need to do something about them. I'd rather deal with the problem now, specially if we can find a good compromise solution.

I'm going to defer on this then. It's not clear to me whether this will actually be a problem in practice, and what the best solution would be if so. I would need more real world examples before committing to anything. If this starts coming up for real while you're doing something, feel free to file an issue about it. If there is some problematic case (I think this will be uncommon, see below), we don't have to rush to fix it because there are at least temporary workarounds:

After playing around in No$ myself, here's why I'm guessing issues won't be very common:

  1. You can horizontally expand the disassembly pane in No$ to accommodate longer symbol names, so the bl lines being truncated doesn't seem like a very big issue to me, unless you're working on a very small screen.
  2. A name needs to be more than 38 characters to interfere with the goto command, or a shared common prefix needs to be longer than 34 characters to interfere with searching for bl commands. This seems fairly uncommon, with the longest function in this commit being SoundEnvelopeParameters_CheckValidity, which is 37 characters.
  3. If you aren't working in code that references any functions with long names, it's not an issue.
Frostbyte0x70 commented 8 months ago

Oh well, I guess we'll see how it goes. If this naming style is mostly used for library functions, maybe it won't be as much of an issue.