FabricMC / tiny-remapper

Tiny JAR remapping tool.
GNU Lesser General Public License v3.0

Tiny V2 spec #9

Open sfPlayer1 opened 5 years ago

sfPlayer1 commented 5 years ago

this is not final

Tiny V2 consists of a list of hierarchical sections. Every line starts a new section; whether it belongs to an existing section is determined by its indentation level. A section's parent is always the closest preceding section indented one level less than itself. Accordingly, a section ends just before the next line with the same or a lesser indentation level.

The child-to-parent relationships form a path that uniquely identifies any element globally. For example, all method and field sections that are children of a class section represent members of that class.

Sections need to be unique within their level. For example a specific class may only be recorded once, a comment can't be redefined or the same parameter listed twice.

Example:

tiny    2   0   official    intermediary    named
    someProperty    someValue
    anotherProperty
c   a   class_123   pkg/SomeClass
    m   (III)V  a   method_456  someMethod
        p   1       param_0 x
        p   2       param_1 y
        p   3       param_2 z
        c   Just a method for demonstrating the format.
    f   [I  a   field_789   someField
c   b   class_234   pkg/xy/AnotherClass
    m   (Ljava/lang/String;)I   a   method_567  anotherMethod
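
To illustrate how the indentation rule turns lines into a hierarchy, here is a minimal Python sketch. It is not part of the spec: `parse_sections` and the dict layout are hypothetical, and the input is assumed to use real tab characters as separators, as the grammar below requires.

```python
def parse_sections(text):
    """Build a tree of sections from tab-indented lines (hypothetical helper)."""
    root = {"fields": None, "children": []}
    # stack[d] is the parent for a line with d leading tabs;
    # stack[0] is a virtual root holding the header and class sections.
    stack = [root]
    for line in text.split("\n"):
        if line.endswith("\r"):  # tolerate "\r\n" line endings
            line = line[:-1]
        if not line:
            continue
        stripped = line.lstrip("\t")
        depth = len(line) - len(stripped)
        if depth > len(stack) - 1:
            raise ValueError("line is indented more than one level past its parent")
        # Empty fields are kept, since optional names may be empty strings.
        node = {"fields": stripped.split("\t"), "children": []}
        del stack[depth + 1:]  # close every section at this depth or deeper
        stack[depth]["children"].append(node)
        stack.append(node)
    return root
```

In this sketch the header's properties end up as children of the `tiny` line, which mirrors how they are indented one level below it.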

Grammar:

<file> ::= <header> | <header> <sections>

<header> ::= 'tiny' <tab> <major-version> <tab> <minor-version> <tab> <namespace-a> <tab> <namespace-b> <extra-namespaces> <eol> <properties>
<major-version> ::= <non-negative-int>
<minor-version> ::= <non-negative-int>
<namespace-a> ::= <namespace>
<namespace-b> ::= <namespace>
<extra-namespaces> ::= '' | <tab> <namespace> <extra-namespaces>
<namespace> ::= <safe-string>

<properties> ::= '' | <tab> <property> <eol> <properties>
<property> ::= <property-key> | <property-key> <tab> <property-value>
<property-key> ::= <safe-string>
<property-value> ::= <escaped-string>

<sections> ::= '' | <class-section> <sections>

<class-section> ::= 'c' <tab> <class-name-a> <tab> <class-name-b> <extra-ns-class-names> <eol> <class-sub-sections>
<class-name-a> ::= <class-name>
<class-name-b> ::= <optional-class-name>
<optional-class-name> ::= '' | <class-name>
<extra-ns-class-names> ::= '' | <tab> <optional-class-name> <extra-ns-class-names>
<class-name> ::= <conf-safe-string>
<class-sub-sections> ::= '' | <method-section> <class-sub-sections> | <field-section> <class-sub-sections> | <class-comment-section> <class-sub-sections> 

<method-section> ::= <tab> 'm' <tab> <method-desc-a> <tab> <method-name-a> <tab> <method-name-b> <extra-ns-method-names> <eol> <method-sub-sections>
<method-name-a> ::= <method-name>
<method-name-b> ::= <optional-method-name>
<optional-method-name> ::= '' | <method-name>
<extra-ns-method-names> ::= '' | <tab> <optional-method-name> <extra-ns-method-names>
<method-name> ::= <conf-safe-string>
<method-desc-a> ::= <method-desc>
<method-desc> ::= <conf-safe-string>
<method-sub-sections> ::= '' | <method-parameter-section> <method-sub-sections> | <method-variable-section> <method-sub-sections> | <method-comment-section> <method-sub-sections>

<method-parameter-section> ::= <tab> <tab> 'p' <tab> <lv-index> <tab> <var-name-a> <tab> <var-name-b> <extra-ns-var-names> <eol> <method-parameter-sub-sections>
<var-name-a> ::= <optional-var-name>
<var-name-b> ::= <optional-var-name>
<optional-var-name> ::= '' | <var-name>
<extra-ns-var-names> ::= '' | <tab> <optional-var-name> <extra-ns-var-names>
<var-name> ::= <conf-safe-string>
<lv-index> ::= <non-negative-int>
<method-parameter-sub-sections> ::= '' | <var-comment-section> <method-parameter-sub-sections>

<method-variable-section> ::= <tab> <tab> 'v' <tab> <lv-index> <tab> <lv-start-offset> <tab> <optional-lvt-index> <tab> <var-name-a> <tab> <var-name-b> <extra-ns-var-names> <eol> <method-variable-sub-sections>
<lv-start-offset> ::= <non-negative-int>
<optional-lvt-index> ::= '-1' | <lvt-index>
<lvt-index> ::= <non-negative-int>
<method-variable-sub-sections> ::= '' | <var-comment-section> <method-variable-sub-sections>

<var-comment-section> ::= <tab> <tab> <tab> 'c' <tab> <comment> <eol>

<method-comment-section> ::= <tab> <tab> 'c' <tab> <comment> <eol>
<comment> ::= <escaped-string>

<field-section> ::= <tab> 'f' <tab> <field-desc-a> <tab> <field-name-a> <tab> <field-name-b> <extra-ns-field-names> <eol> <field-sub-sections>
<field-name-a> ::= <field-name>
<field-name-b> ::= <optional-field-name>
<optional-field-name> ::= '' | <field-name>
<extra-ns-field-names> ::= '' | <tab> <optional-field-name> <extra-ns-field-names>
<field-name> ::= <conf-safe-string>
<field-desc-a> ::= <field-desc>
<field-desc> ::= <conf-safe-string>
<field-sub-sections> ::= '' | <field-comment-section> <field-sub-sections>

<field-comment-section> ::= <tab> <tab> 'c' <tab> <comment> <eol>

<class-comment-section> ::= <tab> 'c' <tab> <comment> <eol>

Grammar notes:

<tab> is "\t"
<eol> is "\n" or "\r\n"
<safe-string> is a non-empty string that must not contain \, "\n", "\r", "\t" or "\0"
<conf-safe-string> is the same as <safe-string> if <properties> doesn't contain a <property> "escaped-names", otherwise it's a non-empty string further described by <escaped-string>
<escaped-string> is a string that must not contain <eol> and escapes \ to \\, "\n" to \n, "\r" to \r, "\t" to \t and "\0" to \0
<non-negative-int> is any integer from 0 to 2147483647 (2^31-1) inclusive, represented as per java.lang.Integer.toString()
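
The `<escaped-string>` rule can be sketched as a pair of Python functions. This is an illustration only: the names `escape` and `unescape` are hypothetical, and rejecting unknown or dangling escape sequences is an assumption based on the note that escape sequences are limited to the listed types.

```python
_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t", "\0": "\\0"}
_UNESCAPES = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t", "0": "\0"}


def escape(s):
    """Encode a raw string as an <escaped-string>."""
    return "".join(_ESCAPES.get(ch, ch) for ch in s)


def unescape(s):
    """Decode an <escaped-string>; unknown escapes are rejected (assumption)."""
    out = []
    i = 0
    while i < len(s):
        ch = s[i]
        if ch == "\\":
            if i + 1 >= len(s) or s[i + 1] not in _UNESCAPES:
                raise ValueError("invalid escape sequence at offset %d" % i)
            out.append(_UNESCAPES[s[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Because the backslash itself is escaped, `unescape(escape(s))` returns `s` for any input string.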

<class-name> once optionally unescaped is the binary name of a class as specified in JVMS SE 8 §4.2.1. Nested class identifiers are typically separated with $ (e.g. some/package/class$nested$subnested). Outer names must not be omitted for any namespace.
<method-name>/<field-name>/<var-name> once optionally unescaped is the unqualified name of a method/field/variable as specified in JVMS SE 8 §4.2.2
<method-desc> once optionally unescaped is a method descriptor as specified in JVMS SE 8 §4.3.3
<field-desc> once optionally unescaped is a field descriptor as specified in JVMS SE 8 §4.3.2

<lv-index> refers to the local variable array index of the frames having the variable, see "index" in JVMS SE 8 §4.7.13
<lv-start-offset> is at most the start of the range in which the variable has a value, but doesn't overlap with another variable with the same <lv-index>, see "start_pc" in JVMS SE 8 §4.7.13. The start offset/range for tiny is measured in instructions with a valid opcode, not bytes.
<lvt-index> is the index into the LocalVariableTable attribute's local_variable_table array, see "local_variable_table" in JVMS SE 8 §4.7.13, not to be confused with the "index" referred to by <lv-index>

Misc notes:

The encoding for the entire file is UTF-8. Escape sequences are limited to the types, locations and conditions mentioned above.

Indenting uses tab characters exclusively; one tab character equals one level. The number of leading tab characters is at most 1 more than in the preceding line.

Sections or properties with unknown types/keys should be skipped without generating an error.

The number of extra namespaces defined in the header and the number of names in every extra-ns-*-names definition have to match. They are associated by their relative position, like the mandatory namespaces a and b, which are associated by their suffix, e.g. namespace-a covers class-name-a, method-name-a, method-desc-a, var-name-a, field-name-a and field-desc-a.
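
A rough Python sketch of the header parse and the namespace-count check follows. The helper names (`non_negative_int`, `parse_header`, `names_match_namespaces`) and the `leading` parameter are hypothetical and only serve this illustration; the line is assumed to be already split from its `<eol>`.

```python
import re

_INT = re.compile(r"0|[1-9][0-9]*")  # canonical decimal form, no sign or padding


def non_negative_int(s):
    """Parse <non-negative-int>: 0..2147483647 in canonical decimal form."""
    if not _INT.fullmatch(s) or int(s) > 2147483647:
        raise ValueError("not a non-negative int: %r" % s)
    return int(s)


def parse_header(line):
    """Return (major, minor, namespaces) from the first line of a file."""
    fields = line.split("\t")
    if len(fields) < 5 or fields[0] != "tiny":
        raise ValueError("header needs 'tiny', two versions and two namespaces")
    return non_negative_int(fields[1]), non_negative_int(fields[2]), fields[3:]


def names_match_namespaces(fields, namespaces, leading):
    """Check that a section line carries exactly one name per namespace.

    `leading` is the number of fields before the first name:
    1 for 'c', 2 for 'm', 'f' and 'p', 4 for 'v'.
    """
    return len(fields) - leading == len(namespaces)
```

For the example file above, the header yields three namespaces, so every class line needs three names after `c` and every parameter line three names after `p` and its index, with empty strings standing in for omitted optional names.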

Sections representing the same element must not be repeated, e.g. there can be only one top-level section for a specific class or one class-level section for a specific member.

If any variable mapping doesn't specify a lvt index, e.g. due to a missing LocalVariableTable attribute in one of the methods, the property "missing-lvt-indices" has to be added to <properties>.

Mappings without any useful names or sub-sections should be omitted.

Comments should be without their enclosing syntax elements, indentation or decoration. For example, the comment /**<eol> * A comment<eol> * on two lines.<eol> */ should be recorded as A comment<eol>on two lines..
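
As an illustration of the comment rule, the following Python sketch strips Javadoc delimiters and decoration. `strip_javadoc` is a hypothetical helper, and trimming all whitespace around each line is a simplifying assumption that would also remove intentional indentation inside a comment.

```python
def strip_javadoc(raw):
    """Reduce a /** ... */ block to its bare text, one entry per line."""
    body = raw.strip()
    if body.startswith("/**"):
        body = body[3:]
    if body.endswith("*/"):
        body = body[:-2]
    lines = []
    for line in body.splitlines():
        line = line.strip()
        if line.startswith("*"):  # drop the leading asterisk decoration
            line = line[1:].strip()
        lines.append(line)
    # Remove the empty lines left over from the opening and closing delimiters.
    while lines and not lines[0]:
        lines.pop(0)
    while lines and not lines[-1]:
        lines.pop()
    return "\n".join(lines)
```

The result still contains a raw line break, so it has to be escaped as an `<escaped-string>` before it is written to the file.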

Standard properties:

Runemoro commented 5 years ago

I have some comments about the method variable section: <method-variable-section> ::= <tab> <tab> 'v' <tab> <lv-index> <tab> <lv-start-offset> <tab> <optional-lvt-index> <tab> <var-name-a> <tab> <var-name-b> <extra-ns-var-names> <eol> <method-variable-sub-sections>

  1. You can't make both lvt-index and lv-start-offset optional, since only lv-index isn't enough to uniquely choose a local variable. It should be the lv-index that's optional, since the lvt index on its own is enough to uniquely choose the variable (what we're mapping is the name field in the lvt entry at that index).
  2. Why specify lv-index and start-offset at all? All the remapper needs to know is the local variable table index. If some other tool like Matcher needs those, it can just get them from the lvt.
  3. There shouldn't be a var-name-a field at all. With Minecraft it would all be just ☃! And this would mean it's impossible to remap a JAR whose variable names have been changed by something like Stitch's snowman remover.
  4. Maybe there should be a way to add LVT entries (for other games where the LVT is removed during obfuscation).

Here's a suggestion for what the variable section could look like:

<method-variable-section> ::= <tab> <tab> 'v' <tab> <lvt-index> <tab> <var-name-b> <extra-ns-var-names> <eol> <method-variable-sub-sections>

liach commented 5 years ago

@Runemoro I'll answer some of your questions here, as I've already worked with tinyv2 a bit.

  1. Local variable start offset is not optional here. It must be at least 0.

  2. Note that local variable table is optional for the vm and can be removed by an obfuscator from classes. As a result, lvt index is optional.

  3. Var name a should be present, although in actual usage, we don't check the var name a when remapping parameters or local variables at all. Otherwise things may just break if we try to flip the default mappings (leftmost namespace) with some other namespace.

  4. The addition of an extra lvt is a concern for the remapper; it is not related to the tinyv2 format itself.

Runemoro commented 5 years ago

> Local variable start offset is not optional here. It must be at least 0.

Sorry, misread it.

> Note that local variable table is optional for the vm and can be removed by an obfuscator from classes. As a result, lvt index is optional.

If that happens, it's impossible to map it. You also need the end offset and signature if you want to create a new LVT entry.

> Otherwise things may just break if we try to flip the default mappings (leftmost namespace) with some other namespace.

You're talking about inverting the mappings (for example, intermediary -> named to named -> intermediary)? It should at least be optional. If the LVT is absent, there is no variable name at all.

> The addition of an extra lvt is a concern for the remapper; it is not related to the tinyv2 format itself.

The format is missing the information necessary for a new LVT entry (end offset and signature).

liach commented 5 years ago

Just a side note from Discord:

for (end user) runtime i btw increasingly like the idea of adding an attribute to tiny v2 that guarantees that it's sorted, which would in turn allow a tree api to do sparse loading+binary search

This is actually a very good point
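
To make the idea concrete: if class sections were guaranteed to be sorted by their first-namespace name, a lookup could use binary search. The Python sketch below only shows the search step on an in-memory list; `find_class` is a hypothetical helper, and the sort key and ordering are assumptions, since the thread does not define them.

```python
import bisect


def find_class(sorted_names, name):
    """Return the index of `name` in a sorted list of class names, or -1."""
    i = bisect.bisect_left(sorted_names, name)
    if i < len(sorted_names) and sorted_names[i] == name:
        return i
    return -1
```

A sparse loader would apply the same search to file offsets rather than a fully loaded list, which is where the sortedness guarantee would pay off.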

Col-E commented 8 months ago

> this is not final

  1. Is there any update on the state of the spec?
  2. If so, is the Mapping IO implementation fully compliant with this spec?

modmuss50 commented 8 months ago

> this is not final
>
>   1. Is there any update on the state of the spec?
>   2. If so, is the Mapping IO implementation fully compliant with this spec?

  1. I believe what is on this page is correct; however, https://fabricmc.net/wiki/documentation:tiny2 was recently updated, so it is likely a better source of information.

  2. Yes, mapping-io is a great point of reference