FabricMC / fabric-loader

Fabric's mostly-version-independent mod loader.
Apache License 2.0
612 stars 258 forks source link

[Draft/RFC] Ponderings on a post-Mixin world #244

Open asiekierka opened 4 years ago

asiekierka commented 4 years ago

When we kickstarted Fabric back in 2016, we decided to experiment with using Mixin as a library - however, certain crucial functionality could not be done the way we wanted to, and the project was shelved. In 2018, we decided to scale back our ambitions in order to get the project released - which has been a successful strategy. However, I think it's wise to at least consider using some of these past ideas and experiences for a post-Mixin world.

Now, please don't consider the following as a rant about Mixin. I highly respect Mumfrey's work and coding ethos - this is strictly a difference in design philosophies.

There are many features we wanted to expose for Fabric modders but could not. Mixin is designed to be a "mostly harmless" way to modify the game, primarily developed in the context of the Sponge project. Many of its approaches do not scale to a very differently designed project which Fabric is.

Because of community interest, I would like to propose an alternate approach, one I wanted to implement sometime (presumably this year?) but went on indefinite break before that happened.

Building blocks

In opposition to Mixin's strategy of treating annotated Java classes as the base format for patches, I'd like to propose a modular strategy, based on two key kinds of components:

The key thing to note here is that there's a documented, intermediate format for patches. This allows many things which are not currently possible or would be difficult to implement:

Compiler

The most essential compiler to provide for an MVP is one which matches most of Mixin's existing functionality - that is to say, takes annotated Java classes as input.

Other compilers can be developed - one might have different taste in annotations, one might want to use a domain-specific language, or write their patches in Kotlin. The benefit is that they can be provided as independent Gradle plugins (or CLI tools) without necessiating special support in the Backend.

Backend

This is admittedly the least "developed" part of the concept. What follows is some rough notes from the top of my head:

Release cadence

As this would be one of the biggest breaking changes in Fabric's history, it would be wise to begin exposing it to modders early in a snapshot cycle, on a separate Loader branch. Ideally, the Backend would stabilize by said version's release - in this case, Mixin/AW can simply be removed on that branch altogether. In a less ideal case, the functionality can be backported to a version with both Mixin/AW (now deprecated) and the new backend, and hope things work out in the end.

Now, this short cadence might sound controversial - but, and this is entirely my personal opinion:

I welcome thoughts, and discussion. I also offer mentorship and advice to whoever is willing to work on this, though I do not at this moment have time to work on this myself.

immibis commented 4 years ago

Beware of the second system effect

i509VCB commented 4 years ago

or write their patches in Kotlin

I do recall kotlin adds a bunch of syntactic sugar, how would we work around that for example?

it would be wise to begin exposing it to modders early in a snapshot cycle

The actual transition discussions should not occur until we have a somewhat stable and ready prototype. Leave that discussion to we reach that point and then we can bikeshed to hell

asiekierka commented 4 years ago

@i509VCB This is entirely a draft; as I said, non-standard compilers are not something Fabric has to do themselves, and that's a good thing - Kotlin fans can just figure this part out themselves.

immibis commented 4 years ago

Wouldn't all of these missing features changes be doable by just adding them to Mixin?

Earthcomputer commented 4 years ago

@immibis this is such a different idea to mixin that it may as well be a different project.

asiekierka commented 4 years ago

@immibis A few of them, yes; but they may not be easy to apply upstream, and this much divergence carries a significant maintenace cost. Most of them cannot be done, in my opinion, without a rethinking on the level of writing a new implementation - at which point it would be good to have one which caters to Fabric's specific needs instead of being hacked into a different project.

natanfudge commented 4 years ago

Mixin's design and safety guarantees make it difficult to patch it in a way where certain mixins are pre-applied to the development environment. This, while not useful functionality for an abstraction layer, would be great to have for viewing code in an environment with large amounts of patches one directly interacts with.

As far as I know, there’s nothing preventing applying mixins statically. It just has to be done.

asiekierka commented 4 years ago

@natanfudge We tried to do it in 2016 and kept running into Mixin's many safety checks and other weirdness. We spent weeks on it and it never worked quite right, especially as we had to apply some Mixins statically and some dynamically - that was the pain point.

TheGlitch76 commented 4 years ago

Would the compiled backend code be verbose (as in using English words/abbreviations) and writable by hand, or not at all human friendly (ASCII terms with no meaning like "M109" or straight up byte code)? I think this would have a large effect on how the backend/compilers would be structured.

Runemoro commented 4 years ago

Something that would be really nice (but difficult) is a compiler that generates patches based on bytecode differences.

With something like this, patching a method would be as easy as making the changes in the decompiled vanilla code, recompiling the class, and calling the compiler on the original and modified classes.

LunNova commented 4 years ago

Been working on an alternative to Mixin for some time, but never had enough energy to put enough in to work on it. Figure it's worth mentioning here. Apparently too burned out after TickThreading to ever do anything else successfully :c

JavaTransformer: API for editing bytecode or source with the same code, instead of having to implement things twice

Mixin: Mixin implementation based on JavaTransformer. No reason it can't be statically applied, already is statically applied with a gradle plugin for dev environment

A lot of it's WIP or feature incomplete, and have never even got around to replacing the usage of TickThreading's old XML patcher in TickProfiler. :L

liach commented 4 years ago

For @nallar: imo the tool will be more bytecode oriented. the source-code generation part can just generate some pseudocode, as unlike in other projects where source code is used to apply patches and generate binary patches, we exclusively use source code here as a form of documentation like javadoc. Your utilities like generics would be a bit overkill

LunNova commented 4 years ago

The apply to sourcecode part is somewhat less useful here given there's no decompiled source workspace for the typical fabric setup, it was necessary for developing with forge which this initially targetted. Being able to parse the sourcecode using the same API is rather useful for the applying patches stage, as depending on the patch you're making a two-stage compilation can make things much more convenient.

Still, if you want your dev environment to have attached sources that match what the game will actually launch with, it's faster to apply patches to the source directly than to have to patch classes and decompile.

liach commented 4 years ago

Imo the patches will be bytecode oriented (like tiny mapping format, using bytecode parameter index than source-code method parameter index). It just need to have a simple way to convert the bytecode patching operations to visible patched content (either working code or pseudocode) in source jar. Currently, sponge mixin loads its mixin classes through some convoluted process and makes it unfit to be applied to source code, even as pseudocode

Earthcomputer commented 4 years ago

The fundamental problem of doing anything with source code is that in Fabric, there isn't just one source, different decompilers can produce different sources. Moreover, decompilers can reorder blocks of code for nicer control flow, meaning that method invocation A may appear before invocation B in the source for one decompiler, and after in the bytecode and another decompiler. There are also other similar discrepanices between different decompilations: which loop the decompiler chooses, which local variables are inserted (fabric loader cannot assume there will be LVTs present in the target class).

Runemoro commented 4 years ago

@Earthcomputer The patch generator wouldn't directly compare bytecode. It should first abstract both classes to a higher-lever representation of the code, where compiler choices such as instruction ordering and control flow generation don't matter (see Procyon IR, for example).

Earthcomputer commented 4 years ago

You mean compiler and decompiler choices? It's a good idea but it still sounds infeasible. Let's focus on local variables as one example. The target class is in bytecode, and will have to be decompiled to this IR. The patch will start off in source code, and will have to be compiled to this IR. Now, since LVT information may have already been lost in the target class, you have no option but to rely on local variable indices and sort of guess where they start and end by their type. Now the compiler's choice of which local variables to merge etc is no longer arbitrary and will have to be taken very carefully, especially when this source could have been generated from different decompilers. I'm not sure how possible that is, and I'm sure there will be many corner cases where it won't work.

Runemoro commented 4 years ago

No, there are no variable indices in the IR. The IR is basically a data flow graph, such that any method with the same behavior corresponds to exactly the same IR.

When the IR is generated for the vanilla method, a map between IR nodes and things like variable and instruction indices should be kept so that once the patches to the IR have been determined, they can be converted to patches to the bytecode.

This data is never used to compare the IRs, though, and doesn't even need to be generated for the patched method. It's just used to pull back the IR changes to Java bytecode.

asiekierka commented 4 years ago

Let's not discuss hypothetical alternate compilers too much here, it will derail the subject. What concerns fabric-loader, from a technical standpoint, is two things:

Earthcomputer commented 4 years ago

For the matching and transformation backend, I suggest first translating the bytecode into some IR like Runemoro suggested above, and then adopting an approach similar to Intellij's structural find and replace, except working on this IR rather than on Java source code. This suggestion should be understood keeping intellij's structural find and replace in mind.

If we adopt this approach, there are actually two types of script extensions we could support:

  1. Smarter pattern matching. Given the set of substitution variables matched by the template impose an extra scriptable condition for the match to succeed, perhaps also testing another template elsewhere (e.g. is this method call calling a method containing a call to another method?). These scripts may also be able to add extra substitution variables based on these inputs.
  2. Smarter substitution. Given the set of substitution variables from the match, generate the IR a given substitution should be replaced by. Each variable in the replacement template may reference a different script for this.

Scripts may also take parameters as inputs, to allow them to be reused more flexibly.

There would be several built-in scripts. A simple pattern matching script would control the number of IR nodes the substitution variable can match (analogous to intellij's "count" qualifier). The simplest substitution script would be to simply copy what is matched by a substitution variable.

LemmaEOF commented 4 years ago

I think the fundamental hurdle here is that whatever replaces mixin needs to be just as easy as or easier than mixin. There's no point in getting rid of mixin if the replacement adds more development burden. I think it would be good to have Mixin stick around for one or two MC versions after the new system gets released, just because we don't wanna pull a Forge and break everyone's projects at the same time. Honestly, the holy grail would be a plugin that automatically compiled folks' existing mixin code as ASMR, with the promise that converting fully to ASMR lets them do difficult things infinitely easier. An embrace-extend-extinguish sort of thing.

Mixin's primary advantage is that patches are written in real Java code, just like it'll appear in the applied patch, so you don't need to know how bytecode works. That was always my main hangup with raw ASM and ASMjs; you needed to have a handle on a paradigm of programming completely different from what you're writing the rest of your mod in.

liach commented 4 years ago

So a quick summary here: We have backend (i.e. storage format that applies changes to bytecode/source) and compiler (convert dev-time format to backend)

Mixin:

@nallar's JavaTransformer:

A few things I'd see in our version:

imo @nallar and @Runemoro's plan on super-high abstraction may be somewhat unrealistic imo. We should focus on getting the bytecode right first (which we are already doing for our tiny mapping format), then back onto showing the changes in a human-readable way in sources (pseudocode acceptable)

immibis commented 3 years ago

So to summarize, things Mixin doesn't do that we'd like it to do:

... anything else?

Earthcomputer commented 3 years ago
modmuss50 commented 3 years ago

Mixin is very targgeted in what it can change. Some ideas would benefit from something more dyamic such as:

Hooking all calls to a method, or all classes that implement/extend X class.

dexman545 commented 3 years ago

It has been talked about more generalized targetting, like being able to replace item == Items.BLAH to tag.contains(item)

supersaiyansubtlety commented 3 years ago

My 2 cents as a simple mod dev: It's essential that what can currently be achieved with mixin remain achievable by writing java code. Having to learn 2 different languages just to be able to mod using fabric seems unreasonable.

i509VCB commented 3 years ago

Such a system would have a mixin like dsl as a "front-end". You would use said front-end to make transformations of you wish. The actual built jar would use this system above.

immibis commented 3 years ago

I still think AspectJ does most, if not all of this stuff.

liach commented 3 years ago

Looking at https://github.com/eclipse/org.aspectj I believe aspectj is outdated (stays on Java 9!) and poorly documented; it is probably even less suitable than mixin

Earthcomputer commented 3 years ago

You could probably even make a front-end that takes the mixins we already have and compiles them to the new format... for those who don't want to use a new system.

i509VCB commented 3 years ago

I do believe a mixin implementation on this will be useful for allowing those who haven't switched yet to have some time to migrate

i509VCB commented 3 years ago

Part 0 - Guiding the ship

Before we can write any code for such a replacement to do things like injectors, breaking for loops, adding switch cases, etc we need a way to identify the way to tell any transformation logic WHERE to transform bytecode.

In my opinion Mixin supplies a good starting point to work from with it's selector system.

Within the context of this discussion, a selector is a way to identify a specific opcode within a class file. A selector will find the desired opcode that someone may wish to target. A selector targets the EXACT opcode - there is no need to have to select the opcode in front of a method invocation for example in a transformation which modifies an input parameters.

Though not in scope of this specific message, the selector system intends that the actual injector (such as an Inject, Redirect, Modify*) will use the precisely specified opcode and shift itself forward or backwards as needed in order to apply said transformation.

Selector types

The standard selector types would include things already in mixin such as

There are also other selector types we could consider:

Custom selectors

One thing I know many people would like the ability to do is to create custom selectors. This could be useful for replacing something like loader's current mess for finding the game entrypoint.

If we do decide to support custom selectors, then we would create some sort of interface for allowing the use of custom selectors. Of course how we would register and load these is not up to debate at this point as we have no real code yet to do so. At minimum I believe a selector interface would look like this:

This is not final but an idea

/**
 * Represents the context in which a selector will select an opcode to target.
 * This may be used to consider things such as runtime environment settings or mappings.
 */
public interface SelectionContext {
    /**
     * @returns the mappings resolved used to remap types in a selector.
     */
    MappingsResolver getMappingsResolver();

    /**
     * Gets the environment settings for the transformer.
     * This may be used to specify if for example this transformation is being applied in a development environment.
     *
     * @returns the environment settings for the transformer
     */
    EnvironmentSettings getEnvionmentSettings();
}

/**
 * A selector used to find specific opcodes in a method.
 *
 * Selectors may be used to search for more advanced occurrences of bytecode to represent for example switch blocks, instanceof checks, etc.
 */
public interface Selector<C extends SelectionContext> {
    /**
     * Searches a method and finds a list of opcodes which meet the conditions of this selector.
     * A selector may arbitrarily choose the opcode to select or return no selections if it failed to find the desired selector.
     *
     * @param method the method to search
     * @param context the selection context used to provide data about the selector or mappings for the current context
     * @returns a list of opcodes this selector has selected
     */
    // SelectionPos abstraction may not be considered over just returning a list of immutable insn nodes
    List<SelectionPos> select(ReadOnlyMethodNode method, C context);
}

What if no selections are returned? The injection would determine what to do based on the number of matches or lack thereof selection matches.

Serialization of selectors

Selectors would have to be serialized into a format if we intend to use a text format for representing what a transformation may look like. This can be easily defined with what I would consider "standard" selectors such as Inject, ModifyVariable, etc. However for custom selectors this is not so easy, possibly requiring each registered selector to have a sort of selector parser?

Discussion on discord seems to tend towards a text format which can be more easily updated than a binary format.

liach commented 3 years ago

@i509VCB I think you forgot another relatively important aspect of engineering: Exposing more functionalities than just changing the code.

This is more evident in the design philosophy of original Sponge mixin than our use of bytecode modification. However, this still may be useful, as mods may seek to apply abstraction to game code to make things more versatile.

A possible example: Item stacks refer to the item frames that hold them. Maybe mods want to have custom item-frame-like stuff and also hook up the same way.

My solution is that in such cases, we can create some instruction that widen the type used; instead of an item frame, we extract some higher type (same idea as seen in sponge mixin) and we act on those types instead.

We can do more than what mixin offers. We can also transform fields or methods (such as changing desc/bytecode associated) to support only a selected subset of these operations.

These concepts might sound abstract. I offer you a concrete goal: through these abstraction transformations, we should gradually eliminate the need of all "Fake Player" "Fake World"s. Where these fake players or worlds are supplied previously will now be replaced by wider types (supporting less operations but more portable).

TLDR: This suggests something not mentioned in i509vcb's comment (mixin interfacing like capability) above and does not refute his comment. I suggest something more about types than the code.

i509VCB commented 3 years ago

For now I believe we need to set a strong base to build larger systems upon by first figuring out how we point and shoot at bytecode we wish to modify. It would be interesting though to be able to widen the types - possibly passing an Actor instead of a player in quite a few things. However that would require some discussion to atleast somewhat elegantly integrate it into the other bytecode transformations without designing an entire parallel widening system that works completely separately from other bytecode changes. Of course such a system may require a completely seperate application phase.

Another concept I think others would want is a sort of displace transformation. Essentially it moves the contents of an entire method into another method and turns the original into a sort of proxy. This is something that would be useful for an all encompassing entity damage event - but this would require transformations apply to mods as well.