FractalFir / rustc_codegen_clr

This Rust compiler backend (module) emits valid CIL (.NET IR), enabling you to use Rust in .NET projects.
MIT License

Assembly export API - Classes #4

Closed FractalFir closed 3 months ago

FractalFir commented 1 year ago

This issue is related to the design of the Assembly Export API and the new CLI exporter. There will also be an ILASM exporter, created mainly for manual debugging.

What needs to be supported:

What is nice to have, and could potentially help (In order of decreasing importance):

PROMETHIA-27 commented 1 year ago

How do you think you can get the class string to the MIR? AFAIK, the Rust compiler has no built-in way to preserve any sort of attribute, and this might require a hard fork of the entire compiler rather than just the backend plugin. I'd love to be proven wrong though.

FractalFir commented 1 year ago

While I would like to just use attributes (and will look for a solution involving them), not having them is not the end of the world. The solution is not elegant, and has its own issues, but should work reliably. As with any other interop feature, it will be based on "magic" constants. There will be a constant string in the code, like this:

const rustc_codegen_clr_internals_derive_from_%CLASSNAME%:&str = "ParentClassName";

Since I can be pretty sure that anything starting with rustc_codegen_clr_internals_derive_from_ is my "magic" constant (why would anyone else use it as a name?), I can use it to set the parent. So, I look for a Rust type named %CLASSNAME%, check if it is a struct, and set its parent to the value of the constant.

I will take an analogous approach to any other data that I can't pass directly.

I call those constants "magic" because users will never see them. They will be hidden behind a macro, and from the perspective of a user, everything will just work.

There is one issue with this approach: I need to ensure those constants are never removed by the compiler. They might appear dead, and be removed before I can read them. I have an idea on how to prevent that, but I will need to ensure it always works.
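
The name-based lookup described in this comment could be sketched roughly as follows. This is only an illustration: `parse_derive_marker` and `DERIVE_PREFIX` are hypothetical names, not part of rustc_codegen_clr, and the sketch assumes the backend already has the item's name and string value in hand.

```rust
/// Prefix that marks a "magic" constant naming a parent class.
const DERIVE_PREFIX: &str = "rustc_codegen_clr_internals_derive_from_";

/// Hypothetical helper: given an item's name and its string value,
/// returns `(class_name, parent_name)` if the item is a magic constant,
/// and `None` for every other item.
fn parse_derive_marker<'a>(item_name: &'a str, value: &'a str) -> Option<(&'a str, &'a str)> {
    // Everything after the prefix is the name of the Rust type to modify.
    let class_name = item_name.strip_prefix(DERIVE_PREFIX)?;
    if class_name.is_empty() {
        // A bare prefix names no class, so it is not a valid marker.
        return None;
    }
    Some((class_name, value))
}
```

The backend would still need to check that the returned class name refers to a struct before setting its parent.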

FractalFir commented 1 year ago

I would propose this way of passing data about classes:

pub(crate) struct ClassInfo {
    name: IString,
    fields: Vec<(IString, VariableType)>,
    explicit_field_offsets:Option<Vec<u8>>,
    extends:(Option<IString>,IString), // First, the optional name of the assembly it comes from; then, the type string
    //"Future" stuff, can be ignored for now
    access_modifier:AccessModifer,
    member_functions:Vec<Method>,
    generic_args:Vec<GenericArgument>,
    attribute:Vec<Attribute>,
}

AccessModifer is an enum with two values, Private and Public. Method, GenericArgument, and Attribute are placeholders; what data they will hold depends on what is needed.
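
To make the proposal concrete, here is a minimal sketch that fills in a ClassInfo for a simple value type. The stand-ins are assumptions made only so the example compiles: `IString` is aliased to `Box<str>`, `VariableType` is reduced to two hypothetical variants, and `example_point` is an illustrative helper.

```rust
// Assumption: `IString` is some owned/interned string; `Box<str>` stands in for it.
type IString = Box<str>;

// Hypothetical stand-in for the codegen's real `VariableType`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum VariableType { I32, F32 }

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum AccessModifer { Private, Public }

// Placeholders, as in the proposal: their contents are not decided yet.
#[derive(Debug, Clone, PartialEq)]
struct Method;
#[derive(Debug, Clone, PartialEq)]
struct GenericArgument;
#[derive(Debug, Clone, PartialEq)]
struct Attribute;

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct ClassInfo {
    name: IString,
    fields: Vec<(IString, VariableType)>,
    explicit_field_offsets: Option<Vec<u8>>,
    // First, the optional assembly name; then, the type string.
    extends: (Option<IString>, IString),
    access_modifier: AccessModifer,
    member_functions: Vec<Method>,
    generic_args: Vec<GenericArgument>,
    attribute: Vec<Attribute>,
}

/// Builds the description of a hypothetical `Point` struct with two float fields.
fn example_point() -> ClassInfo {
    ClassInfo {
        name: "Point".into(),
        fields: vec![("x".into(), VariableType::F32), ("y".into(), VariableType::F32)],
        explicit_field_offsets: None, // let the runtime choose the layout
        extends: (None, "System.ValueType".into()),
        access_modifier: AccessModifer::Public,
        member_functions: Vec::new(),
        generic_args: Vec::new(),
        attribute: Vec::new(),
    }
}
```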

PROMETHIA-27 commented 1 year ago

One idea I have for preventing their removal, which I think would work, is the #[used] attribute. It forces the item to show up in the final executable, so it can't be DCE'd, but the item must be a static, not a constant. That's fine for the output though, because you'll just not emit those statics into the IL.

That type def info seems a bit small; I tried to write a full one out and it was so long I gave up before I even ran out of the hardcoded bitmask type attributes on a TypeDef. Essentially we'd probably want the same API offered by the type Type in the CLR. Unless this is supposed to be more specific to just classes, and not all types?

FractalFir commented 1 year ago

#[used] seems like a perfect solution - I don't see a problem with using statics instead of consts.
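
A minimal sketch of what such a marker might look like once it is a static carrying #[used]. `MyClass` and "ParentClassName" are placeholder names, and in practice a macro would generate this item rather than the user writing it by hand.

```rust
// Placeholder type whose parent class the marker below describes.
#[allow(dead_code)]
struct MyClass {
    value: i32,
}

// `#[used]` asks the compiler to keep this static in the output even though
// nothing references it. The attribute is only valid on statics, not consts.
#[used]
#[allow(non_upper_case_globals)]
static rustc_codegen_clr_internals_derive_from_MyClass: &str = "ParentClassName";
```

The backend would read the static's name and value, then skip emitting it into the IL.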

As for the ClassInfo, I am sorry: I described what I was talking about quite poorly. It seems communication is not my strong suit ;).

Currently, I am writing, and asking for feedback on, the API between rustc_codegen_clr and any CIL emitter. This is the data that will be passed to both the ilasm-based emitter and any other emitter. It can then be further processed by the emitter (or a "glue" layer above it), for example by being converted into a Type.

So, this is the generic API, which is not aware of any particular assembly creator. And I wanted to know if this generic API could also work with this emitter.

The proposition with the Type API seems more than fine - I can then translate my generic ClassInfo into an emitter-specific Type. I suppose it would also make other projects using this emitter easier.

PROMETHIA-27 commented 1 year ago

Ahh, so ClassInfo was meant to be passed into the CIL emitter? That makes sense. Still, more info would likely be desirable. Types can have a lot of details attached.

FractalFir commented 1 year ago

This is just the minimal amount of info needed by rustc_codegen_clr to emit all classes/structs it needs to work. Support for more stuff would obviously be great, but this is the minimum needed to get things going. I assume it would be easier to split the task of creating an assembly exporter into smaller chunks, so I wanted to separate the smallest feature set that would be needed to start testing.

I did not think about other types. I assumed types like float, long, and nint are special hard-coded values, so I treated them separately from classes/structs.

A bit of a tangent question: would passing a name of a struct be enough to identify it (for example, in a function signature)? Or should I keep track of things like the order structs are emitted at?

PROMETHIA-27 commented 1 year ago

Structs are just types that inherit from System.ValueType, so yeah, just a name and namespace is sufficient to identify it.
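
As a rough illustration of name-based identity, the sketch below keys type definitions by namespace and name, so a signature can refer to a struct regardless of the order in which definitions were registered. `TypeRef` and `TypeTable` are hypothetical names, not part of either project.

```rust
use std::collections::HashMap;

/// Identifies a type purely by namespace and name.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct TypeRef {
    namespace: String,
    name: String,
}

impl TypeRef {
    fn new(namespace: &str, name: &str) -> Self {
        TypeRef { namespace: namespace.to_string(), name: name.to_string() }
    }
}

/// Hypothetical table of emitted types, mapping each reference to its field names.
#[derive(Default)]
struct TypeTable {
    defs: HashMap<TypeRef, Vec<String>>,
}

impl TypeTable {
    /// Registers a definition; emission order is not recorded.
    fn define(&mut self, ty: TypeRef, field_names: Vec<String>) {
        self.defs.insert(ty, field_names);
    }

    /// Looks a type up by namespace and name alone.
    fn resolve(&self, ty: &TypeRef) -> Option<&Vec<String>> {
        self.defs.get(ty)
    }
}
```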

Strictly speaking, I think numeric primitives are implemented as value types, so they're "structs", sort of. But I'm not sure if that means they're "normal enough" to not hardcode.

FractalFir commented 1 year ago

> Essentially we'd probably want the same API offered by the type Type in the CLR.

Thinking more about it, I feel like this idea was probably the best one. This way, you could create an API that works well for the emitter. I can then just use whatever API you end up with, and use a bit of glue code to translate my representation of a type into yours. The overhead would be negligible compared to the gains from not having to launch a separate ilasm process to assemble everything.

This way, if there is a need to change something on the codegen side, I can just change my glue layer, and there is no need to change anything on your part. Likewise, if you wanted to change something, all changes on my side would be in this glue layer.
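
A minimal sketch of what such a glue layer might look like, assuming a simplified codegen-side `ClassInfo` and a purely hypothetical emitter-side `EmitterType`. The real emitter API is not defined in this thread, so every field of `EmitterType` is an assumption.

```rust
// Codegen-side representation, simplified: plain strings replace IString
// and VariableType so the example is self-contained.
#[allow(dead_code)]
struct ClassInfo {
    name: String,
    fields: Vec<(String, String)>, // (field name, type name)
    extends: (Option<String>, String), // (optional assembly, type string)
}

// Hypothetical emitter-side type description.
#[derive(Debug, PartialEq)]
struct EmitterType {
    full_name: String,
    base_type: String,
    is_value_type: bool,
    field_names: Vec<String>,
}

// The glue layer: the only code that knows both representations.
impl From<&ClassInfo> for EmitterType {
    fn from(info: &ClassInfo) -> Self {
        // Use the ILASM-style `[assembly]Type` form when an assembly is given.
        let base_type = match &info.extends {
            (Some(assembly), ty) => format!("[{assembly}]{ty}"),
            (None, ty) => ty.clone(),
        };
        EmitterType {
            full_name: info.name.clone(),
            // Structs are types that inherit from System.ValueType.
            is_value_type: info.extends.1 == "System.ValueType",
            base_type,
            field_names: info.fields.iter().map(|(name, _)| name.clone()).collect(),
        }
    }
}
```

With this shape, a change on either side only touches the `From` implementation.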

I feel like the approach I originally proposed was likely the wrong option - it coupled the emitter to the internals of the codegen way too much, making the whole process far harder for a performance gain that would be negligible at best. I was trying to fit the emitter around the codegen, while I should have aimed to fit the codegen around the emitter.

I am sorry for being a bit indecisive - this is the first time I am not working alone, so coordinating work is very new to me.

So, in summary, I believe the API for types you originally proposed is a far better option than what I suggested. Besides being better suited for the job, it also seems like it would be pretty easy to integrate. I suggest we go with it, but if it does not exactly suit you, it can be changed. Let me know what you think.

PROMETHIA-27 commented 1 year ago

That sounds like the best option to me right now, yeah. And it can always be changed if it becomes obvious that there's a better way.