FractalFir / rustc_codegen_clr

This Rust compiler backend (module) emits valid CIL (.NET IR), enabling you to use Rust in .NET projects.
MIT License
1.51k stars, 35 forks

Added support for indexing slices/arrays #15

Closed: karashiiro closed this 1 year ago

karashiiro commented 1 year ago

Closes #13.

This adds support for getting/setting values at indices in slices and arrays. Some of the stdlib syntax is a bit awkward right now, because some instructions expect a TypeDef while others expect a DotnetTypeDef, but I figured that could be handled in a future PR.

For the slice methods, I just wrote a reference implementation in C# (without auto-properties or StructLayout, for simplicity) and copied over the .NET output.

Codegen comparisons

Reference:

internal unsafe struct RustSlice<G0> where G0 : unmanaged
{
    private G0* _ptr;
    private nuint _length;

    public nuint Length => _length;

    public G0 this[nuint offset]
    {
        get => _ptr[offset];
        set => _ptr[offset] = value;
    }
}
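For comparison, here is a rough Rust sketch of the same layout: a raw pointer plus a length, with unchecked indexed get/set. It is only an illustration under assumptions: the names `RustSlice`, `from_slice`, `length`, `get` and `set` are hypothetical, and `T: Copy` is used as a loose stand-in for C#'s `unmanaged` constraint.

```rust
/// Hypothetical Rust mirror of the C# `RustSlice<G0>` reference struct.
#[repr(C)]
pub struct RustSlice<T: Copy> {
    ptr: *mut T,
    length: usize,
}

impl<T: Copy> RustSlice<T> {
    /// Builds a view over an existing buffer; the buffer must outlive the view.
    pub fn from_slice(s: &mut [T]) -> Self {
        RustSlice { ptr: s.as_mut_ptr(), length: s.len() }
    }

    /// Mirrors `public nuint Length => _length;`.
    pub fn length(&self) -> usize {
        self.length
    }

    /// Mirrors `get => _ptr[offset];`. Like the C# indexer, there is no bounds
    /// check: the caller must guarantee `offset < length`.
    pub unsafe fn get(&self, offset: usize) -> T {
        // `add` advances by `offset * size_of::<T>()` bytes.
        unsafe { *self.ptr.add(offset) }
    }

    /// Mirrors `set => _ptr[offset] = value;` (same safety contract as `get`).
    pub unsafe fn set(&mut self, offset: usize, value: T) {
        unsafe { *self.ptr.add(offset) = value }
    }
}
```

`ptr.add(offset)` scales by the element size implicitly, which is what the C# compiler spells out explicitly with `sizeof` and `mul` in the IL that follows.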

.NET 7.0.401:

.class private sealed sequential ansi beforefieldinit
  RustSlice`1<valuetype .ctor (class [System.Runtime]System.ValueType modreq ([System.Runtime]System.Runtime.InteropServices.UnmanagedType)) G0>
    extends [System.Runtime]System.ValueType
{
  .custom instance void [System.Runtime]System.Reflection.DefaultMemberAttribute::.ctor(string)
    = (01 00 04 49 74 65 6d 00 00 ) // ...Item..
    // string('Item')
  .param type [1] /*G0*/
    .custom instance void System.Runtime.CompilerServices.IsUnmanagedAttribute::.ctor()
      = (01 00 00 00 )

  .field private !0/*G0*/* _ptr

  .field private native unsigned int _length

  .method public hidebysig specialname instance native unsigned int
    get_Length() cil managed
  {
    .maxstack 8

    // [42 28 - 42 35]
    IL_0000: ldarg.0      // this
    IL_0001: ldfld        native unsigned int valuetype RustSlice`1<!0/*G0*/>::_length
    IL_0006: ret

  } // end of method RustSlice`1::get_Length

  .method public hidebysig specialname instance !0/*G0*/
    get_Item(
      native unsigned int offset
    ) cil managed
  {
    .maxstack 8

    // [46 16 - 46 28]
    IL_0000: ldarg.0      // this
    IL_0001: ldfld        !0/*G0*/* valuetype RustSlice`1<!0/*G0*/>::_ptr
    IL_0006: ldarg.1      // offset
    IL_0007: conv.u8
    IL_0008: sizeof       !0/*G0*/
    IL_000e: conv.i8
    IL_000f: mul
    IL_0010: conv.u
    IL_0011: add
    IL_0012: ldobj        !0/*G0*/
    IL_0017: ret

  } // end of method RustSlice`1::get_Item

  .method public hidebysig specialname instance void
    set_Item(
      native unsigned int offset,
      !0/*G0*/ 'value'
    ) cil managed
  {
    .maxstack 8

    // [47 16 - 47 36]
    IL_0000: ldarg.0      // this
    IL_0001: ldfld        !0/*G0*/* valuetype RustSlice`1<!0/*G0*/>::_ptr
    IL_0006: ldarg.1      // offset
    IL_0007: conv.u8
    IL_0008: sizeof       !0/*G0*/
    IL_000e: conv.i8
    IL_000f: mul
    IL_0010: conv.u
    IL_0011: add
    IL_0012: ldarg.2      // 'value'
    IL_0013: stobj        !0/*G0*/
    IL_0018: ret

  } // end of method RustSlice`1::set_Item

  .property instance native unsigned int Length()
  {
    .get instance native unsigned int RustSlice`1::get_Length()
  } // end of property RustSlice`1::Length

  .property instance !0/*G0*/ Item(native unsigned int)
  {
    .get instance !0/*G0*/ RustSlice`1::get_Item(native unsigned int)
    .set instance void RustSlice`1::set_Item(native unsigned int, !0/*G0*/)
  } // end of property RustSlice`1::Item
} // end of class RustSlice`1
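The `get_Item`/`set_Item` bodies above compute the element address in several explicit steps. A minimal Rust sketch that mirrors each conversion, assuming the 64-bit intermediate used in the IL; `element_address` is a hypothetical helper, not part of either toolchain:

```rust
/// Hypothetical helper that mirrors, one instruction at a time, the address
/// computation emitted for `_ptr[offset]` (IL_0001 through IL_0011).
pub fn element_address(ptr: usize, offset: usize, elem_size: u32) -> usize {
    // ldarg.1; conv.u8: zero-extend the nuint offset to 64 bits.
    let offset64 = offset as u64;
    // sizeof !0 pushes a 32-bit size; conv.i8 sign-extends it to 64 bits.
    let size64 = elem_size as i32 as i64;
    // mul: 64-bit multiply; overflow wraps, like the unchecked `mul` opcode.
    let bytes = offset64.wrapping_mul(size64 as u64);
    // conv.u: convert back to a native unsigned integer (truncates on 32-bit).
    let bytes_native = bytes as usize;
    // add: base pointer plus byte offset; ldobj/stobj then use this address.
    ptr.wrapping_add(bytes_native)
}
```

For example, with a base address of `0x1000`, an offset of 3 and 8-byte elements, the result is `0x1018`.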

rustc_codegen_clr:

.class public RustSlice<G0> extends [System.Runtime]System.ValueType{
    .field public !G0* _ptr
    .field public native int _length
.method public hidebysig instance native uint get_Length(valuetype RustSlice<!G0>){
    .locals (

    )
    ldarg.0
    ldfld native uint valuetype RustSlice<!G0>::_length
    ret
}
.method public hidebysig instance !G0 get_Item(valuetype RustSlice<!G0>,native uint){
    .locals (

    )
    ldarg.0
    ldfld !0* valuetype RustSlice<!G0>::_ptr
    ldarg.1
    conv.u8
    sizeof !G0
    conv.i8
    mul
    conv.u
    add
    ldobj valuetype !0
    ret
}
.method public hidebysig instance void set_Item(valuetype RustSlice<!G0>,native uint,!G0){
    .locals (

    )
    ldarg.0
    ldfld !0* valuetype RustSlice<!G0>::_ptr
    ldarg.1
    conv.u8
    sizeof !G0
    conv.i8
    mul
    conv.u
    add
    ldarg.2
    stobj valuetype !0
    ret
}
}
FractalFir commented 1 year ago

Looks OK. Quick note: there may be no need to constrain slices to only unmanaged types. The restrictions on pointer types have been greatly loosened in C# 11, and they seem to allow things such as pointers to a local variable holding a reference to a managed type. If I understand the changes correctly, something like this is now legal:

public unsafe void Test() {
    object a = new object();
    object b = new object();
    if (a != b) {
        Console.WriteLine("References not equal!");
    }
    object* aptr = &a;
    *aptr = b;
    if (a == b) {
        Console.WriteLine("Reference changed using pointers!");
    }
}
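A loose Rust analogue of the pattern above: a raw pointer to a local that holds a reference, used to make that local refer to a different object. `Rc<String>` is only a stand-in for a GC reference (Rust has no tracing GC), and `retarget` is a hypothetical name:

```rust
use std::rc::Rc;

/// Overwrites the local variable that `slot` points to, so that it refers to
/// the same heap object as `b`. The object it previously held is released.
///
/// # Safety
/// `slot` must point to a live, initialized `Rc<String>`.
pub unsafe fn retarget(slot: *mut Rc<String>, b: &Rc<String>) {
    // Writing through the pointer drops the old Rc and stores a clone of `b`.
    unsafe { *slot = Rc::clone(b) };
}
```

Calling it with a pointer to `a` leaves `a` and `b` referring to the same object, just as `*aptr = b;` does in the C# version.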

Not restricting slices would enable us to hold slices of stack-allocated Rust arrays of managed objects:

fn test() {
    let a: [MString; 4] = ["A".into(), "B".into(), "C".into(), "D".into()];
    print_mstrings(&a)
}
fn print_mstrings(slice: &[MString]) {
    for mstring in slice {
        system::console::Console::wrtienln_mstring(mstring);
    }
}

Allowing for such uses could be quite beneficial.
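The same trade-off, expressed as a Rust analogy: a `Copy` bound (roughly comparable to `unmanaged`) rules out element types that own heap data, while an unbounded generic accepts them. `String` stands in for `MString` here, and both function names are hypothetical:

```rust
use std::fmt::Display;

/// Bounded version: `T: Copy` loosely plays the role of C#'s `unmanaged`
/// constraint and rejects element types that own heap data, such as `String`.
pub fn join_copy<T: Copy + Display>(slice: &[T]) -> String {
    slice.iter().map(|x| x.to_string()).collect::<Vec<_>>().join(",")
}

/// Unbounded version: any displayable element type works, including owned,
/// heap-backed values stored in a stack-allocated array.
pub fn join_any<T: Display>(slice: &[T]) -> String {
    slice.iter().map(|x| x.to_string()).collect::<Vec<_>>().join(",")
}
```

Passing a `[String; 4]` array to `join_copy` would not compile, since `String` is not `Copy`; `join_any` accepts it.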

karashiiro commented 1 year ago

That's right, I only added the constraint blindly because Rider gave me a warning for taking a pointer to a managed object. I can definitely see the value in this, though we'd need to make sure everything is pinned before calling a method like that (we'd actually need to do that either way).

To be honest, however, I'm not sure whether the unmanaged generic constraint exists at runtime, given that it compiles down to metadata (an IsUnmanagedAttribute plus a modreq on the constraint). It might only exist for IDE warnings and otherwise be treated as a plain struct constraint.

At any rate, agreed - it's worth considering not constraining it at all.

FractalFir commented 1 year ago

Correct me if I am wrong, but aren't pointers to managed objects equivalent to ref? They use the same instructions, and I have had some object pointer operations decompile back to references. I believe the official documentation even sometimes refers to refs as managed pointers. Since I plan to forbid using raw managed references outside the stack, there will not be any need for pinning: the GC will be aware of all pointers and handle them appropriately.

karashiiro commented 1 year ago

As far as I'm aware, they're not necessarily the same, although they can be. For one thing, pointers to managed objects can be cast arbitrarily and have their raw address values stored, at which point there can't possibly be any GC-safe way of using them. I've personally written code that abused ref/pointer conversions before passing things into unmanaged code, and gotten burned when a generation 2 GC happened.

With that said, I wouldn't be surprised if pointers to managed objects could be replaced with ref automatically in certain cases (and ref is GC-safe), but I certainly wouldn't rely on that. If there is compiler magic there, it probably requires that the pointer is known statically to never be cast to another type or be moved to the heap. As a trivial case in which a pointer to a managed object would not be GC-safe, consider an unmanaged function that takes a managed pointer and returns that exact same pointer (sort of like black_box). Given that the compiler can't know what the unmanaged function did with that pointer, it wouldn't be able to confirm that those conditions are met under any circumstances.

karashiiro commented 1 year ago

As a somewhat more relevant example (one that doesn't involve unmanaged code), a simple unsafe cast to nint and back can't be handled by the runtime, as far as I know.

FractalFir commented 1 year ago

Transmutes to and from managed types will be forbidden by the backend. This is not implemented yet, but it should not be that hard (most of the open questions concern error messages).

In the case of the backend, we won't be dealing with unmanaged code for the most part (one of the objectives of the project is to have managed Rust code). Additionally, you already need to be quite careful when dealing with FFI in Rust (you should not, for example, pass a Box<T> through FFI), so we will only be extending the preexisting safety rules of Rust to include more types.

Since using managed types outside the stack will already be forbidden in Rust code, we can guarantee that a managed pointer points only to the stack. If in projection.rs we forbid getting the addresses of fields belonging to managed types (by always reading them by value), it becomes impossible to have a managed pointer pointing into the managed heap.
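The proposed projection.rs rule can be sketched as a simple decision, assuming a hypothetical two-way classification of field owners. The real backend's types and helpers are not shown in this thread, so every name below is illustrative:

```rust
/// Hypothetical classification of the type that owns a field.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OwnerKind {
    /// A .NET reference type whose instances live on the GC heap.
    Managed,
    /// An ordinary Rust value type.
    Unmanaged,
}

/// How a field projection would be lowered.
#[derive(Debug, PartialEq)]
pub enum FieldAccess {
    /// Take the field's address (`ldflda` in IL).
    ByAddress,
    /// Copy the field's value (`ldfld` in IL).
    ByValue,
}

/// Under the proposed rule, fields of managed types are never addressed,
/// so no pointer into the managed heap can be created this way.
pub fn field_access(owner: OwnerKind) -> FieldAccess {
    match owner {
        OwnerKind::Managed => FieldAccess::ByValue,
        OwnerKind::Unmanaged => FieldAccess::ByAddress,
    }
}
```

`ldfld` copies a field's value onto the stack, while `ldflda` pushes an interior pointer to it; emitting only the former for managed owners is what keeps managed pointers off the GC heap under this scheme.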

I think as long as we can guarantee that managed pointers point only to variables living on the stack, no issues should arise.

FractalFir commented 1 year ago

There is a bug in the current slice get_Item and set_Item implementations. They generate

ldobj valuetype !0
// And
stobj valuetype !0

instead of

ldobj !0
// And
stobj !0

I am working on a fix.

FractalFir commented 1 year ago

Fixed in 6956a5d75e2f1cb6436d160ba39e8ff108451b42. The bug turned out to be quite easy to fix: LDObj and STObj only accepted a DotnetTypeRef as the type to load/store. I changed them to take a Type, so all edge cases should now work out fine.
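A simplified model of how the old signature could produce `valuetype !0`: if the operand can only be a named type reference that always prints with a `valuetype` prefix, a generic parameter comes out wrong. `DotnetTypeRef`, `Type` and the helpers below are hypothetical stand-ins, not the backend's actual definitions, and this is only one plausible way the bug could arise:

```rust
/// Hypothetical stand-in for a reference to a named .NET value type.
pub struct DotnetTypeRef {
    pub name: String,
}

impl DotnetTypeRef {
    fn il(&self) -> String {
        // Named value types are always written with a `valuetype` prefix.
        format!("valuetype {}", self.name)
    }
}

/// Hypothetical, simplified general type: a generic parameter or a named type.
pub enum Type {
    GenericArg(u32),
    DotnetType(DotnetTypeRef),
}

impl Type {
    fn il(&self) -> String {
        match self {
            // Generic parameters of the enclosing type are written bare, e.g. `!0`.
            Type::GenericArg(idx) => format!("!{idx}"),
            Type::DotnetType(r) => r.il(),
        }
    }
}

/// Before the fix: the operand could only be a `DotnetTypeRef`, so a generic
/// parameter had to be passed as a type name and picked up `valuetype`.
pub fn ldobj_before(operand: &DotnetTypeRef) -> String {
    format!("ldobj {}", operand.il())
}

/// After the fix: the operand is any `Type`, so generic parameters print bare.
pub fn ldobj_after(operand: &Type) -> String {
    format!("ldobj {}", operand.il())
}
```

The same reasoning applies to `stobj`, which takes the same kind of type operand.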