dotnet / runtimelab

This repo is for experimentation and exploring new ideas that may or may not make it into the main dotnet/runtime repo.
MIT License
1.4k stars 196 forks source link

NativeAOT-LLVM: Question: Should we drop WasmImport and use DllImport #2414

Closed yowl closed 7 months ago

yowl commented 11 months ago

We have been discussing the WasmImport msbuild property for Wasm Components and think it would be a good time to revisit this. WasmImport is used to generate the following LLVM:

attributes #2 = { "wasm-import-module"="hellowasi" "wasm-import-name"="reverse" }

which is an attribute added to a function that generates the following wasm (WAT)

(import "hellowasi" "reverse" (func $reverse (type 4)))

Previous and open issues/PRs for this: https://github.com/dotnet/runtimelab/issues/1390 https://github.com/dotnet/runtimelab/pull/1845 https://github.com/dotnet/runtimelab/pull/2410

I don't know that the original decision to not use DllImport was based on anything other than thinking this was not exactly the same as DllImport so rather than cause any problem, I just created something new.

However now is good time, with Wasm component work starting, and wanting to support Mono, to revisit this and in particular look at whether we should switch to DllImport Value for the module name and EntryPoint for the import name (function name)

cc @pavelsavara , @AaronRobinsonMSFT , @dotnet/nativeaot-llvm , @silesmo

pavelsavara commented 11 months ago

Only observation I have about this is that type system is quite limited for the WASM export/import

I'm not sure that using DLLImport would lead users to assume it's business as usual talking to something that lives in the same address space.

cc @SteveSandersonMS @maraf @lambdageek

AaronRobinsonMSFT commented 11 months ago

@yowl Good question. I think ideally we would use DllImport, but it does come with non-trivial UX cost. Source usage boils down to one of the following:

#if WASM
[DllImport("module", EntryPoint = "func")]
#else
[DllImport("libc", EntryPoint = "func")]
#endif
static extern void Func();

or

#if WASM
[WasmImport("module", EntryPoint = "func")]
#else
[DllImport("libc", EntryPoint = "func")]
#endif
static extern void Func();

We already have LibraryImport and JSImport, both of which are what we could call the "modern" way of calling an external component, so a WasmImport does align with that approach.

Using DllImport also carries with it some fields that likely aren't going to make a lot of sense for WASM (for example, PreserveSig). For the LibraryImport scenario, the interop team made an intentional decision to require users to be explicit and not assume intent. I'm hoping we can follow this as it makes things much easier in the long run.

pavelsavara commented 11 months ago

JSImport is more high level as it speaks JavaScript natively. That's not useful in WASI.

SingleAccretion commented 11 months ago

A little recap of existing behaviors.

#if WASM

I am mostly coming at this from the angle of https://github.com/dotnet/runtimelab/issues/2383#issuecomment-1690342227, and quite concerned about making lazy binding work. The user experience of using #if outside /runtime is not good.

pavelsavara commented 11 months ago

Are you also considering exports ?

SingleAccretion commented 11 months ago

Are you also considering exports?

Exports are not as tricky. Is there a reason not to use UnmanagedCallersOnly(EntryPoint = ...) for them?

pavelsavara commented 11 months ago

Exports are not as tricky. Is there a reason not to use UnmanagedCallersOnly(EntryPoint = ...) for them?

They are mirror to the import, no ? My point is that we should have same design/UX for both directions.

AaronRobinsonMSFT commented 11 months ago

JSImport is more high level as it speaks JavaScript natively. That's not useful in WASI.

I don't understand this reply. My statement above stating prior art not implying how they will be used or reused.

Are you also considering exports?

Exports are not as tricky. Is there a reason not to use UnmanagedCallersOnly(EntryPoint = ...) for them?

We should use UnmanagedCallersOnly unless there is a compelling reason to not to. There has been a request for LibraryExport, which is worth considering I think. The counter is the existing use cases for UnmanagedCallersOnly, there is already a fair bit of inertia on that front. This question becomes is the Import/Export split the best way forward or should we continue to rely on UnmanagedCallersOnly? I would prefer the latter unless there is some win to creating a WasmExport attribute.

pavelsavara commented 11 months ago

JSImport is more high level as it speaks JavaScript natively. That's not useful in WASI.

I don't understand this reply. My statement above stating prior art not implying how they will be used or reused.

Agreed. Sorry I was not clear enough. I also like the new LibraryImport UX.

yowl commented 11 months ago

For LibraryImport, https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.libraryimportattribute?view=net-7.0 starts by saying "Indicates that a source generator". As we are not using a source generator but an MSBuild task for WIT (and nothing at all for static linking), I didn't think that was going to be appropriate?

SingleAccretion commented 11 months ago

DllImport is the ultimate low-level mechanism via which calls to foreign (native) code are achieved. In a way, there is no question whether we should use it. We will, the alternative is introducing another "put a GC transition here" attribute, which will not happen.

I see questions about the higher-level experience (which will include marshaling, no doubt) as orthogonal to this discussion. That higher-level mechanism will use DllImport.

Now then, there are two functions that DllImport performs: 1) Symbol resolution. 2) Managed->native transition.

1 is the topic of this discussion, essentially. The main problem is that how to square things with static linking.

The current behavior (in NAOT-LLVM) is as follows: 1) First, a DllImport may be resolved statically. 2) Failing that, a WASM import will be emitted, with the module name controlled via an item group.

The proposed behavior is: 1) First, a DllImport may be resolved statically. 2) Failing that, a WASM import will be emitted, with the module name taken from DllImport.

2410 proposes another flavor of behavior:

1) First, a DllImport NOT "decorated as WasmImport" may be resolved statically. 2) Unresolved DllImports from 1 will be emitted as WASM imports, with the module name taken from WasmImport env module name. 3) A DllImport decorated as WasmImport will always be emitted as an actual WASM import.

jkotas commented 11 months ago

Would be better to avoid intertwining static linking and wasm imports? Something like:

AaronRobinsonMSFT commented 11 months ago

@SingleAccretion That description, coupled with an offline conversation with @jkotas, helps a lot, thanks.

The described proposal from #2410 has a slight confusion. Step (1) assumes no WasmImport, but step (2) is for unresolved (1) with a fallback to assume it has WasmImport. If however the declaration did have a WasmImport then I would assume (3) would be in play.

The modified suggestion at https://github.com/dotnet/runtimelab/issues/2414#issuecomment-1760045247 makes sense to me.

SingleAccretion commented 11 months ago

Step (1) assumes no WasmImport, but step (2) is for unresolved (1) with a fallback to assume it has WasmImport

I believe the description is accurate. The reason is that you don't need explicit import attributes for the fallback. It is achieved by passing a flag to the linker that says "turn all unresolved symbols into WASM imports".

Also, I forgot to mention this, but there is a 3rd kind of linking that exists, which is how Emscripten resolves C symbols to JS-provided library functions. It behaves somewhat like static linking (I believe Emscripten reads the symbol table from object files in python, though that would need to be confirmed), but you will have a WASM import in the final .wasm.

The interplay between all these things is complex. I certainly like the simplicity of "[WasmImport] means WASM import" a lot.

I do not know what the implementation for a scheme like this would look like or how usage looks like for different scenarios.

For example, how would we implement the default DllImport behavior (in the absence of dlsym)?

Say, we have:

[DllImport]
void Import();

First, do we want this to bind statically by default? Say we do.

Here's one sketch for an implementation:

[DllImport]
void Import() -> Import_Stub();

extern "C" Import();
void Import_Stub()
{
    // This will rely on passing --unresolved-symbols=ignore to the linker.
    if (&Import != null)
    {
        Import();
        return;
    }

    throw new EntryPointNotFoundException();   
}

If such a DllImport is not resolved statically, it won't be resolved at all. Say we do not want that. On Browser, we can take advantage of access to the host and code something up in JS. In pure WASM, we would need the optional imports feature. Say we have it. The above becomes:

[DllImport]
void Import() -> Import_Stub();

extern "C" Import();
extern optional wasm-import("Import") Import_Wasm();
void Import_Stub()
{
    // This will rely on passing --unresolved-symbols=ignore to the linker.
    if (&Import != null)
    {
        Import();
        return;
    }
    if (&Import_Wasm != null)
    {
        Import_Wasm();
        return;
    }

    throw new EntryPointNotFoundException();   
}

And so we end up with a WASM import anyway.

AaronRobinsonMSFT commented 11 months ago

Step (1) assumes no WasmImport, but step (2) is for unresolved (1) with a fallback to assume it has WasmImport

I believe the description is accurate. The reason is that you don't need explicit import attributes for the fallback. It is achieved by passing a flag to the linker that says "turn all unresolved symbols into WASM imports".

So (2) only kicks in with this flag? Also the statement "assume it has WasmImport" means that it isn't (1) because (1) explicitly states "NOT 'decorated as WasmImport'" and (3) explicitly says it does so is (2) really a fallback for (3)? I'm not trying to be difficult but I don't see how, even with the aforementioned flag, that sequence makes sense. Basically how can (2) reference (1) and rely on WasmImport if (1) says it isn't present by definition?

Dynamic linking basically does not exist outside of a bespoke Emscripten system I doubt we'll have the appetite to support.

My interpretation of @jkotas's statement was about keeping the door open for a potential future where it is supported.

For example, how would we implement the default DllImport behavior (in the absence of dlsym)?

That is a good example. I think for now, until someone comes up with a dynamic scenario, this would be as you've described - a failure.

SingleAccretion commented 11 months ago

So (2) only kicks in with this flag?

Correct.

Also the statement "assume it has WasmImport" means that it isn't (1) because (1) explicitly states "NOT 'decorated as WasmImport'" and (3) explicitly says it does so is (2) really a fallback for (3)? I'm not trying to be difficult but I don't see how, even with the aforementioned flag, that sequence makes sense. Basically how can (2) reference (1) and rely on WasmImport if (1) says it isn't present by definition?

Sorry, that was a typo on my part. 1 and 2 are explicitly not "decorated with WasmImport". The behavior is that the import will be from the "default" env module. I have corrected the original post.

My interpretation of @jkotas's statement was about keeping the door open for a potential future where it is supported.

Right. It's a question of what the default behavior should be, and what the consequences are. As a reference point, if you use PublishAot + <NativeLib>static</NativeLib>, your DllImports will not be statically resolved by default. So WASM's defaults would be more or less reversed - normal: dynamic, static via opt-in, wasm: static, "dynamic" via opt-in.

AaronRobinsonMSFT commented 11 months ago

Sorry, that was a typo on my part

Perfect. I'm now following the proposal. Thanks!

jkotas commented 11 months ago

So WASM's defaults would be more or less reversed - normal: dynamic, static via opt-in, wasm: static, "dynamic" via opt-in.

We can make WASM to have the same defaults:

normal: dynamic - it can throw PlatformNotSupportedException; static via opt-in - the opt-in is required.

It is a bit of extra work to opt-in all static libraries used by the app. The upside is that it is not necessary to chase down all dynamically unreachable PInvokes that target non-wasm platforms and that would produce unresolved symbols.

yowl commented 10 months ago

Sounds like there is a plan.

These modules (if included) will be marked for direct pinvoke, via Microsoft.NETCore.Native.targets presumably.

libSystem.Native.a
libSystem.Globalization.Native.a
libPortableRuntime.a

The bootstrapper and exceptions modules I don't think are called from managed code so they don't need to be included.

libbootstrapper.a
libbootstrapperdll.a
libCppExceptionHandling.a
libWasmExceptionHandling.a
libEmulatedExceptionHandling.a

WasmImportAttribute will be proposed (in runtime repo?) with no properties, but its presence will create a Wasm import using module name from DllImport From DllImport we will take the module name from the Value property.

Modify https://github.com/dotnet/runtimelab/blob/8f11462546cd0acbf56d9d08ba9c69470ee00d12/src/coreclr/tools/aot/ILCompiler.LLVM/CodeGen/LLVMObjectWriter.cs#L769 to fit, adding logic for unresolved dynamic symbols that are not WasmImports (maybe we can do this bit seperately). Do we need to have the EntryPointNotFoundException path even for WasmImports? Maybe that should be a property in WasmImportAttribute.

Passing --unresolved-symbols=ignore only if any symbols are to be dynamically resolved would be nice I suppose, but perhaps difficult.

SingleAccretion commented 10 months ago

The proposed direction sounds good to me. Concretely:

1) [DllImport("xyz")] will throw PNSE by default. We can make it work "properly" with optional WASM imports in the future. 2) [DllImport("xyz")] + <DirectPInvoke Include="xyz" /> will generate an extern "C" symbol. This would usually be statically linked, but could also used for Emscripten's JS libraries. What this does will be affected by the --unresolved-symbols setting, and we will endeavor not to depend on it for core runtime libraries (i. e. there should not be unresolved symbols in them). I like this a lot - it means <NativeLib>static</NativeLib> will work as-is.

AaronRobinsonMSFT commented 10 months ago

WasmImportAttribute will be proposed (in runtime repo?) with no properties, but its presence will create a Wasm import using module name from DllImport

Yes, please. A minor note that I think is worth considering. Since the WasmImport is going to be used in autogenerated source for the most part, I think it would be advisable to having the first argument be required and match that of the target DllImport. This follows how we do LibraryImport and ensures all uses carry with them. For now it will be superfluous but going forward it is better to have a default that is explicit rather than some assumed value.

Please file an issue and tag me.

The proposed direction sounds good to me.

+1

MichalPetryka commented 10 months ago

I wonder if it'd make sense to also move DirectPInvoke to an attribute on the PInvokes.

jkotas commented 10 months ago

[DllImport("xyz"), WasmImport] + <DirectPInvoke Include="xyz" /> - error?

I do not think we should be reporting an error in this case. It would break in corner-case situation when somebody ends up with a static library that happens to match name of a wasm module.

Since the WasmImport is going to be used in autogenerated source for the most part, I think it would be advisable to having the first argument be required and match that of the target DllImport.

It makes sense only if we think that WasmImport is going to be usable without DllImport. What would be the scenario for it? What happens when the names in DllImport and WasmImport are different?

The Import suffix on WasmImport makes it sound like that WasmImport is a replacement of DllImport that it is not the case. WasmImport is augment for DllImport, similar to SuppressGCTransition. I cannot think about a better name, just sharing an observation.

I wonder if it'd make sense to also move DirectPInvoke to an attribute on the PInvokes.

You can use * as the name in DllImport to hardcode it to be DirectPInvoke. It means that the binary is only going to be usable by form factors that link the final binary using native tools.

yowl commented 10 months ago

I cannot think about a better name

I'll make some alternative suggestions in the runtime repo.

AaronRobinsonMSFT commented 10 months ago

Since the WasmImport is going to be used in autogenerated source for the most part, I think it would be advisable to having the first argument be required and match that of the target DllImport.

It makes sense only if we think that WasmImport is going to be usable without DllImport. What would be the scenario for it? What happens when the names in DllImport and WasmImport are different?

The sense is consistency, all other *Import scenarios for calling external dependencies have this mechanic. It also helps clarify usage when it is found, "follow the pattern". Since this is source generated, they should be the same, but if they aren't... I'm thinking outloud here now. I suppose the confusion here is what is using WasmImport. For LibraryImport or JSImport, the source generators are using this the attributes to generate the DllImport, which means they are always the same. In this case though the attributes are being used in a two stage process simply to flow details.

Alright, nevermind. Having the attribute take no arguments makes sense here given how the tool chain operates.

I cannot think about a better name, just sharing an observation. I'll make some alternative suggestions in the runtime repo.

Excellent. I think that would assuage any lingering concerns I would have.

SingleAccretion commented 10 months ago

I do not think we should be reporting an error in this case. It would break in corner-case situation when somebody ends up with a static library that happens to match name of a wasm module.

That makes sense. I suppose WasmImport would take precedence in this case as the more local of the two "modifiers".

Since this is source generated...

I will note that WasmImport will very much be expected to be used manually, like in #2383. You could think of it like an exotic DllImportSearchPath flavor.