Raku / problem-solving

🦋 Problem Solving, a repo for handling problems that require review, deliberation and possibly debate
Artistic License 2.0
70 stars 16 forks source link

Nativecall, native trait library usage rework #186

Open Skarsnik opened 4 years ago

Skarsnik commented 4 years ago

Current usage of the is native trait has many issues and often misunderstood. The trait is solved at compile-time and lot of common usage does not work well with that. For example, allowing the user to use an environment variable to specify the library path is not 'easy' to properly workaround

The current parameter to the native trait seems simple at first glance : You give it a name and some magic will make it match a library name according to your system. If you want to do custom thing you gave it a Callable.

Maybe you can think that writing is native(%*ENV<MyLIBPATH>) is fine. It will work when working on a single script since it will be recompiled everytime before being run. Put your code on a module and it will keep the value of %*ENV when it gets compiled, leading to surprises the next time you run it.

The proper solution is to put the ENV in a callable that you give to the native trait. If I remember correctly it will be run multiple time for each function declaration (something like libxml2 has over 100+ functions). And if you want to follow a simple pattern like 'use the system library but let the user force another path with an ENV var, the solution is use NativeCall :ALL; sub foo is native(sub {%*ENV<MY_LIBFOO> || guess_library_name(('foo', v42)) } )

It's even more complicated when you want to provide the library with your distribution. I think it should not be that cumbersome for the user and should not require that much knowledge on how NC work. The Callable option seemed simple and good to handle specials case, but it's actually hard to use it right and a bit too hacky.

It's even more annoying when you have multiple file that use same the same library, you will need a way to share your solution to find the library.

Skarsnik commented 4 years ago

A solution is to try to hide most things from the user and just return a handle you gave to the native trait. This way we can provide a simple interface in NativeCall and let module that wants to do advanced stuff (call pgk-config ?) give a compatible handle.

My solution revolve around a NC-Library-Handle that you get from registering your library with a routine, then you just gave this handle to the native trait.

my NC-Library-Handle $LIBFOO = register-native-library('Foo', 'foo', :version(v42), :ENV<MY_LIBFOO>); 
sub foo is native($LIBFOO);

Behind the scene, the trait will just use the $handle.library-path attribute that will be filled the first time it needed.

More at https://github.com/perl6/6.d-prep/blob/master/deferred-to-6.e/for-review/NativeCall.md and https://github.com/rakudo/rakudo/pull/716

I realised you could also probably just use the name of the library to give to the trait, but it probably fits less with letting modules do their own thing with own they solve the library path.

jnthn commented 4 years ago

About this:

my NC-Library-Handle $LIBFOO = register-native-library('Foo', 'foo', :version(v42), :ENV<MY_LIBFOO>); 
sub foo is native($LIBFOO);

That'd need to be my constant ..., because trait applications happen at compile time.

That small detail aside, I think that the direction of a handle object is worthwhile to explore in, especially since the object can cache details of the resolution work it did easily and share it over the many uses of is native that follow. I think I'd probably prefer it's Something.new(....) or Something.convenient-factory(...) rather than a subroutine.

+1 to the suggested direction, anyway. Especially since it's additive (e.g. won't break existing uses of the is native trait).

Final comment:

Maybe you can think that writing is native(%*ENV) is fine

If that's a common mistake we're seeing, then after my RakuAST work we'd be able to consider making that give a nice compile time warning or even error.

Skarsnik commented 4 years ago

Only the type is needed at compile time so the trait is created correctly, resolution happens at runtime, actually, I don't remember if my implementation is correct for the %ENV. I can't make it run again because I get a weird error with Role/Trait resolution that my PR fixed but it's not applicable with current code (dunno why the trait get moved in a weird EXPORT section)

jnthn commented 4 years ago

Only the type is needed at compile time so the trait is created correctly, resolution happens at runtime, actually,

I don't think so; the variable that exists at runtime is a new instance of the variable, created at the entry to UNIT.

niner commented 4 years ago

On Donnerstag, 7. Mai 2020 00:24:34 CEST Sylvain Colinet wrote:

Only the type is needed at compile time so the trait is created correctly, resolution happens at runtime, actually,

It's not that simple unfortunately. It's quite common for native subs to be called at BEGIN time, e.g. to figure out the correct type to use for some arguments or to test the version/API compatibility of the installed library. The library name will end up in the precompiled file.

There is however a mechanism for re-resolving that library name on deserialization, which I use for avoiding to serialize the full path of the library (because that breaks packaging): https://github.com/rakudo/rakudo/blob/4a3f29db113a/lib/NativeCall.rakumod#L308

This can be easily extended to allow for reading an environment variable or really arbitrary code. So we could have a my constant $lib = NativeCall::Library.from-env('FOO') and have full support for precompilation as well.

With such handles, we may even be able to get rid of the first-call-replaces- the-sub business.

I don't remember if my implementation is correct for the %ENV. I can't make it run again because I get a weird error with Role/Trait resolution that my PR fixed but it's not applicable with current code (dunno why the trait get moved in a weird EXPORT section)

I usually try to write helpful commit messages that explain the reasoning, so a git blame should clear that up quickly.

Skarsnik commented 4 years ago

I am not sure the library path resolution should ever happen at compile time actually, it's only somehow ok in the context of a module that provides the library.

Skarsnik commented 3 years ago

I still live. I think it should be better to list the expected behaviour of lib resolution and agree on these.

1-It should always be at run time (probably BEGIN time). The host env can change (lib update, lib moved, etc...) 2-It should happen once for the same library 3-The default implementation should be good enough to work out of the box and enforce good practice. 4-It should be possible to change/extend it for specials cases (eg: MySQL)

For 1 the only exception is library bundled with the raku script (using the RESSOURCE system) For 3 it's important that the user specify an ABI VERSION number, if it's not possible out of the box they fall back to the 4 category and must implement their own thing. For 4 it could be useful if they have access to the 'default' resolution so they don't have to reimplement it for some OS.

lizmat commented 2 years ago

I have not really looked into this, and my experience with NativeCall basically boils down to using the libc which does not require any lookups.

But from what I understand of the problem, is that once a module that uses NativeCall has been pre-compiled, an update of the underlying native library might cause the module to either fail (previous version removed), or use the wrong version (as the new one is better / has no known vulnerabilities).

If this is so, then it would mean that native library resolution should probably run at INIT time? Or possibly as a custom compiler pass in the future?

Furthermore, it would seem that one would like to be able to change the version of the native library to be used, without needing to make changes to the source code or needing to re-precompile. And possibly, a sys-admin would need to be able to do this without touching any code at all. Which leads me to think we would need some kind of resolution registry / logic? Some META6.json like thing?

Hope this adds to the discussion. If not, then just please ignore my ramblings :-)

niner commented 2 years ago

We cannot restrict library resolution to runtime. It must be available at BEGIN time. E.g. the OpenSSL module needs to make different calls, depending on the version of openssl installed. Inline::Perl5 determines the types to use for signatures - which depend on how perl was compiled - at compile time and the only way to do so is to run some native functions: https://github.com/niner/Inline-Perl5/blob/0d6a233f43b18876660a6e6220184f68fccf934a/lib/Inline/Perl5/Interpreter.pm6#L12

Since the compile time decisions depend on the exact library we load, we cannot defer determining what we load to runtime. On the contrary, we must use the same library at runtime in this case.

Now it's obviously quite convenient to not have to care about such things when you upgrade your system and get new versions of installed libraries. Blindly expecting things to just work will however lead to very unwelcome surprises.

Since the solution here cannot be just deferring library resolution to runtime, it must lie elsewhere. If the library that we compiled with can or should not be used, we have to recompile. If we can make that automatic, we get both the convenience and the reliability. The key to this is factoring in external dependencies when checking whether a precomp unit is up to date. A solution to this is outlined in https://github.com/Raku/problem-solving/issues/159

Skarsnik commented 2 years ago

This could be left as is for the resolution, since it's done the first time you run a sub. But at the same time you probably don't want to be tempted to do stuff like version check at compile time in a module. Inline::Perl need the call at compile time to get the type size. OpenSSL should do it at INIT time, since you don't know if library version get changed between the installation of the module and the usage of a program that need it.

These are those are special case.

Maybe if you use the 'default' handler NC will provide in my idea, this will forbide you to have BEGIN time resolution If you use a lib from a ressource you get another handler since this can be at BEGIN time.

Normal usage.

my NC-Library-Handle $LIBFOO = register-native-library('Foo', 'foo', :version(v42), :ENV<MY_LIBFOO>); #returns a NC-Default-Library-Handle 
sub foo is native($LIBFOO);

BEGIN foo(); # will error like "Can not run a native fonction from a registered library at BEGIN time, use a non standard registration

Ressource usage

my constant NC-Library-Handle $LIBFOO = library-from-ressource(%RESSOURCE<MyLib>); # returns a NC-Ressource-Library-Handle
# or ? my NC-Library-Handle $LIBFOO = register-native-library(:ressource<MyLib>);
sub foo is native($LIBFOO);

BEGIN foo(); # This is fine

The issue with this, I am not sure you get the full type and not just the NC-Library-Handle role at Compile time for the trait.

For Inline::Perl it could be interesting to just be able to Inerit from the default NC-Handle and just set an attribute to allow for BEGIN run.

niner commented 2 years ago

Why have different ways of registering a library when there's a solution that allows for BEGIN time registration with automatic re-compilation when the library gets changed?

Skarsnik commented 2 years ago

This mean having the code always do a SHA/whatever of the underlaying file? The registration thing is more to register the library to NC so the user does not have to always have to respecify the name/version on each is native sub and also make the library sort of a thing and not just random arguments to the trait. And for me an "system library" and a provided library as ressource are really two separate things. You could probably even have interesting fallback mixing both, like using the ressource one if the system does not have it.

Skarsnik commented 2 years ago

I understand now, you are talking about META6.json file where Rakudo can 'find' the lib from. The issue you still get is the whole is native('foo', v1) that is completely unrelated and does not mean much. That why I propose to have only a "handle" you give to the native trait and this handle can come from nicely crafted functions in Nativecall but also from custom made functions made to handle weird cases.

Let's see some cases:

= Prototype code/small app

You could use my proposed register function

my NC-Library-Handle $LIBFOO = register-native-library('Foo', 'foo', :version(v42), :ENV<MY_LIBFOO>);
sub foo is native($LIBFOO);

= Module with Meta

In the Meta file you give an id to the library, then it gets registered in NativeCall.

my NC-Library-Handle $LIBFOO = native-library('Foo', :ENV<MY_LIBFOO>);
sub foo is native($LIBFOO);

I still think all the final native library resolution should only happen at run time (because of stuff like ENV) and the only compile-time acceptable case is when the library is in the resource of the project.