Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

[doc] Setting up PerlIO callbacks when embedding a Perl interpreter using C (before any `*.pm` modules were ever loaded or any Perl code executed) #22571

Open vadimkantorov opened 2 months ago

vadimkantorov commented 2 months ago

Hi!

I managed to produce a fully hermetic, single-file static build of perl by building all modules statically (following https://perldoc.perl.org/perlembed) and providing my own implementations of open/fopen/read/seek that serve the system *.pm files from memory.

Is there a way to hook into Perl's own PerlIO layer system so that Perl only calls these functions (including for module / *.pm discovery and loading) and never falls through to libc's I/O functions or makes I/O syscalls directly? This would be a much cleaner and more robust solution.

It would be nice if setting up PerlIO in the perlembed scenario were covered in the docs.

I also wonder how the diamond operator is implemented and which functions from https://github.com/Perl/perl5/blob/blead/perlio.c it calls, and in what sequence (e.g. for perl -e 'open(f,"<","my.txt");print(<f>);' and for perl -e 'open(f,"<","my.txt");$line=<f>;print($line);')

Thanks!


If anyone's curious to see what my hack looks like - https://github.com/vadimkantorov/perlpack, but it's very much a WIP

My current problem: overriding open / close / read / stat / lseek / access / fopen / fileno was sufficient for perl -e 'use Cwd;print(Cwd::cwd(),"\n");', which successfully discovers and loads Cwd.pm from my virtual read-only FS. But perl -e 'open(F,"<","/mnt/perlpack/.../Cwd.pm");print(<F>);' does not work, probably because Perl tries an fcntl/ioctl/some other variant of a stat call, and I am not implementing these. Because of some failure along the way, my read function is currently never invoked when I use the diamond operator.

Which IO/stdio calls does Perl make when opening and reading a file in the typical case? strace shows open -> fcntl -> ioctl -> lseek -> fstat -> mmap -> read, but these are raw syscalls. I'd like to know the concrete libc/stdio functions involved (stat, for example, has many variants) so that I can override them. I imagine the answer is somewhere in perlio.c or doio.c, but there are quite a few layers of indirection, which makes the code hard to follow for a novice in Perl's codebase.
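For readers following along, here is a minimal sketch of the kind of prefix-dispatched, read-only in-memory shim described above. Everything in it is an assumption made for illustration: the vfs_* names, the virtual descriptor base of 1000, and the sample file are hypothetical and not taken from perlpack. It includes an fstat because the strace sequence above shows an fstat between open and read; it does not attempt fcntl, ioctl, or mmap.

```c
#define _XOPEN_SOURCE 700  /* for S_IFREG, off_t, ssize_t under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* All names below are hypothetical; nothing here is a Perl or perlpack API. */
#define VFS_PREFIX   "/mnt/perlpack/"
#define VFS_FD_BASE  1000   /* virtual fds start here to stay clear of real ones */
#define VFS_MAX_OPEN 16

struct vfs_file { const char *path; const char *data; size_t size; };

static const char hello_txt[] = "hello\nworld\n";
static const struct vfs_file vfs_files[] = {
    { VFS_PREFIX "hello.txt", hello_txt, sizeof hello_txt - 1 },
};

static struct { const struct vfs_file *file; size_t pos; } vfs_slots[VFS_MAX_OPEN];

/* Return the slot index behind a virtual fd, or -1 if the fd is not ours. */
static int vfs_slot(int fd) {
    int i = fd - VFS_FD_BASE;
    return (i >= 0 && i < VFS_MAX_OPEN && vfs_slots[i].file) ? i : -1;
}

int vfs_open(const char *path) {
    assert(path != NULL);
    /* A real shim would forward non-prefixed paths to libc instead of failing. */
    if (strncmp(path, VFS_PREFIX, strlen(VFS_PREFIX)) != 0) { errno = ENOENT; return -1; }
    for (size_t f = 0; f < sizeof vfs_files / sizeof vfs_files[0]; f++) {
        if (strcmp(path, vfs_files[f].path) != 0) continue;
        for (int i = 0; i < VFS_MAX_OPEN; i++) {
            if (vfs_slots[i].file == NULL) {
                vfs_slots[i].file = &vfs_files[f];
                vfs_slots[i].pos = 0;
                return VFS_FD_BASE + i;
            }
        }
        errno = EMFILE;   /* no free slot */
        return -1;
    }
    errno = ENOENT;
    return -1;
}

ssize_t vfs_read(int fd, void *buf, size_t count) {
    int i = vfs_slot(fd);
    if (i < 0) { errno = EBADF; return -1; }
    size_t left = vfs_slots[i].file->size - vfs_slots[i].pos;
    if (count > left) count = left;           /* short read at end of file */
    memcpy(buf, vfs_slots[i].file->data + vfs_slots[i].pos, count);
    vfs_slots[i].pos += count;
    return (ssize_t)count;                    /* 0 signals EOF */
}

off_t vfs_lseek(int fd, off_t offset, int whence) {
    int i = vfs_slot(fd);
    if (i < 0) { errno = EBADF; return -1; }
    off_t size = (off_t)vfs_slots[i].file->size, base;
    switch (whence) {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = (off_t)vfs_slots[i].pos; break;
    case SEEK_END: base = size; break;
    default: errno = EINVAL; return -1;
    }
    /* This sketch rejects seeks outside [0, size]. */
    if (base + offset < 0 || base + offset > size) { errno = EINVAL; return -1; }
    vfs_slots[i].pos = (size_t)(base + offset);
    return base + offset;
}

int vfs_fstat(int fd, struct stat *st) {
    int i = vfs_slot(fd);
    if (i < 0) { errno = EBADF; return -1; }
    memset(st, 0, sizeof *st);
    st->st_mode = S_IFREG | 0444;             /* regular, read-only file */
    st->st_nlink = 1;
    st->st_size = (off_t)vfs_slots[i].file->size;
    return 0;
}

int vfs_close(int fd) {
    int i = vfs_slot(fd);
    if (i < 0) { errno = EBADF; return -1; }
    vfs_slots[i].file = NULL;                 /* free the slot */
    return 0;
}
```

In an actual override, open/read/lseek/fstat/close would each check whether the path or descriptor belongs to the virtual FS and otherwise forward to the real libc function. Keeping virtual descriptors in their own numeric range is one simple way to make that check cheap.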

tonycoz commented 2 months ago

There is PERL_IMPLICIT_SYS, but that replaces all I/O (not just module loading) and only has a host implementation on Windows.

If you just want modules to be loaded from memory you can add a hook to @INC that checks for a known name and loads that module from memory, see perldoc -f require.
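As a rough illustration of the "checks for a known name and loads that module from memory" part, the lookup itself can be plain C with no Perl API involved. The sketch below is hypothetical throughout (the struct, the function name, and the sample My/Hello.pm module are made up); it assumes the hook receives the module path in the form require uses, e.g. My/Hello.pm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table of modules compiled into the binary. */
struct packed_module { const char *name; const char *source; };

static const struct packed_module packed_modules[] = {
    { "My/Hello.pm", "package My::Hello;\nsub hi { return \"hi\" }\n1;\n" },
};

/* Look up a module by the relative path that require passes to @INC hooks.
 * Returns the source text and stores its length in *len_out (if non-NULL),
 * or returns NULL when the module is not packed, so the hook can decline
 * and let require continue with the rest of @INC. */
const char *packed_module_source(const char *name, size_t *len_out) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof packed_modules / sizeof packed_modules[0]; i++) {
        if (strcmp(name, packed_modules[i].name) == 0) {
            if (len_out) *len_out = strlen(packed_modules[i].source);
            return packed_modules[i].source;
        }
    }
    return NULL;
}
```

An XS hook could call this with the requested file name and hand the bytes back to require in one of the forms described in perldoc -f require; that wiring is not shown here.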

vadimkantorov commented 2 months ago

I'll check out PERL_IMPLICIT_SYS. Replacing all I/O is fine for my use case, as my custom I/O functions only serve from memory for some special prefixes like /mnt/perl. Are there any docs / examples of using PERL_IMPLICIT_SYS to override I/O, and which functions need to be overridden to cover both module loading and perl -e 'open(f,"<","my.txt");print(<f>);'? I'm only concerned with compiling/running on Linux for now.

> If you just want modules to be loaded from memory you can add a hook to @INC that checks for a known name and loads that module from memory, see perldoc -f require.

Actually, I'm interested in both modules and regular, basic file reads. For modules, can such an @INC hook be added via the C perlembed interface (without executing Perl code)?

> and only has a host implementation on Windows.

And regarding the PerlIO infrastructure: is it relevant for my use case (module / *.pm loads and regular basic file reads)? Can it be configured via the C perlembed interface? Or would you recommend using PERL_IMPLICIT_SYS? Or is using PERL_IMPLICIT_SYS on Linux impossible?

Thank you!

Leont commented 2 months ago

> I'll check out PERL_IMPLICIT_SYS. Replacing all I/O is fine for my use case, as my custom I/O functions only serve from memory for some special prefixes like /mnt/perl. Are there any docs / examples of using PERL_IMPLICIT_SYS to override I/O, and which functions need to be overridden to cover both module loading and perl -e 'open(f,"<","my.txt");print(<f>);'? I'm only concerned with compiling/running on Linux for now.

It's only ever been done before for Windows but there's no reason it would be impossible on Linux. See perlhost.h, win32.c and perllib.c in win32/ for prior art.

vadimkantorov commented 2 months ago

Thanks for the pointers! I'll look into what using PERL_IMPLICIT_SYS on Linux entails.

And maybe one last question: do you know whether PerlIO can also be used for this I/O override goal? And if so, can it be configured via a C API before any Perl code gets executed?

tonycoz commented 2 months ago

> For modules, can such an @INC hook be added via the C perlembed interface (without executing Perl code)?

After perl_construct() something like:

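/* xs_my_hook_xs is your XSUB, defined elsewhere with XS(xs_my_hook_xs) */
CV *hook = newXS("MyPackage::my_hook", xs_my_hook_xs, __FILE__);
AV *inc = get_av("INC", GV_ADD);            /* get_av() takes a flags argument */
av_unshift(inc, 1);                         /* make room at the front of @INC */
/* newRV_inc: the symbol table already owns one reference to the CV */
av_store(inc, 0, newRV_inc((SV *)hook));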

You could also define the hook sub in perl with eval_pv()/eval_sv().

> And maybe one last question: do you know whether PerlIO can also be used for this I/O override goal? And if so, can it be configured via a C API before any Perl code gets executed?

You might be able to do it by modifying PL_def_layerlist or via PERLIO in the environment, but I've never tried it.

It also won't allow you to hook operations like stat() and fcntl().