TeX-Live / texlive-infra

Mirror of core TeX Live scripts (installer, tlmgr, texlive scripts)

Feedback on installer scripts #2

Open vadimkantorov opened 1 month ago

vadimkantorov commented 1 month ago

Background: I'm working on busytex, a project that compiles TeX programs into a single busybox-style executable, using musl libc / cosmopolitan libc (which produces portable binary executables runnable on any OS, including WebAssembly).

As part of this effort, I was looking into building perl/installer/tlmgr/fmtutil/updmap into my multitool to be able to bootstrap/manage the TDS. This kind of works. Below are some problems and suggestions I found which could be easily addressed at the installer-script level. These are merely pieces of experience I stumbled on while trying to force-fit the installer script into a portable Perl binary (one that is also runnable on process-less systems).

  1. Put all code calling external processes/shell/system/backticks into Perl functions. Putting such code into functions would allow stubbing them and replacing them with non-process-calling implementations. Currently it's very hard to know what's spawning processes/calling the shell, as such calls are spread all across the code (and Perl's convenient backtick syntax kind of promotes this style). So refactoring by placing all such code in a separate file, in separate functions, would allow improving the installer script gradually.

  2. Support the ISO files natively without forcing extraction into FS (this can be quite slow). Also supporting .tar.xz/.zip in-memory/in-process extraction (without calling external tar/xz programs) would probably speed up installation of the full scheme considerably. Proper Perl modules for tar/xz probably exist by now, and ISO files can probably also be read with a Perl module without prior extraction. This would reduce the working disk space requirements (which is a concern on free GitHub Actions machines).

  3. In general, trying to replace code calling shell/external programs with code using Perl modules. The fewer processes are created, the faster the install goes, especially for the full scheme. Also, on some systems like WebAssembly, process creation (and shell pipes in particular) is either not supported at all or very badly supported, so being able to do without spawning processes is preferred. At least reducing general shell/pipe usage would already be good.

  4. Specify in docs the minimal needed set of Perl modules (and e.g. ways to disable some features). This is needed when building a static perl without relying on PAR/packers. Same for external program dependencies, currently it somewhat exists as for Windows installer all programs are bundled.

  5. Amalgamation distribution / recipe: pasting the installer script + pm files into a single Perl file for simpler distribution. Currently I managed to do it, but it was not very easy.

Doing all this would not be sufficient for running on process-less systems (that would need more complex hacks overriding Perl's system calls and some sort of process emulation, which would only work for some packages), but it would still considerably improve the installer by making it more hermetic, simpler to distribute, and faster.

Hope some of these ideas are helpful :) Thanks :)

kberry commented 1 month ago
Hope some of these ideas are helpful :) Thanks :)

Thank you for the suggestions.

process-less systems

Do you mean installing TL into a browser? I don't understand.

4. Specify in docs the minimal needed set of Perl modules 

Listing them manually would be unmaintainable. Also, the indirect dependencies surely vary across Perl versions.

I see there is a Module::ScanDeps which tries to determine dependencies. https://metacpan.org/dist/Module-ScanDeps

Maybe I'll try it one day, or maybe it will help you in the meantime with automating installation of everything. If you do try it, I'd be interested to hear your experiences.
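A lightweight complement to Module::ScanDeps, sketched here only as an illustration: Perl records every file loaded through `use`/`require` in the built-in `%INC` hash, so dumping it after the script has loaded its modules gives a run-time view of the dependencies for one particular Perl version. The helper name `loaded_modules` is hypothetical, and `File::Temp` is merely a stand-in for the installer's real modules. Modules behind a conditional `require` that was not triggered will not show up.

```perl
use strict;
use warnings;

# Hypothetical helper: list the modules loaded into this interpreter,
# derived from the keys of %INC (which look like "File/Spec.pm").
sub loaded_modules {
    my @mods;
    for my $file (sort keys %INC) {
        next unless $file =~ /\.pm\z/;     # skip .pl files and other entries
        (my $mod = $file) =~ s{/}{::}g;    # File/Spec.pm -> File::Spec.pm
        $mod =~ s/\.pm\z//;                # File::Spec.pm -> File::Spec
        push @mods, $mod;
    }
    return @mods;
}

# Stand-in for "load whatever the script under inspection loads".
require File::Temp;

print "$_\n" for loaded_modules();
```

The printed list includes indirect dependencies, which is the part that varies across Perl versions.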

Same for external program dependencies, currently it somewhat exists

Not as a list.

as for Windows installer all programs are bundled.

The external program dependencies are quite different between Unix and Windows. Perl too, to some extent.

Thanks again, Karl

norbusan commented 1 month ago

Hi @vadimkantorov

thanks for your suggestions and work on busytex, sounds like an interesting albeit very challenging project.

  1. Put all code calling external processes/shell/system/backticks into Perl functions. Putting such code into functions would allow stubbing them and replacing them with non-process-calling implementations. Currently it's very hard to know what's spawning processes/calling the shell, as such calls are spread all across the code (and Perl's convenient backtick syntax kind of promotes this style). So refactoring by placing all such code in a separate file, in separate functions, would allow improving the installer script gradually.

That is definitely doable, but would hinder readability and maintainability considerably. Our main aim is to be portable to a wide range of OS/arch systems, and this kind of change would be rather invasive.

I am sure that none of our team (which is practically Karl and me) have time for this kind of considerable refactoring.

  2. Support the ISO files natively without forcing extraction into FS (this can be quite slow).

Do you mean installation directly from the ISO image? While this is possible, a loop mount does the same and is very easy to do, while reading from the .iso file directly would be awfully slow I guess.

Also supporting .tar.xz/.zip in-memory/in-process extraction (without calling external tar/xz programs) would probably speed up installation of the full scheme considerably.

That is an interesting idea that might be worth looking at. Getting rid of external deps is definitely something nice. I just fear that we will get a lot of OOM messages due to the size of some of the packages. So then we would need to do a double path approach, small packages in memory, large via filesystem, and that would not help you I guess.

  3. In general, trying to replace code calling shell/external programs with code using Perl modules.

Good idea in principle, do you have proposals on what in particular would be a candidate?

  5. Amalgamation distribution / recipe: pasting the installer script + pm files into a single Perl file for simpler distribution.

We do this for windows, but not for linux. Where do you see the big advantage? But if you have a recipe that works, we can probably build some bundled unix installer, too.

Thanks again for the suggestions

Norbert

vadimkantorov commented 1 month ago

Hi Karl and Norbert! Thanks a lot for responding to my proposal!

Do you mean installing TL into a browser? I don't understand.

Yeah (but also in other musl libc/cosmo libc cross-platform envs). In theory, tlmgr could be supported in browser/wasm for many packages if process spawns could be minimized to only "known" process spawns. A functioning tlmgr in the browser is currently not very realistic, but in general I am striving to make a hermetic single-file binary containing all the main TeX programs and the tlmgr/TDS tree installer. Currently I have made it work for latex/luatex/xetex/bibtex/makeindex. I also succeeded in "embedding" the needed files (a basic-profile TDS) into the binary, which allows shipping a texlive distro as a single statically-linked binary (possibly cross-platform when using cosmo libc). I see a lot of value in such distribution schemes (even outside the browser).

I see there is a Module::ScanDeps which tries to determine dependencies.

Yeah, it's a valid workaround (or PAR Packer output which uses ScanDeps under the hood I think)! So maybe just dumping its output somewhere in release docs would be useful! I'll let you know if I succeed using staticperl / PAR for installer/tlmgr. I have a proof-of-concept for manual staticperl-like compilation of Perl and it works. I'll let you know of progress. (I also would like to try it with cosmo libc which should make a single Perl interpreter runnable on Windows/Linux/MacOS with different arches).

The external program dependencies are quite different between Unix and Windows

Are these just tar.exe / xz.exe / lz4.exe / wget.exe / curl.exe bundled in https://github.com/TeX-Live/texlive-infra/tree/master/tlpkg/installer ?

That is definitely doable, but would hinder readability and maintainability considerably.

Maybe there is some misunderstanding: I meant factoring out lines like my $ff = `kpsewhich -progname='$f' -format=tex '$hf'` into some common util file like https://github.com/TeX-Live/texlive-infra/blob/440c0a548251bf6382493aa6431d207fe4c2cf28/tlpkg/TeXLive/TLUtils.pm . This is already done to some extent, but I propose to gradually complete the refactoring, starting from programs which already depend on TeXLive::TLUtils (like fmtutil.pl/updmap.pl).
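To make the proposal concrete, here is a minimal sketch of such a wrapper. Everything named here is hypothetical (the package `My::ProcUtils`, the functions `run_capture` and `kpsewhich_tex`, the `$RUN_CAPTURE` hook); none of it is existing TLUtils.pm API. The idea is that all process spawning goes through one replaceable function, so a process-less build can swap in a stub.

```perl
use strict;
use warnings;

package My::ProcUtils;   # hypothetical; stands in for a TLUtils-like module

# Single choke point for running a command and capturing its stdout.
# The list form of open does not go through the shell.
# The exit status is ignored in this sketch.
our $RUN_CAPTURE = sub {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    local $/;                          # slurp mode
    my $out = <$fh>;
    close($fh);
    return defined $out ? $out : '';
};

sub run_capture { return $RUN_CAPTURE->(@_) }

# Hypothetical wrapper for the kpsewhich call quoted above.
sub kpsewhich_tex {
    my ($progname, $file) = @_;
    my $out = run_capture('kpsewhich', "-progname=$progname",
                          '-format=tex', $file);
    chomp $out;
    return $out;
}

package main;

# On a process-less target the hook is replaced by a stub.
# The returned path is a made-up value for this demo.
{
    local $My::ProcUtils::RUN_CAPTURE =
        sub { return "/texmf/tex/latex/base/article.cls\n" };
    print My::ProcUtils::kpsewhich_tex('latex', 'article.cls'), "\n";
}
```

Callers such as fmtutil.pl/updmap.pl would then call the wrapper instead of using backticks directly, and the stub could later be backed by an in-process implementation.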

Do you mean installation directly from the ISO image? While this is possible, a loop mount does the same and is very easy to do, while reading from the .iso file directly would be awfully slow I guess.

Yeah, I mean that the path to the ISO file would be passed to the install-tl script and used directly, without mount/extraction. mount is nice when it's available, but it typically requires sudo permissions / OS support (missing on Windows). IIUC files in ISO images are stored as contiguous byte sequences (and studying the code of https://github.com/erincandescent/lib9660 confirms this), so after constructing an index structure, reading a file is as simple as a seek to a precalculated offset + a read of a known number of bytes. For Perl there exists a module https://metacpan.org/pod/VirtualFS::ISO9660 (a single file https://metacpan.org/release/STEVIEO/VirtualFS-ISO9660-0.02/source/lib/VirtualFS/ISO9660.pm) which also seems to do this already. So I think the overhead of getting the .tar.xz files from the ISO file should be minimal.
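The access pattern described above can be sketched as follows. This is not an ISO 9660 parser: building the index from the image's directory records is left out, and the toy image, the member names and the helper `read_member` are all made up for the demo. It only illustrates that, once a name => (offset, length) index exists, fetching a member is one seek plus one read.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read one member from an image file, given a prebuilt index of
# name => [byte offset, byte length]. No mount, no extraction.
sub read_member {
    my ($image, $index, $name) = @_;
    my $entry = $index->{$name} or die "no such member: $name";
    my ($offset, $length) = @$entry;
    open(my $fh, '<:raw', $image) or die "cannot open $image: $!";
    seek($fh, $offset, 0) or die "seek failed: $!";
    my $got = read($fh, my $buf, $length);
    close($fh);
    die "short read for $name" unless defined $got && $got == $length;
    return $buf;
}

# Toy image: payloads stored contiguously, each starting on a
# 2048-byte block boundary and padded with NUL bytes.
my $block   = 2048;
my %payload = ('a.tar.xz' => 'AAAA-first', 'b.tar.xz' => 'BBBB-second-one');
my ($out, $image) = tempfile(UNLINK => 1);
binmode $out;
my %index;
my $pos = 0;
for my $name (sort keys %payload) {
    my $data   = $payload{$name};
    my $padded = $data . ("\0" x ($block - length($data) % $block));
    $index{$name} = [$pos, length $data];
    print {$out} $padded;
    $pos += length $padded;
}
close($out);

print read_member($image, \%index, 'b.tar.xz'), "\n";
```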

That is an interesting idea that might be worth looking at. Getting rid of external deps is definitely something nice. I just fear that we will get a lot of OOM messages due to the size of some of the packages.

I meant to propose using Perl-module-based tar/xz/lz4/zip extractors to extract packages to the file system (instead of the tar/xz/lz4/unzip binaries), e.g. by using https://metacpan.org/pod/IO::Uncompress::UnXz and https://metacpan.org/release/PMQS/IO-Compress-Lzma-2.212/source/lib/IO/Uncompress/Adapter/UnLzma.pm. So large packages should not be a problem, while process spawns would still be eliminated. Regarding fully in-memory operation, it's interesting and maybe worth measuring the maximum memory it could take for large packages. As a middle-ground option, using /tmp or /dev/shm as a temporary in-memory fs destination may be possible (or ideally extraction could be done directly to the final TDS path?). At the very least, I propose that the tar/unxz/unzip-calling code also be moved/encapsulated as functions in TLUtils.pm.
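A minimal sketch of such in-process extraction, under stated assumptions: it uses the core modules Archive::Tar and IO::Uncompress::Gunzip so that it runs on a stock Perl, and the decompressor class is a parameter, so for .tar.xz one would pass IO::Uncompress::UnXz instead (not a core module, assumed to be installed). The helper name `extract_tarball` is hypothetical. Note that `Archive::Tar->read` keeps the whole archive in memory, which is exactly the concern raised for large packages; only regular files are handled here.

```perl
use strict;
use warnings;
use Archive::Tar;
use IO::Uncompress::Gunzip;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Unpack a compressed tarball into $destdir without spawning tar/xz.
# $class is the decompressor class, e.g. IO::Uncompress::Gunzip, or
# IO::Uncompress::UnXz for .tar.xz (non-core, assumed installed).
sub extract_tarball {
    my ($archive, $destdir, $class) = @_;
    my $fh = $class->new($archive) or die "cannot open $archive";
    my $tar = Archive::Tar->new;
    $tar->read($fh) or die "cannot read $archive: " . $tar->error;
    my @written;
    for my $member ($tar->get_files) {
        next unless $member->is_file;   # dirs, links etc. skipped in this sketch
        my $name = $member->full_path;
        die "unsafe member name: $name"
            if $name =~ m{^/} || $name =~ m{(^|/)\.\.(/|$)};
        my $target = File::Spec->catfile($destdir, split m{/}, $name);
        make_path(dirname($target));
        my $content = $member->get_content;
        open(my $out, '>:raw', $target) or die "cannot write $target: $!";
        print {$out} defined $content ? $content : '';
        close($out) or die "close failed for $target: $!";
        push @written, $target;
    }
    return @written;
}

# Demo: build a small .tar.gz in a temp dir, then unpack it again.
my $tmp     = tempdir(CLEANUP => 1);
my $archive = File::Spec->catfile($tmp, 'demo.tar.gz');
my $builder = Archive::Tar->new;
$builder->add_data('demo/tex/hello.tex', "\\relax\n");
$builder->write($archive, COMPRESS_GZIP) or die $builder->error;

my $dest = File::Spec->catdir($tmp, 'out');
print "$_\n" for extract_tarball($archive, $dest, 'IO::Uncompress::Gunzip');
```

Passing the final TDS path as `$destdir` would correspond to the "extract directly to the final path" variant mentioned above.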

Good idea in principle, do you have proposals on what in particular would be a candidate?

E.g. going over fmtutil.pl / updmap.pl and replacing the kpsewhich shell calls with calls to a TLUtils.pm function encapsulating this call. Further on, an in-process library version of kpsewhich (e.g. libkpathsea) could then be used behind that function.

Where do you see the big advantage?

I think the value of amalgamation for installer/tlmgr is relatively minor. In my recipe I made it work for install-tl.pl, but then I discovered that it rightfully depends on various other scripts like https://github.com/TeX-Live/texlive-infra/tree/master/texlive-scripts, and then I paused this work. My idea is that distributing the installer as a single Perl script is nicer than distributing an archive with many scripts and files (and such a Perl script is easier to embed somewhere else). But I agree that the value of this work is quite minor compared to the other points above. Some problems I encountered during amalgamation are file-embedded perldoc strings, which prevent simple file concatenation (this can be solved by some script preprocessing; I'll share my recipe for this), and global value initialization and global code blocks which are not placed within named functions.
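As an illustration of the preprocessing step mentioned above (this is not the recipe referred to, just one possible approach), a line-based filter can drop POD blocks and everything from `__END__` on before the files are concatenated. The helper name `strip_pod` is hypothetical, and the filter assumes that no heredoc or multi-line string in the source contains lines starting with `=word` or a bare `__END__`.

```perl
use strict;
use warnings;

# Remove POD blocks from Perl source text. A line starting with
# "=<letter>" opens a POD block and a line starting with "=cut" closes it.
# Everything from a bare "__END__" line on is dropped as well.
sub strip_pod {
    my ($source) = @_;
    my $in_pod = 0;
    my @kept;
    for my $line (split /^/, $source) {
        if ($in_pod) {
            $in_pod = 0 if $line =~ /^=cut\b/;
            next;
        }
        last if $line =~ /^__END__\s*$/;
        if ($line =~ /^=[a-zA-Z]/) {
            $in_pod = 1 unless $line =~ /^=cut\b/;   # a stray =cut is just dropped
            next;
        }
        push @kept, $line;
    }
    return join '', @kept;
}

# Demo input, assembled line by line so that this file itself
# contains no lines that look like POD.
my $module = join "\n",
    'package Demo;',
    'sub hello { return "hi" }',
    '',
    '=head1 NAME',
    '',
    'Demo - example module',
    '',
    '=cut',
    '',
    '1;',
    '__END__',
    '',
    '=head1 MORE',
    '';

print strip_pod($module);
```

The stripped text can then be wrapped in a block and appended to the amalgamated file. Top-level initialization code still runs in file order, so the second problem mentioned above needs separate handling.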

Going further, having all install-related stuff as Perl functions / Perl scripts (or instead everything in Lua) would be better and more portable than a mish-mash of Perl/Shell/Lua, but also requires manpower understandably :(

vadimkantorov commented 1 week ago

It appears that one would need to use https://metacpan.org/pod/Device::Cdio::ISO9660 to work with TeX Live's ISO files, as it supports long file names in Joliet...

For xz/tar extraction: Perl modules can certainly be used for extraction to a temp dir instead of calling tar/xz in subprocesses (with the process forks that implies).

A question about the current system: does it work now if all archives like /path/to/extractediso/archive/hyphen-base.r66413.tar.xz are pre-extracted as /path/to/extractediso/archive/hyphen-base.r66413/ (or as /path/to/extractediso/archive/hyphen-base.r66413.tar.xz/, i.e. a directory named like the archive)?