curl / curl-for-win

Reproducible curl binaries for Linux, macOS and Windows
https://curl.se/windows/
MIT License
686 stars 207 forks source link

Build official curl binaries with PSL #63

Closed dfandrich closed 8 months ago

dfandrich commented 8 months ago

There is code to do so in this repo, but the binaries at https://curl.se/windows/ do not support PSL. Ref: https://daniel.haxx.se/blog/2024/01/10/psl-in-curl/

vszakats commented 8 months ago

See my comments about the state of PSL in this latest commit: https://github.com/curl/curl-for-win/commit/dd7ce0a9d2d350900abd0f0f565c0dc4c660461b

dfandrich commented 8 months ago

I saw that commit, but it wasn't clear to me what the bottom line was. Most of those issues affect non-Windows builds, too. Is it hesitation to switch to libidn2? Right now, 100% of applications using these binaries are affected by the super cookie vulnerability, whereas by switching to libidn2, some considerably lower percent of applications, possibly even 0%, would be affected by a different issue.

vszakats commented 8 months ago

To support this, libpsl would ideally need to receive WinIDN support and the PSL database a versioned and verifiable package that we can use a dependency like any other. But, it also seems to update very frequently which means a manual curl-for-win update for every one of them, or a need to develop automatic bumps and releases to keep up with it. Or, we rely on the libpsl-bundled copy, which is 1+ year old at this point.

As for build times and binary sizes: libpsl requires a list of heavy GNU dependencies to even load (and/or compile-in) the PSL list: libidn2, libunistring and libiconv. This also means that we need to switch away from WinIDN for Windows, which in turn means that the IDN behaviour diverges for all the other Windows apps that typically use WinIDN. Again something to (re)test, but these libs have a significant effect on binary sizes, even though the PSL list by itself is just 200KB (and could be zipped to 70KB, if libpsl added support for this).

On build complexity: libpsl and its dependencies only support autotools, which is currently broken with Linux MUSL builds and I couldn't fix them despite spending quite many full days on it. At the moment it looks hopeless to make it work. (I've been shifting to CMake where possible, and for now no default dependency needs autotools.)

That said, if someone doesn't mind these issues or wants to play with it, libpsl can be enabled with the -big build option. It's been a while I have tested it though.

vszakats commented 8 months ago

I don't know what the solution is, but carrying this heavy, autotools-based dependency tree doesn't seem like a workable way to me.

Perhaps a stable subset of the PSL built right into curl might improve for "most" use-cases, while clearing almost all of this complexity?

vszakats commented 8 months ago

I was also wondering if the PSL is exposed via a Windows API perhaps (a casual search didn't turn up anything). Surely Windows also needs it, or at least its browsers do. Same for Linux and others, speaking of a list that is almost as critical as the certificate bundle (?), it'd be ideal to have it refreshed and maintained by the OS, also ensuring consistency between apps that need it. But I'm afraid this is just missing as of yet and every app is rolling its own copy.

dfandrich commented 8 months ago

How do Windows-native applications do this? Is there a kind of PSL built-in to Windows like IDN has?

vszakats commented 8 months ago

Good question, I couldn't readily find an API for this. I can imagine this is limited to the browser engine (Edge/Chromium) and maintained just like the HSTS-preload list. (though there is a fresh notion to depart from that static HSTS list: https://groups.google.com/a/chromium.org/g/blink-dev/c/cAS525en8XE).

vszakats commented 8 months ago

I'm also wondering if pre-converting the PSL list to Punycode might help dropping the heavy dept tree. This'd need local tooling and/or support from the PSL list project (to publish it in that format) and also from libpsl. I'm not familiar with Punycode details, so this may be a simplistic view of the topic and not workable. Maybe it's just simpler to add Unicode support to libpsl via native-Windows API (edit: but this won't help non-Windows static builds.)

libpsl also needs to fix the _WIN32_WINNT override issue, but I have no more capacity to report, follow-up and often fix the issues experienced for every dependency. (It's already way beyond any sustainable limit.)

bagder commented 8 months ago

I'm also wondering if pre-converting the PSL list to Punycode might help dropping the heavy dept tree.

To be honest, I think it is a shame that it doesn't do that already.

bagder commented 8 months ago

I filed https://github.com/rockdaboot/libpsl/issues/220 to at least trigger some discussion. I fear the only way that will happen is if we make a new psl lib. I would not be totally against such a move.

rockdaboot commented 8 months ago

I'm also wondering if pre-converting the PSL list to Punycode might help dropping the heavy dept tree.

To be honest, I think it is a shame that it doesn't do that already.

I am not sure what is a shame exactly. The libpsl DAFSA lookup data already contains the punycode AND the unicode/utf-8 strings.

You can can have the lookup data as an external blob, which is loaded very fast on demand (no parsing is needed or done, only file -> memory mapping). This allows to update the PSL lookup data without any rebuild of software. There is a Python script psl-make-dafsa as part of the libpsl repo for converting the PSL list into a DAFSA blob.

The alternative, interning the DAFSA data into libpsl, has the downside that it becomes outdated after a while. BUT, even interned data can simply be overridden by an external blob. So you can use the internal data as "fallback" in case an external file isn't available. There is an SHA1 hash for the internal data, so in theory it allows some basic version control.

From past experience, I'd say that the PSL list evolves relatively slowly and that the big players are all in. So even 1-2 year old lookup data contains 99% the same data as the current list does (the % number is just pulled from thin air, don't rely on it).

If all that is not enough, I am happy to review a PR for WinIDN for additional degrees of freedom.

rockdaboot commented 8 months ago

I was also wondering if the PSL is exposed via a Windows API perhaps (a casual search didn't turn up anything). Surely Windows also needs it, or at least its browsers do. Same for Linux and others, speaking of a list that is almost as critical as the certificate bundle (?), it'd be ideal to have it refreshed and maintained by the OS, also ensuring consistency between apps that need it. But I'm afraid this is just missing as of yet and every app is rolling its own copy.

Updating the PSL is e.g. part of Debian derived Linux distributions. The package name is publicsuffix and the libpsl package is built/configured in a way to load this data when used. In Debian, the publicsuffix package version reflects the date of the PSL list, e.g. 20231001.0357-0.1. I assume that other Linux distros do something similar.

vszakats commented 8 months ago

Thanks for jumping in. I'd be happy to rely on the PSL copy bundled with libpsl, but that (unless I'm missing something) is not possible to compile-in, without pulling in the heavy Unicode dependencies. I'd imagine the smoothest way to fix this would be to bundle a punycode copy and allow to compile-in that one, without the requirement for any Unicode dependency.

It's good news that the list is relatively stable, but relying on a Debian package for versioning sounds limiting and needs a series of hacks (even on Debian, to maintain reproducibility) compared to all other dependencies. These builds also support non-Debian envs (Alpine, MSYS2 and macOS, where dealing with Debian packages is certainly possible in some ways, but a heavy development and maintenance cost).

That still leaves autotools as a major blocker. With most MUSL builds and Windows-llvm (the job used for official binaries) in the red, failing right away in ./configure.

Windows-gcc actually gets to libpsl, after a long list of warnings in libiconv, some in libunistring and libidn2, then stopping in libpsl with these:

<command-line>: warning: "_WIN32_WINNT" redefined
<command-line>: note: this is the location of the previous definition
../../tools/psl.c: In function 'time2str':
../../tools/psl.c:93:13: warning: implicit declaration of function 'localtime_r'; did you mean 'localtime_s'? [-Wimplicit-function-declaration]
   93 |         if (localtime_r(&t, &tm) != NULL)
      |             ^~~~~~~~~~~
      |             localtime_s
../../tools/psl.c:93:34: warning: comparison between pointer and integer
   93 |         if (localtime_r(&t, &tm) != NULL)
      |                                  ^~

https://github.com/curl/curl-for-win/actions/runs/7512934079/job/20454167810#step:3:11661

The complete run: https://github.com/curl/curl-for-win/actions/runs/7512934079

Based on the jobs working, the size cost is 400-500KB for the curl tool binary, 1MB for libcurl shared lib. These do NOT include the PSL bundle itself (as I've found out earlier this week it's not included by default and I haven't updated the scripts to do it), which would be an extra 200KB, plus an extra 4.5MB of static libs (of which libpsl is just 77KB). This applies to all operating systems. Using WinIDN would only help the Windows binaries, but not the statically linked Linux and macOS ones.

Using a punycode list would help to reduce the overhead to about 250-300KB for curl/libcurl and static libs, and cut build times to a reasonable amount. Having an option to zip PSL up, could cut the size in half, making it almost invisible.

(Maybe an option would be useful to disable any kind of loading-from-disk functionality in libpsl. This would likely fix the localtime_r error too.)

But then autotools remains a blocker.

rockdaboot commented 8 months ago

This is several issues at once, let's get at them one by one :)

The build issue you mention has been fixed by https://github.com/rockdaboot/libpsl/pull/193 and https://github.com/rockdaboot/libpsl/pull/195. Maybe I should raft a new release...

rockdaboot commented 8 months ago

It's good news that the list is relatively stable, but relying on a Debian package for versioning sounds limiting and needs a series of hacks (even on Debian, to maintain reproducibility) compared to all other dependencies. These builds also support non-Debian envs (Alpine, MSYS2 and macOS, where dealing with Debian packages is certainly possible in some ways, but a heavy development and maintenance cost).

Oh yeah. I was mentioning Debian only as an example how it could be done and how some distros do it. I assume there are several ways to achieve the same, but I know not enough about your goals to make concrete suggestions here.

rockdaboot commented 8 months ago

Now there is a new release for you which fixes several build issues: https://github.com/rockdaboot/libpsl/releases/tag/0.21.3

rockdaboot commented 8 months ago

If you are unhappy with autotools, libpsl also supports meson and nmake. If there are issues with those, please create an issue (I don't/can't maintain these parts of the code).

rockdaboot commented 8 months ago

To build libpsl without dependency to libidn, libidn2 or libicu, use the autotools configure flag --disable-runtime. With this (on Linux, after make install):

$ ldd /usr/local/lib/libpsl.so.5.3.5
        linux-vdso.so.1 (0x00007fff6cd0c000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa886bc2000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fa886dd7000)

So no further dependencies, except for libc.

rockdaboot commented 8 months ago

Regarding the size considerations, a stripped version of libpsl with PSL lookup data included (with punycode):

-rwxr-xr-x 1 root root 71768 Jan 13 19:29 /usr/local/lib/libpsl.so.5.3.5

Without PSL data included (./configure --disable-runtime --disable-builtin):

-rwxr-xr-x 1 tim tim 18520 Jan 13 19:33 src/.libs/libpsl.so.5.3.5
vszakats commented 8 months ago

Thanks, for the release!

[ Regarding meson, it seems way too much effort to start fitting it into all the build cases on all supported platforms. Also something to learn from scratch, then continue testing and maintaining. CMake would be a nicer path IMO. I don't like it either, but it's widespread, portable and works reasonably well. nmake is Windows/MSVC only. ]

As for the new release, it's missing a .sig and uses a different release name/format than previous ones, now also needing autoreconf. Then autoreconf fails in my local test:

[...]
configure.ac:4: installing 'build-aux/missing'
automake: error: cannot open < gtk-doc.make: No such file or directory
autoreconf: error: automake failed with exit status: 1

This patch should fix the _WIN32_WINNT issue. These days mingw will target newer Windows versions by default, so this trick is unnecessary in most cases. Someone using an ancient mingw version can add the necessary _WIN32_WINNT value. The other solution is to extend the script to only add it if it's not already present in CPPFLAGS, but that seems overkill to me.

--- configure.ac-ori
+++ configure.ac
@@ -304,8 +304,6 @@
   case "${host_os}" in
     # MinGW / Windows
     *mingw*)
-      # Select Windows NT/2000 and later, for WSAStringToAddressW()
-      CPPFLAGS="$CPPFLAGS -D_WIN32_WINNT=0x500"
       # Needed for network support
       LIBS="$LIBS -lws2_32"
       ;;

A third solution is to omit code using WSAStringToAddressW() (or all sockets code) when targeting ancient Windows versions. Or a build option to disable anything socket-related.

eli-schwartz commented 8 months ago

From the linked commit;

obtaining a fresh PSL database means another build-time dependency: Even though the PSL has security/privacy implications, its releases are missing versioning, a hash and also a signature, making it tedious to package it verifiably and reproducibly. The libpsl project rejected these raised issues in 2016: https://github.com/publicsuffix/list/issues/31

This feels a bit confusing as it's linking to the publicsuffix project and then claiming that libpsl was the project that rejected the issue.

...

[ Regarding meson, it seems way too much effort to start fitting it into all the build cases on all supported platforms. Also something to learn from scratch, then continue testing and maintaining. CMake would be a nicer path IMO. I don't like it either, but it's widespread, portable and works reasonably well. nmake is Windows/MSVC only. ]

Interesting that you "don't like it either" but nonetheless want to use it... a ringing endorsement.

Anyways I don't understand why you need to learn it from scratch and maintain it to boot. It's a build system, your interface to it is precisely no more and no less than executing its CLI interface, save for the case where you decide to contribute modifications upstream.

And it's widespread, portable, and works reasonably well. :)

rockdaboot commented 8 months ago

As for the new release, it's missing a .sig and uses a different release name/format than previous ones, now also needing autoreconf. Then autoreconf fails in my local test:

Sorry, I was too much in a hurry yesterday and had to release https://github.com/rockdaboot/libpsl/releases/tag/0.21.5, now including the assets (tarballs with signature).

The autoreconf issue stems from the fact that Github generates its own asset, which is not a tarball. This is often confusing to users. Building from the tarballs doesn't require autoreconf (that even may break things for you).

vszakats commented 8 months ago

@eli-schwartz Ultimately it's not about what I like, but what works (or can be made to work) for the cases interesting for this project. I managed with CMake, with autotools it seems borderline impossible at the moment. That said anybody is welcome to contribute fixes. Taking up meson is out of my capacity. It also isn't supported by any other dependency here. Also I'm not here to start yet another flame war, this time about build systems.

libpsl/psl was a typo I corrected earlier: 0759a5d9b04c91e95b945bc5bb83bdad61b8ae8f

vszakats commented 8 months ago

@rockdaboot: Thanks, looking at it.

rockdaboot commented 8 months ago

Regarding _WIN32_WINNT: I'd rather prefer https://github.com/rockdaboot/libpsl/issues/226 That also removes a runtime dependency on Windows (ws2_32.dll).

vszakats commented 8 months ago

@rockdaboot Agreed, dropping the ws2_32 depedency would be best.

eli-schwartz commented 8 months ago

Taking up meson is out of my capacity. It also isn't supported by any other dependency here. Also I'm not here to start yet another flame war, this time about build systems.

What's a flame war here? You said that cmake is better than autotools because you cannot get autotools to work -- I agree that autotools can be very hard to get working, in particular on Windows. I then said that meson is also better than autotools, because meson also supports Windows officially.

Is it a flame war to say that executing the CLI interface of a build system doesn't require you to be an expert in that build system, thus you shouldn't be too worried about "maintaining" it? (This argument of mine is independent of build system used. There's a lot of build systems out there that are a heck of a lot more obscure than the big 3: autotools, meson, cmake. If running them produces the correct files and you don't want to worry about maintaining the build system, you shouldn't go overthinking it.)

Please. Give it a try. Just try running it the simplest way you can, and report your success or failure here. By good fortune, it turns out that I'm both the maintainer of the libpsl meson.build, and a developer of meson itself. You have privileged access to a lot of good advice, should you need it.

You don't have to feel that your lack of status as an expert meson maintainer (?) is a blocker for depending on a bit of software that has a meson build system.

vszakats commented 8 months ago

@rockdaboot: Local build worked, the database is embedded now, lib size small, even with PSL included. Thanks for these!

Issues remaining:

I've merged the version bump and related updates, also dropping libunistring and libiconv. PSL can now be enabled with the -psl config option.

rockdaboot commented 8 months ago

@vszakats Congrats, that is cool :)

(no IDNA support). This seems unnecessarily scary for this build configuration.

I think this isn't scary, just a hint that the application needs to do IDNA stuff on its own. From the libpsl standpoint, there should be a hint. Feel free to suggest a wording that is more on point.

rockdaboot commented 8 months ago

Regarding the failed builds: I don't an autotools issue here. It is more likely that the setup is wrong, e.g. missing cross-compiler and/or the output can't be executed on the host and thus you get

configure: error: C compiler cannot create executables
See `config.log' for more details

Take a look into config.log.

I am not a cross-compilation expert and won't be much of a help here.

vszakats commented 8 months ago

@eli-schwartz: Thanks!

I have given meson a try with libssh2 and it didn't work on a bare macOS build case, but this might have been entirely my fault. Otherwise meson gave a good impression. [Though given a choice I'd prefer a tool that does the whole build without calling out to another build tool such as gnu make, ninja or nmake. (I wrote one such called hbmk2, but it's unknown outside of its specific niche.)]

What I meant by "maintaining", isn't about meson itself, but maintaining curl-for-win scripts to work with meson. curl-for-win grew fairly complex, esp since extending it to non-Windows platforms. These need to be made work with meson and then each global update applied to now not two (plus some one-off ones, like OpenSSL/zlib/trurl) but three build systems. Each tooling dependency adds up to the necessary effort. It's certainly possible to do, and would no doubt be interesting, but out of my personal capacity, and for a single dependency it's hard to justify the effort. Maybe dropping autotools support would allow room for something better, but some projects support nothing but autotools (e.g. libidn2/wolfssh.)

That's to say, I have no issues at all with meson in particular and wish the popular build tools worked much saner than they do.

[ It'd help if meson could pick up a source list from Makefile.inc or alike to avoid having to maintain duplicate lists of them. I remember this was something I spent some time on last year, and the closest was wildcard file spec, but that's also problematic so not supported. ]

bagder commented 8 months ago

version saying libpsl/0.21.5 (no IDNA support). This seems unnecessarily scary

I think we should probably consider dropping everything from the libpsl version string after the version number anyway since that extra components are mostly misleading to curl users as they are not used.

vszakats commented 8 months ago

Take a look into config.log.

I've spent many days looking at those logs :( It more or less comes down to libtool knowing better and filtering the options passed to the linker and then trying to trick it to accept things we need. Then find tricks that work universally, or figure out each individual build case-by-case. I gave up.

I think this isn't scary, just a hint that the application needs to do IDNA stuff on its own. From the libpsl standpoint, there should be a hint. Feel free to suggest a wording that is more on point.

Is there any loss of functionality or caveat due to building libpsl the way we do in this particular case (embedded punycode list, supposedly never reaching out to the disk for a copy)? If libpsl works as expected and doesn't need IDNA, I believe the version message is unnecessary. If IDNA could still be needed but missing, I'd prefer a build option to opt-out from those features to have a non-IDNA-dependent libpsl.

eli-schwartz commented 8 months ago

[ It'd help if meson could pick up a source list from Makefile.inc or alike to avoid having to maintain duplicate lists of them. I remember this was something I spent some time on last year, and the closest was wildcard file spec, but that's also problematic so not supported. ]

The best way to do this would be to have a sources.list newline delimited file, and read it into a meson array with

fs_mod = import('fs')
file_contents = fs_mod.read('sources.list')
sources_arr = file_contents.split()

# or one-line
sources_arr = import('fs').read('sources.list').split()

This is roughly equal to cmake's file(READ ...) modulo splitting the contents into a cmake list, which I think requires both regex and converting separators into ;. Also Make's $(shell cat ...) which, um, inherently splits.

I've spent many days looking at those logs :( It more or less comes down to libtool knowing better and filtering the options passed to the linker and then trying to trick it to accept things we need. Then find tricks that work universally, or figure out each individual build case-by-case. I gave up.

I feel your pain. :(

My understanding is it essentially whitelists the "allowed" flags, and there aren't many good options other than carefully applying a series of patches to the libtool file itself, making it agree to pass through additional flags unmodified. See e.g. https://gitweb.gentoo.org/proj/elt-patches.git/tree/patches

vszakats commented 8 months ago

@eli-schwartz Thanks for your suggestion for the filelist problem, I made a note of it! (Probably a next excercise would be figuring out how to read that up in autotools, but that I'd leave for another day :)

vszakats commented 8 months ago

Solved the blocker by doing a manual build.

This turned out to be shorter (and quicker) than calling autotools.

The price is more maintenance here downstream if something breaks this upstream. I also discovered that symbol visibility isn't set to hidden by default in psl autotools builds; the manual method fixes this also.

Thanks everyone!

eli-schwartz commented 8 months ago

https://github.com/rockdaboot/libpsl/blob/ba4c49a205408845834b1648ad23c15ee09f722b/include/libpsl.h.in#L48-L58

The autotools build setup checks if -fvisibility=hidden is a valid compiler flag for this source code: https://github.com/rockdaboot/libpsl/blob/ba4c49a205408845834b1648ad23c15ee09f722b/m4/visibility.m4#L55-L70

I wonder why neither the MSVC fallback nor the visibility attribute are detected.

vszakats commented 8 months ago

HAVE_VISIBILITY=1 got detected in my local Windows autotools cross-build sample build. Then this enables the "default" visibility attribute in source, which then ignores -fvisibility=hidden passed explicitly via command-line. (It's head-spinning!)

eli-schwartz commented 8 months ago

Oh, I see the confusion.

The -fvisibility flag sets the default for unadorned symbols.

The "visibility" inline attribute adorns symbols explicitly, and in this context, "default" is a manual pointer back to the behavior which would occur if neither -fvisibility nor the "visibility" inline attribute was used, namely, external linkage. With some caveats regarding whole program usage.

...

In general, it's expected behavior for libraries to provide their public API via external linkage.

It's actually necessary in cases such as building a "fat library" for platforms where you're only allowed a single shared library, or the cost of loading many libraries is prohibitively expensive -- so instead you create a single library that exports the ABI of an entire sdk. I'm told this is important for the apple appstore, for example.

Using -Wl,--exclude-libs,ALL when linking a library can tell binutils ld, when it needs to include a *.o file from a static library to resolve a symbol, to hide all symbols from that object file (specifically, all symbols that came from a static library).

The distinction here boils down to:

vszakats commented 8 months ago

Yes, many combinations can be valid depending on use-case. It's useful to have this as a build option or leave it customizable. Speaking of libcurl shared lib, we expose the libcurl API only, while statically linking all dependencies. This needs a variety of tweaks when building them. One special hurdle is assembly code because it doesn't obey -fvisibility=hidden.

All in all a linker option to select the exported function names with a mask/wildcard could make things easy in many cases.

eli-schwartz commented 8 months ago

Well, using --exclude-libs gets you granularity at the project (*.a) level, which I think gets you most of the way there. Could be useful regardless of any other factors.

vszakats commented 8 months ago

--exclude-libs is getting there (thanks for the tip! I didn't know about this option), but setting a second copy or the liblist in a slightly different/specific format seems to make things complicated. Also, Apple ld is missing this option. BTW autotools' libtool has -export-symbols-regex regex, which would be perfect, but it's autotools only.