ahgamut / superconfigure

wrap autotools configure scripts to build with Cosmopolitan Libc
The Unlicense

Adding C libraries related to Python's LXML and Pillow #26

Closed croqaz closed 5 months ago

croqaz commented 6 months ago

Hi!

Thank you for this awesome library and all the time you spent maintaining it! I can tell it's a labor of love.

I would like to embed C libraries inside a Python 3.11 build, and I don't know how. I'm writing a public issue in case someone else might want to do the same thing, so maybe we can learn together.

I wonder, would it be appropriate if I added some C libraries to superconfigure? I care about LXML and Pillow. LXML requires libxml2 and libxslt. Pillow requires libjpeg, zlib, libtiff, libwebp, maybe openjpeg, other optional ones...

I checked the other libs included in this repo and it looks like we need to download the code from the repo and build it here. I could find the following links:

For the libxml2, I made this file:

LIBXML2_SRC := https://gitlab.gnome.org/GNOME/libxml2/-/archive/v2.12.6/libxml2-v2.12.6.tar.gz

LIBXML2_CONFIG_ARGS = --enable-static --disable-shared\
    --prefix=$(COSMOS) --sysconfdir=/zip/etc --datarootdir=/zip/usr/share\
    CFLAGS="-Os"

$(eval $(call DOWNLOAD_SOURCE,lib/libxml2,$(LIBXML2_SRC)))
$(eval $(call AUTOTOOLS_BUILD,lib/libxml2,$(LIBXML2_CONFIG_ARGS),$(LIBXML2_CONFIG_ARGS)))

o/lib/libxml2/built.fat: FATTEN_COMMAND = $(DUMMYLINK0)

I'm not sure how to start this, I imported the lib/libxml2/BUILD.mk in lib/BUILD.mk but I'm not sure why I can't build with make -j4 libxml2.

I can make a PR for all of these libraries, if it's acceptable.

ahgamut commented 6 months ago

Okay, let's add the libraries first, then we'll figure out how to add the Python extensions. At the start I'll mention that if any of the libraries require cffi then I'd recommend waiting, because we need to figure out a better way to handle that first.

Super happy to hear you'd like to add more libraries to the repo! The file config/README.md might help get a general idea of what's going on. For now let's focus specifically on the BUILD.mk posted for libxml2:

# the source will be downloaded from here
LIBXML2_SRC := \
    https://gitlab.gnome.org/GNOME/libxml2/-/archive/v2.12.6/libxml2-v2.12.6.tar.gz

# this creates the Makefile code for downloading the source
$(eval $(call DOWNLOAD_SOURCE,lib/libxml2,$(LIBXML2_SRC)))

# for dependencies you would need to do something like
# LIBXML2_DEPS := lib/ncurses lib/readline
# $(eval $(call SPECIFY_DEPS,lib/libxml2,$(LIBXML2_DEPS)))

# these are the args passed to the configure script
# note the $$(COSMOS), that's a Makefile thing
LIBXML2_CONFIG_ARGS = --enable-static --disable-shared\
    --prefix=$$(COSMOS) --sysconfdir=/zip/etc --datarootdir=/zip/usr/share\
    CFLAGS="-Os"

# this creates the Makefile code for configuring x86_64 and aarch64
$(eval $(call AUTOTOOLS_BUILD,lib/libxml2,$(LIBXML2_CONFIG_ARGS),$(LIBXML2_CONFIG_ARGS)))

# this specifies the script to create fat binaries
# set as $(DUMMYLINK0) if there are no fat binaries to build
o/lib/libxml2/built.fat: FATTEN_COMMAND = $(DUMMYLINK0)

The above seems like it should work.

I'm not sure how to start this, I imported the lib/libxml2/BUILD.mk in lib/BUILD.mk but I'm not sure why I can't build with make -j4 libxml2.

You were so close :), the build should start if you try make o/lib/libxml2/built.fat. If you look at the BUILD.mk for datasette, you'll see that make datasette is just a shortcut for make o/python/cpy311-datasette/built.fat.

That ought to get a build started for libxml2. You might need to patch a few source files, but it's better if you don't have to. We'll get to that if necessary. Let's look at the other C libraries you want to build:

LXML requires libxml2 and libxslt. Pillow requires libjpeg, zlib, libtiff, libwebp, maybe openjpeg, other optional ones...

zlib is already included in Cosmo, you can build it via make o/cosmo-repo/base/built.fat. The other libraries are pretty well-known, so I would expect that getting them to build should be ok (less than 10 lines worth of patching for each).

ahgamut commented 6 months ago

For Python extensions, the best thing we have now is building them statically via Modules/Setup. Python's package building methods have a lot of variance, and I haven't figured out a common method to build them with Cosmo yet.

Modules/Setup is kind of a shell-script/Makefile hybrid that CPython uses when building its standard library from source. You can see that I patched the linked script file to build most of the standard library statically, and added a few extensions like yaml/_yaml and markupsafe/_speedups.

It is pretty easy to add extensions if all the C code is present in the source of the package. At present I think it is more likely that you will get a build of Pillow working before LXML, because Pillow has all the C source code as part of the package while LXML has .pxd files that are probably run through Cython. LXML is also likely possible, but it will be easier later. (@jart I remember we discussed how we could handle Cython -> C extension builds at some point)

After building all the necessary C libraries, to add Pillow you would have to add some lines to Modules/Setup like:

# what Modules/Setup part for Pillow would look like, for starters:
Pillow._imaging Pillow/src/_imaging.c -ljpeg-lib

I would recommend trying to build things like yaml and markupsafe first before moving on to Pillow and LXML.
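Once an extension has been added this way, one quick check from inside the built interpreter is whether it shows up in sys.builtin_module_names. The snippet below is a minimal sketch, assuming modules added through Modules/Setup are registered as built-ins; the helper name is hypothetical, and the dotted names are illustrative only, since the exact name depends on how each entry is written in Modules/Setup.

```python
import sys


def is_statically_linked(module_name):
    """Return True if the module is compiled into the interpreter itself."""
    return module_name in sys.builtin_module_names


# Illustrative names only; adjust to match the entries in Modules/Setup.
for name in ("sys", "yaml._yaml", "markupsafe._speedups"):
    status = "built in" if is_statically_linked(name) else "not built in"
    print(f"{name}: {status}")
```

A module reported as "not built in" may still be importable from the zip store or the filesystem; this only tells you whether it was linked into the executable.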

croqaz commented 6 months ago

Awesome stuff !!

I'm taking the libs in order. libxml2 and libxslt are super similar and they go together. If I make one work, the other will be copy & paste.

When I run make o/lib/libxml2/built.fat, I get this error:

bash: line 4: /root/PWD/o/lib/libxml2/libxml2*/configure: No such file or directory
make: *** [config/common.mk:91: o/lib/libxml2/configured.x86_64] Error 127

If I go into o/lib/libxml2/libxml2-v2.12.6 and run: bash autogen.sh --without-sax1 --without-python that works and the configure is created, but then I see another error:

configure: error: source directory already configured; run "make distclean" there first
make: *** [config/common.mk:91: o/lib/libxml2/configured.x86_64] Error 1

There must be a step before AUTOTOOLS_BUILD, to run that autogen.sh script.

ahgamut commented 6 months ago

Oh if it has an autogen.sh step, perhaps we can do something like with pcre:

o/lib/libxml2/setup: o/lib/libxml2/patched
    cd $(BASELOC)/o/lib/libxml2/libxml2* && ./autogen.sh
    touch $@

o/lib/libxml2/configured.x86_64: o/lib/libxml2/setup
o/lib/libxml2/configured.aarch64: o/lib/libxml2/setup

croqaz commented 6 months ago

Awesome! I think it worked. I can see output in o/lib/libxml2/build/ for aarch64 and x86_64. I built pcre and I can see the exact same folders.

The thing is when I run the autogen.sh script, it generates the default config, with Python, with docs and examples, I wish I could exclude them. I can see the message "I am going to run ./configure with no arguments - if you wish to pass any to it, please specify them on the autogen.sh command line"

If I change the BUILD.mk file to:

o/lib/libxml2/setup: o/lib/libxml2/patched
    cd $(BASELOC)/o/lib/libxml2/libxml2* && ./autogen.sh --without-sax1 --without-python
    touch $@

It will throw the same error from before:

configure: error: source directory already configured; run "make distclean" there first
make: *** [config/common.mk:91: o/lib/libxml2/configured.x86_64] Error 1

The later config args don't seem to do anything? But maybe they do, not even sure how to check. I also tried --with-sax1=no --with-python=no in different places.

I guess it's more like a tweak now, because the compilation did work.

croqaz commented 6 months ago

OK, I was wrong. It actually works as intended.

make o/lib/libxml2/built.fat &> make.log
...
Disabling the older SAX1 interface
checking for library containing dlopen... none required
checking for pthread.h... yes
checking for library containing pthread_join... none required
Disabling zlib compression support
Disabling lzma compression support
checking for libiconv... none required
Disabling ICU support
Disabling code coverage for GCC
configure: creating ./config.status
...

croqaz commented 6 months ago

Added libxml2 and libxslt. https://github.com/croqaz/superconfigure/commit/dadf8426eab3a60948c7b56531ec05604e06ba01

I managed to also compile libjpeg, libtiff and libwebp in exactly the same way without issues. I didn't try baking them into the Python executable yet, I want to try that next.

Thank you very much for all the support here and on Discord @ahgamut !

croqaz commented 6 months ago

Managed to add LZ4 and LZMA in Python: https://github.com/croqaz/superconfigure/commit/309013cdb82e80eb1880ce0599e792846607529f I imported both and tested a few compress & decompress functions and they seem to work.

>>> import lzma
>>> lzma.compress(b'asd')
b'\xfd7zXZ\x00\x00\x04\xe6\xd6\xb4F\x02\x00!\x01\x16\x00\x00\x00t/\xe5\xa3\x01\x00\x02asd\x00\x00\xe3q\xb4=i\xb2\x0fd\x00\x01\x1b\x03\x0b/\xb9\x10\x1f\xb6\xf3}\x01\x00\x00\x00\x00\x04YZ'
>>> lzma.decompress( lzma.compress(b'asd') )
b'asd'

>>> import lz4.frame
>>> lz4.frame.compress(b'asd')
b'\x04"M\x18h@\x03\x00\x00\x00\x00\x00\x00\x00\x87\x03\x00\x00\x80asd\x00\x00\x00\x00'
>>> lz4.frame.decompress( lz4.frame.compress(b'asd') )
b'asd'
>>> lz4.frame.COMPRESSIONLEVEL_MAX
16

croqaz commented 5 months ago

Brotli works finally!

>>> import brotli
>>> brotli.compress(b'asd')
b'\x0b\x01\x80asd\x03'
>>> brotli.decompress( brotli.compress(b'asd') )
b'asd'

Super happy to have a static-built-python with all of these things.
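The manual round-trip checks in this thread can be bundled into one small smoke test to run after each rebuild. This is only a sketch: the helper name roundtrip_ok is hypothetical, it assumes each module exposes compress and decompress at module level (true for zlib, lzma, lz4.frame and brotli), and modules that were not built in are reported as missing rather than treated as failures.

```python
import importlib


def roundtrip_ok(module_name, payload=b"asd"):
    """Compress then decompress payload with the named module.

    Returns True or False for the round trip, or None if the module
    cannot be imported in this interpreter.
    """
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return None
    return mod.decompress(mod.compress(payload)) == payload


# Report the status of every compression module we expect in the build.
for name in ("zlib", "lzma", "lz4.frame", "brotli"):
    result = roundtrip_ok(name)
    status = {True: "ok", False: "FAILED", None: "missing"}[result]
    print(f"{name}: {status}")
```

Running it with the freshly built interpreter gives one line per module, which makes it easy to spot an extension that silently dropped out of the build.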

croqaz commented 5 months ago

I'm closing this issue because I kinda implemented all the stuff that is currently doable in Cosmopolitan Python. LXML is a faraway dream and a huge huge blocker for me, because a lot of important libraries depend on it, but I'll continue digging.