ashinn / chibi-scheme

Official chibi-scheme repository
Other
1.21k stars 141 forks source link

Symbol File Hassles Persist #719

Open barak opened 3 years ago

barak commented 3 years ago

The fix to #714 took care of some architectures, but some lingering problems remain. See https://buildd.debian.org/status/package.php?p=chibi-scheme for version 0.9.1-3. E.g., build error on arch armhf, which is a release architecture.

https://buildd.debian.org/status/fetch.php?pkg=chibi-scheme&arch=armhf&ver=0.9.1-3&stamp=1606341898&raw=0

...
   dh_makeshlibs -a -a
dpkg-gensymbols: warning: some new symbols appeared in the symbols file: see diff output below
dpkg-gensymbols: error: some symbols or patterns disappeared in the symbols file: see diff output below
dpkg-gensymbols: warning: debian/libchibi-scheme0/DEBIAN/symbols doesn't match completely debian/libchibi-scheme0.symbols
--- debian/libchibi-scheme0.symbols (libchibi-scheme0_0.9.1-3_armhf)
+++ dpkg-gensymbols7i_BiI   2020-11-25 22:04:55.257927165 +0000
@@ -38,6 +38,8 @@
  sexp_bignum_sub@Base 0.8.0
  sexp_bignum_sub_digits@Base 0.8.0
  sexp_bignum_to_double@Base 0.8.0
+ sexp_bignum_to_sint@Base 0.9.1-3
+ sexp_bignum_to_uint@Base 0.9.1-3
  sexp_bignum_zerop@Base 0.8.0
  sexp_bootstrap_context@Base 0.8.0
  sexp_buffered_flush@Base 0.8.0
@@ -71,6 +73,7 @@
  sexp_complex_sub@Base 0.8.0
  sexp_complex_tan@Base 0.8.0
  sexp_cons_op@Base 0.8.0
+ sexp_context_align_pos@Base 0.9.1-3
  sexp_copy_bignum@Base 0.8.0
  sexp_copy_list_op@Base 0.8.0
  sexp_cos@Base 0.8.0
@@ -123,8 +126,8 @@
  sexp_free_heap@Base 0.8.0
  sexp_free_vars@Base 0.8.0
  sexp_gc@Base 0.8.0
- sexp_gc_heap_pack@Base 0.8.0
- sexp_gc_heap_walk@Base 0.8.0
+#MISSING: 0.9.1-3# sexp_gc_heap_pack@Base 0.8.0
+#MISSING: 0.9.1-3# sexp_gc_heap_walk@Base 0.8.0
  sexp_gc_init@Base 0.8.0
  sexp_generate@Base 0.8.0
  sexp_generate_op@Base 0.8.0
@@ -150,8 +153,8 @@
  sexp_list_to_uvector_op@Base 0.9.0
  sexp_list_to_vector_op@Base 0.8.0
  sexp_listp_op@Base 0.8.0
- sexp_load_image@Base 0.8.0
- sexp_load_image_err@Base 0.8.0
+#MISSING: 0.9.1-3# sexp_load_image@Base 0.8.0
+#MISSING: 0.9.1-3# sexp_load_image_err@Base 0.8.0
  sexp_load_module_file@Base 0.8.0
  sexp_load_module_file_op@Base 0.8.0
  sexp_load_op@Base 0.8.0
@@ -264,7 +267,7 @@
  sexp_rest_unused_p@Base 0.8.0
  sexp_reverse_op@Base 0.8.0
  sexp_round@Base 0.8.0
- sexp_save_image@Base 0.8.0
+#MISSING: 0.9.1-3# sexp_save_image@Base 0.8.0
  sexp_scheme_init@Base 0.8.0
  sexp_set_current_environment@Base 0.8.0
  sexp_set_parameter@Base 0.8.0
dh_makeshlibs: error: failing due to earlier errors
ashinn commented 3 years ago

What exactly is it complaining about? Different architectures will have different build options, with different sets of features - this is intentional.

barak commented 3 years ago

In that case, according to dh_makeshlibs(1), there need to be multiple debian/libchibi-scheme0.symbols.ARCH files corresponding to the various sets of enabled features. What a mess...

FILES

       debian/package.symbols
       debian/package.symbols.arch
           These symbols files, if present, are passed to dpkg-gensymbols(1) to be processed and installed.
           Use the arch specific names if you need to provide different symbols files for different architectures.
ashinn commented 3 years ago

That's annoying. It says "if present," can we just omit them altogether? If not, how hard is it to just create separate arch files for the 32-bit archs?

I think it's mostly only the image utilities that may not be present, another alternative is to allow a flag (only set on Debian), to replace these with dummy noops, but that's an ugly hack.

barak commented 3 years ago

I think the right approach is to imagine someone writing some other program that wants to link to this stuff, which might use those facilities if they're available. What would make their job easiest?

But, consider the following scenario: the library is recompiled (without a version change) to support vs not support that stuff.

Just brainstorming...

ashinn commented 3 years ago

The best thing for users is exactly what Chibi already does. There are documented macros in features.h which control both the declaration and definition of the optional symbols. Trying to use an unavailable symbol gives a compile-time error - much better than a runtime error.

But we're not talking about the best thing for users, we're trying to understand why Debian defines these symbol files, whether we have to define them, and if so what's the best way to make the Debian tools not complain about perfectly valid differences in architectures.

barak commented 3 years ago

You don't have to define symbols files at all. Things will work without them, perhaps with more restrictive library version constraints between packages than strictly necessary. But @ilammy thought they were a good idea and generated some.

I hear you about features.h. The scenario I was imagining was features being enabled/disabled on some particular architecture, without necessarily recompiling other programs. Like, a program foo can make use of the image feature if available, and some new architecture has the library compiled with the image feature disabled, but then as support for the architecture improves the library gets compiled again with images turned on. Currently this would require recompilation of foo, but there's really no way to expose that fact. Having foo notice at runtime would avoid this problem. Obviously it has other expenses though. Like many things, it's a tradeoff.

mnieper commented 3 years ago

The best thing for users is exactly what Chibi already does. There are documented macros in features.h which control both the declaration and definition of the optional symbols. Trying to use an unavailable symbol gives a compile-time error - much better than a runtime error.

When the external interface is not stable between releases of the Chibi package (assuming the so-version is not bumped), other installed packages that rely on Chibi's shared library may break. This won't cause any compile-time error but a late dynamic link-time error. But maybe I haven't understood the point.

--

Isn't symbol version and/or dynamic loading with dlsym/dlvsym a solution to @barak's suggestion?

ashinn commented 3 years ago

This isn't a matter of stability between releases. The problem described by #719 is features not supported on certain architectures confusing some Debian packaging feature I'm not familiar with.

ABI compatibility is typically handled as you say by the so-version. Chibi has not yet ever removed any public bindings between releases. If it were to do so, I'd bump the so-version.

One could imagine in theory some complex system of runtime checks for features involving dlsym. This would be difficult to specify, implement, and use, and still never be perfect. Imagine the sort of breakage that would arise from changing the signature

sexp_load_image(const char* file, off_t offset, sexp_uint_t heap_free_size, sexp_uint_t heap_max_size)

to something like

sexp_load_image(const char* file, off_t offset, sexp heap_spec)

Trying to support both with runtime checks in a binary would be insane. Better to just abort the whole program immediately if you detect this difference.

The best way to detect this is at compile-time. The next best is at link-time (so-version). But Chibi does provide a runtime ABI check if you really want, and it's already used whenever we load a shared library: SEXP_ABI_IDENTIFIER.

barak commented 3 years ago

I'd say that if things are going to be fragile, at least they should be simple. Making them complicated and also fragile seems silly. So if we can't make things robust using symbols files and runtime checks, or don't want to, then we should at least make them simple.

Meaning forget the symbol files. And in the unlikely even that per-architecture feature churn causes issues in porting programs that use the library, or requiring recompilation of other programs, or whatever ... well that's the price you pay for simplicity.

ilammy commented 3 years ago

I believe those are separate issues here...

Removing public symbols from a shared library without changing soname is... not nice. @ashinn does not do this, and it's great. And it's not what's happening here.

Actual binary compatibility is taken care of by software maintainers – not sonames, symbol files, etc. The reason why so-versions exist is to make sure that potentially broken software will not load if the maintainer missed a potential ABI change, not to magically take care of the compatibility. If you see it change for one of your dependencies, you are expected to review the changes and recompile.

Another issue here is that Chibi presents different symbol sets on different architectures. As there is only one symbol file, it looks like the symbols were removed, while in fact the image symbols were never there for ARM in the first place. Well... my bad, given how portable Chibi generally is, I did not really expect it to export different symbols.

I believe the correct solution here would be to maintain separate symbol files per architecture, as noted by @barak, listing the symbols available for each architecture.

As for why the symbols files were added in the first place, that's because it's good hygiene and simplifies tracking ABI compatibility. On the odd chance that public symbols are removed, package maintainers will be aware of that. The symbols files are not really used by compiler toolchains. However, they are used by Debian tooling to automagically figure out which packages of what minimal versions need to be added to dependencies based on the libraries that the binaries are linked against and the symbols they in fact use.

I don't really see a massive user base of maintainers queued in this PR, waiting for a response, so it would be... acceptable compromise to just drop the symbol files for the time being, as there isn't much demand for them. However, I personally value doing the things right from the beginning, if Chibi is going to stay in Debian repositories. If maintainers do not care, then why not just ask users to use checkinstall – it kinda works, after all.

barak commented 3 years ago

@ilammy, I totally agree with everything you're saying. Indeed, including correct symbols files is "the right thing". The question we're facing is how to make sure the symbols files are correct and stay correct, across different architectures with different features enabled.

The root problem is that the set of symbols exported by the library changes depending on which features are enabled, and not just on the library version. Care is taken to ensure that this is monotonic in both features enabled and version, so symbols never drop out unexpectedly. Which is great. But nothing ensures that the enabled features on some architecture is monotonic with time.

So we can either ensure that features are monotonic on each architecture are monotonic with time. Or ... well, there doesn't really seem to be an alternative to that. But that's a bit brittle, because one can certainly imagine needing to drop a feature on some architecture, due to an arch-specific bug or something like that. It also makes it tricky to actually generate the per-architecture symbols files. Maybe something that runs in debian/rules after features.h has been generated, and grovels through it checking for features and generating the symbols file. Probably would want it to be a script in the sources proper, that debian/rules just invokes. Then there would be files listing the per-feature symbols that it accesses. I dunno, the effort/benefit seems pretty high...

ashinn commented 3 years ago

@barak, there is no problem regarding changing features in Debian. They don't change - we always use the default build, and the default build is stable.

If you want a custom Chibi build, then you don't want the Debian package to begin with.

ilammy commented 3 years ago

As for features, the defaults might change over time too so I guess it's still a good idea to keep an eye on features.h when reviewing the changes for the updated Chibi packaging. But as long as the packaging itself does not customize the feature set (or if it does – then keeps it stable), there should be no problem with stability of that, I think. If a breaking change must occur, then at the time we'd think how to deal with it, but for now I don't think it's necessary to overthink it.

As for generating the file, the way I did this was to build the package, see it fail, and fix up the initial file generated by dpkg-gensymbols. The same tool can be used for automation. That will require running it in VMs for all architectures supported by Debian so that it's accurate. It's not a simplest thing, but that's the most scriptable approach that I can think of right now.

Alternatively, the symbols file syntax is flexible enough, so another option is to maintain the architecture differences manually. Like, now we know that image operations from gc_heap.h are not present on armhf (probably because it's 32-bit?), so they can be left only for 64-bit archs:

 (arch-bits=64)sexp_gc_heap_pack@Base 0.8.0
 ...

This makes it possible to keep a single symbols file, but I don't know whether it's more maintainable. Keeping files separate for architectures makes them more amenable to automatic generation in the future.

And finally, I'd like to reiterate that the symbols file by itself is just an indicator. If a breaking change is ever introduced, it is there regardless of whether the package has or has not an up-to-date symbols file. The users might be affected either way.

barak commented 3 years ago

Okay. The autobuilder runs on all the various architectures generated the data you're discussing, so maybe we can yank it off there to get bootstrapped? https://buildd.debian.org/status/package.php?p=chibi-scheme

ilammy commented 3 years ago

Hm... well, dunno. Maybe it's better play dumb and just restrict the architectures for now. From what I gather from the build logs, all 32-bit architectures are affected. Since SEXP_USE_IMAGE_LOADING is defined with SEXP_64_BIT requirement, how about really limiting the symbols to 64-bit archs? (This will need to change with 128-bit CPUs but it's a bit too early for that.)

Here's a patch that should fix that:

```diff From 2ccea24ad53a0dc292998b6a643341df6ce580b3 Mon Sep 17 00:00:00 2001 From: Alexei Lozovsky Date: Sun, 29 Nov 2020 11:49:42 +0900 Subject: [PATCH] reconcile architecture differences in symbols list Symbols exported by are available only on 64-bit architectures, due to SEXP_USE_IMAGE_LOADING being disabled by default for 32-bit architectures (and below). Also, add some new symbols which first appear in Chibi 0.9.1. --- debian/libchibi-scheme0.symbols | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/debian/libchibi-scheme0.symbols b/debian/libchibi-scheme0.symbols index 325d28c3..fac1222b 100644 --- a/debian/libchibi-scheme0.symbols +++ b/debian/libchibi-scheme0.symbols @@ -39,6 +39,8 @@ libchibi-scheme.so.0 libchibi-scheme0 #MINVER# sexp_bignum_sub@Base 0.8.0 sexp_bignum_sub_digits@Base 0.8.0 sexp_bignum_to_double@Base 0.8.0 + sexp_bignum_to_sint@Base 0.9.1 + sexp_bignum_to_uint@Base 0.9.1 sexp_bignum_zerop@Base 0.8.0 sexp_bootstrap_context@Base 0.8.0 sexp_buffered_flush@Base 0.8.0 @@ -72,6 +74,7 @@ libchibi-scheme.so.0 libchibi-scheme0 #MINVER# sexp_complex_sub@Base 0.8.0 sexp_complex_tan@Base 0.8.0 sexp_cons_op@Base 0.8.0 + sexp_context_align_pos@Base 0.9.1 sexp_copy_bignum@Base 0.8.0 sexp_copy_list_op@Base 0.8.0 sexp_cos@Base 0.8.0 @@ -124,9 +127,9 @@ libchibi-scheme.so.0 libchibi-scheme0 #MINVER# sexp_free_heap@Base 0.8.0 sexp_free_vars@Base 0.8.0 sexp_gc@Base 0.8.0 - sexp_gc_heap_pack@Base 0.8.0 + (arch-bits=64)sexp_gc_heap_pack@Base 0.8.0 #sexp_gc_heap_stats_print@Base 0.8.0 # removed in 0.9.0 - sexp_gc_heap_walk@Base 0.8.0 + (arch-bits=64)sexp_gc_heap_walk@Base 0.8.0 sexp_gc_init@Base 0.8.0 sexp_generate@Base 0.8.0 sexp_generate_op@Base 0.8.0 @@ -152,8 +155,8 @@ libchibi-scheme.so.0 libchibi-scheme0 #MINVER# sexp_list_to_uvector_op@Base 0.9.0 sexp_list_to_vector_op@Base 0.8.0 sexp_listp_op@Base 0.8.0 - sexp_load_image@Base 0.8.0 - sexp_load_image_err@Base 0.8.0 + (arch-bits=64)sexp_load_image@Base 0.8.0 + (arch-bits=64)sexp_load_image_err@Base 0.8.0 sexp_load_module_file@Base 0.8.0 sexp_load_module_file_op@Base 0.8.0 sexp_load_op@Base 0.8.0 @@ -267,7 +270,7 @@ libchibi-scheme.so.0 libchibi-scheme0 #MINVER# sexp_rest_unused_p@Base 0.8.0 sexp_reverse_op@Base 0.8.0 sexp_round@Base 0.8.0 - sexp_save_image@Base 0.8.0 + (arch-bits=64)sexp_save_image@Base 0.8.0 sexp_scheme_init@Base 0.8.0 sexp_set_current_environment@Base 0.8.0 sexp_set_parameter@Base 0.8.0 -- 2.28.0 ```

The build log also has an error for sparc64 which is different:

    scheme bytevector: 
        2.2 General Operations: ......
        6 out of 6 (100.0%) tests passed in 0.007916927337646484 seconds.
        2.3 Operations on Bytes and Octets: ..............
        14 out of 14 (100.0%) tests passed in 0.023489952087402344 seconds.
        2.4 Operations on Integers of Arbitrary Size: ..............
        14 out of 14 (100.0%) tests passed in 0.020093202590942383 seconds.
        2.5 Operations on 16-Bit Integers: ....Bus error
make[2]: *** [Makefile:256: test-libs] Error 138

I'm not quite sure exactly what's wrong there, but "bus error" speaks "address alignment issue" to me. While x86 and modern ARMs are lax on this, other architectures are still strict with memory access alignment. IIRC, SPARC is one of them.

If the code tries to access a 16-bit word via an odd address – which it appears to do at times, accessing 16-bit values in a bytevector at unaligned odd offsets – then it's quite possible for that error to be triggered. The implementation seems to be breaking strict aliasing rules and just does some pointer arithmetic to compute the address, then casts it into int16_t.

I guess, the most safe and portable approach here would probably be to manually read two bytes to avoid alignment issues, but I'm kinda struggling to invent a compact form for that... Alternatively, it's possible to tell the compiler that this int16_t* has only 1-byte alignment, then it will generate appropriate code. However, that requires some non-standard attributes.

ashinn commented 3 years ago

The default features would not change in such a way as to break ABI compatibility.

ashinn commented 3 years ago

Regarding breaking strict aliasing, it's a bit tedious to implement that properly. I'd wait for a complaint from an actual SPARC user before worrying about it :)