Open barak opened 3 years ago
What exactly is it complaining about? Different architectures will have different build options, with different sets of features - this is intentional.
In that case, according to dh_makeshlibs(1)
, there need to be multiple debian/libchibi-scheme0.symbols.ARCH
files corresponding to the various sets of enabled features. What a mess...
FILES
debian/package.symbols
debian/package.symbols.arch
These symbols files, if present, are passed to dpkg-gensymbols(1) to be processed and installed.
Use the arch specific names if you need to provide different symbols files for different architectures.
That's annoying. It says "if present," can we just omit them altogether? If not, how hard is it to just create separate arch files for the 32-bit archs?
I think it's mostly only the image utilities that may not be present, another alternative is to allow a flag (only set on Debian), to replace these with dummy noops, but that's an ugly hack.
I think the right approach is to imagine someone writing some other program that wants to link to this stuff, which might use those facilities if they're available. What would make their job easiest?
/usr/include/chibi/features.h
would be one strategy, where they #ifdef
to not call the missing routines if they're not available.But, consider the following scenario: the library is recompiled (without a version change) to support vs not support that stuff.
E_UNAVAILABLE
. Then a client program could be compiled to use that stuff, but at runtime would disable it, and then if the library gets recompiled everything would be okay even without recompiling the client.Just brainstorming...
The best thing for users is exactly what Chibi already does. There are documented macros in features.h which control both the declaration and definition of the optional symbols. Trying to use an unavailable symbol gives a compile-time error - much better than a runtime error.
But we're not talking about the best thing for users, we're trying to understand why Debian defines these symbol files, whether we have to define them, and if so what's the best way to make the Debian tools not complain about perfectly valid differences in architectures.
You don't have to define symbols files at all. Things will work without them, perhaps with more restrictive library version constraints between packages than strictly necessary. But @ilammy thought they were a good idea and generated some.
I hear you about features.h
. The scenario I was imagining was features being enabled/disabled on some particular architecture, without necessarily recompiling other programs. Like, a program foo
can make use of the image feature if available, and some new architecture has the library compiled with the image feature disabled, but then as support for the architecture improves the library gets compiled again with images turned on. Currently this would require recompilation of foo
, but there's really no way to expose that fact. Having foo
notice at runtime would avoid this problem. Obviously it has other expenses though. Like many things, it's a tradeoff.
The best thing for users is exactly what Chibi already does. There are documented macros in features.h which control both the declaration and definition of the optional symbols. Trying to use an unavailable symbol gives a compile-time error - much better than a runtime error.
When the external interface is not stable between releases of the Chibi package (assuming the so-version is not bumped), other installed packages that rely on Chibi's shared library may break. This won't cause any compile-time error but a late dynamic link-time error. But maybe I haven't understood the point.
--
Isn't symbol version and/or dynamic loading with dlsym/dlvsym a solution to @barak's suggestion?
This isn't a matter of stability between releases. The problem described by #719 is features not supported on certain architectures confusing some Debian packaging feature I'm not familiar with.
ABI compatibility is typically handled as you say by the so-version. Chibi has not yet ever removed any public bindings between releases. If it were to do so, I'd bump the so-version.
One could imagine in theory some complex system of runtime checks for features involving dlsym. This would be difficult to specify, implement, and use, and still never be perfect. Imagine the sort of breakage that would arise from changing the signature
sexp_load_image(const char* file, off_t offset, sexp_uint_t heap_free_size, sexp_uint_t heap_max_size)
to something like
sexp_load_image(const char* file, off_t offset, sexp heap_spec)
Trying to support both with runtime checks in a binary would be insane. Better to just abort the whole program immediately if you detect this difference.
The best way to detect this is at compile-time. The next best is at link-time (so-version). But Chibi does provide a runtime ABI check if you really want, and it's already used whenever we load a shared library: SEXP_ABI_IDENTIFIER.
I'd say that if things are going to be fragile, at least they should be simple. Making them complicated and also fragile seems silly. So if we can't make things robust using symbols files and runtime checks, or don't want to, then we should at least make them simple.
Meaning forget the symbol files. And in the unlikely even that per-architecture feature churn causes issues in porting programs that use the library, or requiring recompilation of other programs, or whatever ... well that's the price you pay for simplicity.
I believe those are separate issues here...
Removing public symbols from a shared library without changing soname is... not nice. @ashinn does not do this, and it's great. And it's not what's happening here.
Actual binary compatibility is taken care of by software maintainers – not sonames, symbol files, etc. The reason why so-versions exist is to make sure that potentially broken software will not load if the maintainer missed a potential ABI change, not to magically take care of the compatibility. If you see it change for one of your dependencies, you are expected to review the changes and recompile.
Another issue here is that Chibi presents different symbol sets on different architectures. As there is only one symbol file, it looks like the symbols were removed, while in fact the image symbols were never there for ARM in the first place. Well... my bad, given how portable Chibi generally is, I did not really expect it to export different symbols.
I believe the correct solution here would be to maintain separate symbol files per architecture, as noted by @barak, listing the symbols available for each architecture.
As for why the symbols files were added in the first place, that's because it's good hygiene and simplifies tracking ABI compatibility. On the odd chance that public symbols are removed, package maintainers will be aware of that. The symbols files are not really used by compiler toolchains. However, they are used by Debian tooling to automagically figure out which packages of what minimal versions need to be added to dependencies based on the libraries that the binaries are linked against and the symbols they in fact use.
I don't really see a massive user base of maintainers queued in this PR, waiting for a response, so it would be... acceptable compromise to just drop the symbol files for the time being, as there isn't much demand for them. However, I personally value doing the things right from the beginning, if Chibi is going to stay in Debian repositories. If maintainers do not care, then why not just ask users to use checkinstall
– it kinda works, after all.
@ilammy, I totally agree with everything you're saying. Indeed, including correct symbols files is "the right thing". The question we're facing is how to make sure the symbols files are correct and stay correct, across different architectures with different features enabled.
The root problem is that the set of symbols exported by the library changes depending on which features are enabled, and not just on the library version. Care is taken to ensure that this is monotonic in both features enabled and version, so symbols never drop out unexpectedly. Which is great. But nothing ensures that the enabled features on some architecture is monotonic with time.
So we can either ensure that features are monotonic on each architecture are monotonic with time. Or ... well, there doesn't really seem to be an alternative to that. But that's a bit brittle, because one can certainly imagine needing to drop a feature on some architecture, due to an arch-specific bug or something like that. It also makes it tricky to actually generate the per-architecture symbols files. Maybe something that runs in debian/rules
after features.h
has been generated, and grovels through it checking for features and generating the symbols file. Probably would want it to be a script in the sources proper, that debian/rules
just invokes. Then there would be files listing the per-feature symbols that it accesses. I dunno, the effort/benefit seems pretty high...
@barak, there is no problem regarding changing features in Debian. They don't change - we always use the default build, and the default build is stable.
If you want a custom Chibi build, then you don't want the Debian package to begin with.
As for features, the defaults might change over time too so I guess it's still a good idea to keep an eye on features.h
when reviewing the changes for the updated Chibi packaging. But as long as the packaging itself does not customize the feature set (or if it does – then keeps it stable), there should be no problem with stability of that, I think. If a breaking change must occur, then at the time we'd think how to deal with it, but for now I don't think it's necessary to overthink it.
As for generating the file, the way I did this was to build the package, see it fail, and fix up the initial file generated by dpkg-gensymbols
. The same tool can be used for automation. That will require running it in VMs for all architectures supported by Debian so that it's accurate. It's not a simplest thing, but that's the most scriptable approach that I can think of right now.
Alternatively, the symbols file syntax is flexible enough, so another option is to maintain the architecture differences manually. Like, now we know that image operations from gc_heap.h
are not present on armhf (probably because it's 32-bit?), so they can be left only for 64-bit archs:
(arch-bits=64)sexp_gc_heap_pack@Base 0.8.0
...
This makes it possible to keep a single symbols file, but I don't know whether it's more maintainable. Keeping files separate for architectures makes them more amenable to automatic generation in the future.
And finally, I'd like to reiterate that the symbols file by itself is just an indicator. If a breaking change is ever introduced, it is there regardless of whether the package has or has not an up-to-date symbols file. The users might be affected either way.
Okay. The autobuilder runs on all the various architectures generated the data you're discussing, so maybe we can yank it off there to get bootstrapped? https://buildd.debian.org/status/package.php?p=chibi-scheme
Hm... well, dunno. Maybe it's better play dumb and just restrict the architectures for now. From what I gather from the build logs, all 32-bit architectures are affected. Since SEXP_USE_IMAGE_LOADING is defined with SEXP_64_BIT requirement, how about really limiting the symbols to 64-bit archs? (This will need to change with 128-bit CPUs but it's a bit too early for that.)
Here's a patch that should fix that:
The build log also has an error for sparc64 which is different:
scheme bytevector:
2.2 General Operations: ......
6 out of 6 (100.0%) tests passed in 0.007916927337646484 seconds.
2.3 Operations on Bytes and Octets: ..............
14 out of 14 (100.0%) tests passed in 0.023489952087402344 seconds.
2.4 Operations on Integers of Arbitrary Size: ..............
14 out of 14 (100.0%) tests passed in 0.020093202590942383 seconds.
2.5 Operations on 16-Bit Integers: ....Bus error
make[2]: *** [Makefile:256: test-libs] Error 138
I'm not quite sure exactly what's wrong there, but "bus error" speaks "address alignment issue" to me. While x86 and modern ARMs are lax on this, other architectures are still strict with memory access alignment. IIRC, SPARC is one of them.
If the code tries to access a 16-bit word via an odd address – which it appears to do at times, accessing 16-bit values in a bytevector at unaligned odd offsets – then it's quite possible for that error to be triggered. The implementation seems to be breaking strict aliasing rules and just does some pointer arithmetic to compute the address, then casts it into int16_t
.
I guess, the most safe and portable approach here would probably be to manually read two bytes to avoid alignment issues, but I'm kinda struggling to invent a compact form for that... Alternatively, it's possible to tell the compiler that this int16_t*
has only 1-byte alignment, then it will generate appropriate code. However, that requires some non-standard attributes.
The default features would not change in such a way as to break ABI compatibility.
Regarding breaking strict aliasing, it's a bit tedious to implement that properly. I'd wait for a complaint from an actual SPARC user before worrying about it :)
The fix to #714 took care of some architectures, but some lingering problems remain. See https://buildd.debian.org/status/package.php?p=chibi-scheme for version 0.9.1-3. E.g., build error on arch armhf, which is a release architecture.
https://buildd.debian.org/status/fetch.php?pkg=chibi-scheme&arch=armhf&ver=0.9.1-3&stamp=1606341898&raw=0