cisco / ChezScheme

Chez Scheme
Apache License 2.0

On tagging compiled files: policy and reliability in future versions #204

Open marcomaggi opened 7 years ago

marcomaggi commented 7 years ago

I do not know if this has already been discussed somewhere. After compiling programs and libraries, inspecting the resulting files reveals that they are valid gzip-compressed files; uncompressing them shows that the resulting binary begins with four zero bytes followed by the ASCII string "chez". Compressing such a sequence of bytes generates the following binary string, which appears as the header of compiled files (expressed as a Bash string):

# This magic string is the result of compressing with gzip a sequence of
# four zeros followed by the ASCII string "chez".
#
PROGRAM_MAGIC='\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
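For reference, these ten bytes line up field by field with the fixed gzip member header defined in RFC 1952: two identification bytes, the compression method, a flag byte, a four-byte little-endian modification time, an extra-flags byte and an operating-system byte. In other words, they record how the data was compressed (deflate, no optional fields, no timestamp, Unix) rather than anything specific to the Chez payload, which fits the later observation in this thread that the bytes are whatever zlib emits. A small sketch that splits the prefix into those fields (the function name is illustrative only):

```python
import struct

# The same ten bytes as the PROGRAM_MAGIC Bash string, as raw bytes.
PROGRAM_MAGIC = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03"


def parse_gzip_header(data: bytes) -> dict:
    """Split the fixed 10-byte gzip member header (RFC 1952) into named fields."""
    if len(data) < 10:
        raise ValueError("need at least 10 bytes")
    # '<' = little-endian, no padding: 4 single bytes, a 4-byte MTIME, 2 single bytes.
    id1, id2, cm, flg, mtime, xfl, os_code = struct.unpack("<BBBBIBB", data[:10])
    return {
        "magic_ok": (id1, id2) == (0x1F, 0x8B),  # gzip identification bytes
        "method": cm,         # 8 = deflate
        "flags": flg,         # 0 = no FTEXT/FHCRC/FEXTRA/FNAME/FCOMMENT
        "mtime": mtime,       # 0 = no timestamp recorded
        "extra_flags": xfl,   # depends on the compression level used
        "os": os_code,        # 3 = Unix
    }


fields = parse_gzip_header(PROGRAM_MAGIC)
print(fields)
```

Because nothing in these fields depends on the compressed content, any gzip file produced with the same settings would carry the same prefix.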

Is there a policy governing how this header is generated? Can I rely on future Chez versions to keep this header, or at least to generate a known one?

I want this to correctly configure two things: the Unix file utility, so that it recognises binaries compiled by Chez, and the Linux kernel module binfmt_misc, so that it can execute compiled programs without Chez's core program having to be specified explicitly.
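binfmt_misc rules are registered by writing a line of the form :name:type:offset:magic:mask:interpreter:flags to /proc/sys/fs/binfmt_misc/register, with magic bytes written as \xHH escapes. The sketch below only builds such a line from the gzip prefix; the rule name chez and the interpreter path /usr/local/bin/chez-run are hypothetical. binfmt_misc hands the program's path to the interpreter as an argument, so in practice that path would likely be a small wrapper that invokes Chez. Note that a rule keyed on this gzip header would also fire for any other gzip file with an identical header, which is part of why a dedicated Chez prefix is attractive.

```python
# Same ten bytes as the PROGRAM_MAGIC Bash string.
PROGRAM_MAGIC = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03"


def hex_escape(data: bytes) -> str:
    """Render every byte as a literal \\xHH escape, the notation binfmt_misc parses."""
    return "".join(f"\\x{b:02x}" for b in data)


def binfmt_rule(name: str, magic: bytes, interpreter: str,
                offset: int = 0, mask: bytes = b"", flags: str = "") -> str:
    """Build a ':name:M:offset:magic:mask:interpreter:flags' registration line."""
    if ":" in name or "/" in name:
        raise ValueError("rule name must not contain ':' or '/'")
    # Escaping every magic byte also keeps a literal ':' out of the magic field.
    return ":".join(["", name, "M", str(offset), hex_escape(magic),
                     hex_escape(mask), interpreter, flags])


# Hypothetical rule name and wrapper path, for illustration only.
rule = binfmt_rule("chez", PROGRAM_MAGIC, "/usr/local/bin/chez-run")
print(rule)
# As root, this line would then be written to /proc/sys/fs/binfmt_misc/register.
```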

dybvig commented 7 years ago

We don't have any control over what zlib produces for those 8 bytes, and looking back at some older releases, I see different patterns. However, it should be possible to put our own, uncompressed prefix on a boot file and pass off the remainder to zlib for compression/decompression. We could guarantee that prefix doesn't change. I suppose it's too much to ask for file and especially the kernel module to decompress before looking at the bytes?
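The layout suggested here, a fixed uncompressed prefix with only the remainder passed to the compressor, can be sketched in a few lines. This is a minimal sketch and not Chez Scheme's actual file format: PREFIX is a hypothetical tag, and Python's zlib module stands in for the C zlib library.

```python
import zlib

# Hypothetical fixed, uncompressed tag; not a real Chez Scheme constant.
PREFIX = b"chez-x86_64-linux-nothreads-default\n"


def wrap(payload: bytes, prefix: bytes = PREFIX) -> bytes:
    """Emit the literal prefix, then the zlib-compressed payload."""
    return prefix + zlib.compress(payload)


def unwrap(blob: bytes, prefix: bytes = PREFIX) -> bytes:
    """Check the literal prefix, then hand only the remainder to zlib."""
    if not blob.startswith(prefix):
        raise ValueError("not a tagged compiled file")
    return zlib.decompress(blob[len(prefix):])


original = b"\x00\x00\x00\x00chez" + bytes(range(256))
blob = wrap(original)
print(blob[:len(PREFIX)])
```

Since the prefix is written verbatim, tools such as file or binfmt_misc can match it without knowing anything about the compression that follows, and the compressor can change without affecting the tag.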

marcomaggi commented 7 years ago

I suppose it's too much to ask for file and especially the kernel module to decompress before looking at the bytes?

Unfortunately, yes. At present, neither mechanism can process the file before matching its bytes. If you can add a reliable, readable prefix in front of the compressed data, that would be good; Vicare has this feature, which I inherited from Ikarus, and it has worked just fine.

It would be fine to have different prefixes for different platforms: 32-bit and 64-bit; with and without multithreading; and with an operating system specification (useful when cross-compiling). A full ASCII string in the style of config.guess should be fine if it is shorter than 100 bytes. I would even go as far as allowing a custom suffix to be appended to the generated prefix, for example generating the following prefix by default:

chez-x86_64-linux-nothreads-default

and allowing the following prefix to be generated with a Scheme-level parameter:

chez-x86_64-linux-nothreads-debug
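The proposed tag can be assembled mechanically from a few platform facts. A minimal sketch, assuming hypothetical inputs machine, os_name, threaded and suffix (none of these is a real Chez Scheme parameter); the "threads" spelling for the multithreaded case is a guess by analogy with "nothreads", and the 100-byte cap follows the suggestion above.

```python
def make_prefix(machine: str, os_name: str, threaded: bool,
                suffix: str = "default", limit: int = 100) -> bytes:
    """Assemble a config.guess-like tag such as chez-x86_64-linux-nothreads-default."""
    threads = "threads" if threaded else "nothreads"
    tag = "-".join(["chez", machine, os_name, threads, suffix])
    data = tag.encode("ascii")  # raises UnicodeEncodeError if any field is not ASCII
    if len(data) >= limit:
        raise ValueError(f"prefix is {len(data)} bytes; must be shorter than {limit}")
    return data


print(make_prefix("x86_64", "linux", threaded=False))
print(make_prefix("x86_64", "linux", threaded=False, suffix="debug"))
```

Keeping the suffix as the last field means the platform part stays stable, so a magic rule could match just the leading portion if the suffix varies.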

For the record, I am using the following magic entry for the file utility on my Slackware system:

0 string \x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03 Chez Scheme compiled file
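The magic(5) entry above asks file to compare the bytes at offset 0 against a literal string. A rough Python analogue of that single test can help check which compiled files the rule would catch; magic_string_at is a hypothetical helper, not part of file or libmagic, and it ignores everything else magic(5) can express.

```python
import os
import tempfile

# Same bytes as the magic(5) entry above.
PROGRAM_MAGIC = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03"


def magic_string_at(path: str, expected: bytes, offset: int = 0) -> bool:
    """Return True if the file holds exactly `expected` starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        # A file that is too short yields fewer bytes and so compares unequal.
        return f.read(len(expected)) == expected


# Demo on a throwaway file that starts with the gzip prefix.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(PROGRAM_MAGIC + b"compressed payload would follow")
print(magic_string_at(tmp.name, PROGRAM_MAGIC))
os.remove(tmp.name)
```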

and I am using this package for the binfmt_misc module.

mflatt commented 10 months ago

Some time ago, the compression of boot files and compiled files was changed so that it is applied after the Chez Scheme header, "\x00\x00\x00\x00chez" (around the same time, the default compression switched from zlib to LZ4). There's still no guarantee, but this header now seems less likely to change.
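With this layout, the 8-byte header sits uncompressed at the start of the file, so a checker can look for it directly. A minimal sketch that distinguishes it from the whole-file gzip layout reported at the start of the thread; the labels returned are made up for illustration, and, as noted, the header is not guaranteed to stay fixed.

```python
CHEZ_HEADER = b"\x00\x00\x00\x00chez"  # header that now precedes the compressed data
GZIP_ID = b"\x1f\x8b"                  # gzip identification bytes (RFC 1952)


def classify_compiled(head: bytes) -> str:
    """Guess the layout of a compiled Chez file from its first few bytes."""
    if head.startswith(CHEZ_HEADER):
        return "uncompressed chez header"
    if head.startswith(GZIP_ID):
        # Any gzip file matches here, so this is only a hint.
        return "gzip-wrapped (possibly the older layout)"
    return "unknown"


print(classify_compiled(b"\x00\x00\x00\x00chez" + b"..."))
```

The same eight bytes could serve as the magic for a file or binfmt_misc rule, with the caveat that they identify Chez output in general rather than a particular platform.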