Closed brson closed 11 months ago
This looks good to me. It seems that RBPF entrypoint tests need to be updated to pass correct entrypoint function name in instruction_data.
This looks good to me. It seems that RBPF entrypoint tests need to be updated to pass correct entrypoint function name in instruction_data.
This patch doesn't change the names of the entry point functions since I wasn't sure whether or how I should change them, so as of now I don't think any of the tests need to change, unless I'm misunderstanding.
I've addressed the reviews and fixed the expected IRs to pass CI.
This fixes two problems with symbol naming.
Name collisions
Two functions with the same module and function name, but different addresses, will have the same symbol name, and generate a link error. Example
Long symbols
rbpf supports symbol names up to 64 characters (63 + a nil byte). Our current symbol naming will easily generate symbol names that are too long.
This patch uses an encoding scheme that guarantees short and unique symbols. That scheme is described fully in the comments.
Here is an example of a symbol generated by this scheme:
It is different from the one suggested in https://github.com/solana-labs/move/issues/378#issuecomment-1728397377 for a few reasons:
I also allocated 15 bytes to each of the module name and the function name. I think it is arguable that the module name is less important and often short compared to the function name, and those bytes could be reduced to add bytes elsewhere. e.g. the hash here is significantly truncated, so bytes could be added to it, but I also don't feel strongly that more bytes elsewhere will meaningfully improve this scheme.
Encoding the address and hash into all symbols ensures that all symbols are fairly long, so there could be concern about binary size. The only symbols names that will appear in the final binary though are ones that need to be relocated, which currently seems to include only public functions. If the compilation model changes in the future, e.g. using LTO to combine compilation units (or just compiling all modules as one compilation unit to begin with), we could possibly avoid those relocations, but I am not sure.
All the work needed to generate symbols here is arguably inefficient and could be cached for later lookups, but with our small workloads I am not thinking it matters. In casual testing rbf-tests takes approximately the same time to execute after this patch.
This uses blake3 for the hash because it is strong and fast, and base58 to encode the hash to valid symbol names.
It leaves alone the naming of the entrypoint symbols as I don't understand that code enough to know if and how it should change.
Fixes https://github.com/solana-labs/move/issues/303 Fixes https://github.com/solana-labs/move/issues/378