anza-xyz / move

Move compiler targeting llvm supported backends
https://discord.gg/wFgfjG9J
Apache License 2.0
107 stars 32 forks source link

[Bug] Force move symbols to be less than 64 chars #378

Closed brson closed 9 months ago

brson commented 10 months ago

The compiler creates relocatable symbol names that are greater than 64 bytes, which rbpf doesn't support, and this is starting to impede progress (https://github.com/solana-labs/move/issues/303, https://github.com/solana-labs/move/pull/374/files#r1330706301).

Without a better solution, we could forcibly truncate any symbol names that exceed 64 bytes, probably appending a hash to the end to maintain uniqueness.

dmakarov commented 10 months ago

Appending hash to the end seems reasonable. Maybe we could have some fixed schema for symbols, e.g. first 16 bytes are lower 8 bytes of a module address as a hex string, next 16 bytes are 8 bytes of module name plus 8 byte hash of entire module name, final 32 bytes are first 16 bytes of function name and 16 byte hash of entire function name, or some other partitioning if it's more practical.

brson commented 10 months ago

Appending hash to the end seems reasonable. Maybe we could have some fixed schema for symbols, e.g. first 16 bytes are lower 8 bytes of a module address as a hex string, next 16 bytes are 8 bytes of module name plus 8 byte hash of entire module name, final 32 bytes are first 16 bytes of function name and 16 byte hash of entire function name, or some other partitioning if it's more practical.

Seems ok to me, though I haven't thought about this proposed scheme in detail.

ksolana commented 10 months ago

is it possible to modify rbpf into accepting longer names?

we can also have an accompanying symbol table with the real long name(for debugging purposes etc.) and only a hash representing the symbol in the original binary.

dmakarov commented 10 months ago

Yes, it’s possible, but the imposed limit is for a security issue.

ksolana commented 10 months ago

Yes, it’s possible, but the imposed limit is for a security issue.

ah never mind then.

we can also have an accompanying symbol table with the real long name(for debugging purposes etc.) and only a hash representing the symbol in the original binary.

we'll need a symbol table regardless of which hashing scheme we chose, right?

dmakarov commented 10 months ago

For debugging we would need to have the original symbols, but we can also lift the restriction on symbol length in the VM if it's used for debugging. When compiling for debugging we could use the original symbols, and when compiling for deployment we could use hashes. For now it's probably ok to just use hashes (assuming we can avoid conflicts) so we're not impeded by the length limit.