bytecodealliance / wasm-tools

CLI and Rust libraries for low-level manipulation of WebAssembly modules
Apache License 2.0
1.29k stars 230 forks source link

`wasmparser`: make `ahash` an optional dependency #1528

Closed Shnatsel closed 4 months ago

Shnatsel commented 4 months ago

The ahash dependency of wasmparser crate pulls in a considerable amount of dependencies. Here's an excerpt from cargo tree to illustrate it:

    │       ├── ahash v0.8.11
    │       │   ├── cfg-if v1.0.0
    │       │   ├── once_cell v1.19.0
    │       │   └── zerocopy v0.7.32
    │       │   [build-dependencies]
    │       │   └── version_check v0.9.4

This adds binary size overhead and introduces more unsafe code into the build.

I'd like to have the option to remove ahash and its dependencies from the dependency tree for use in high-assurance environments.

Shnatsel commented 4 months ago

It seems that ahash is only used when the std feature is disabled: https://github.com/bytecodealliance/wasm-tools/blob/df542a839b1fc6c1345c68abf5da21ca9ef5f9fe/crates/wasmparser/src/map.rs#L112-L126

Something also enables the ahash feature in hashbrown in the final Cargo.toml uploaded to crates.io, although I'm not sure what does that exactly:

[dependencies.hashbrown]
version = "0.14.3"
features = ["ahash"]
default-features = false

You can inspect the Cargo.toml uploaded to crates.io like this:

cargo install cargo-dl
cargo dl -x wasmparser
cat wasmparser/Cargo.toml
alexcrichton commented 4 months ago

Thanks for the report! Out of curiosity, have you noticed a measurable size increase with this support? Or otherwise are you mostly worried about the new deps?

Ideally this would be a no_std-only dependency but unfortunately Cargo doesn't have great support for that I think (or at least not that I know of). Otherwise the next-best-thing would probably be to work with ahash upstream to slim the set of dependencies. The ahash dependency isn't just used by wasmparser directly but also through hashbrown where the dependency is unconditional too.

Shnatsel commented 4 months ago

Thank you for the quick response!

Out of curiosity, have you noticed a measurable size increase with this support?

My use case is very basic, and it seems ahash and all associated code just gets removed by dead code elimination. So now that I've measured it, the binary size overhead doesn't seem to be significant (when using only a small subset of wasmparser, anyway).

Or otherwise are you mostly worried about the new deps?

Yes. I want to make my code very easy to adopt for organizations, which often involves a security review of all dependencies. Until I added wasmparser, the dependency tree was very small and 100% safe Rust all the way down (except for std). I even rolled my own parser for ELF/PE/Mach-O so that I could guarantee binary parsing with no unsafe code, heap allocations or panics.

Now that wasmparser added large crates with unsafe code to the tree, adopting my project will be a lot more expensive.

The ahash dependency isn't just used by wasmparser directly but also through hashbrown where the dependency is unconditional too.

ahash is an optional dependency for Hashbrown, see https://github.com/rust-lang/hashbrown/blob/835936584e33480f6afbf1ae9be7151c1b27d595/Cargo.toml#L17

unfortunately Cargo doesn't have great support for that I think (or at least not that I know of)

Indeed: https://github.com/rust-lang/cargo/issues/1839

I think our best bet is to have std and no_std features and require the user to choose one. That would be a semver-breaking change, but given the v0.206.0 version number it doesn't look like frequent semver breaks are a concern.

alexcrichton commented 4 months ago

Ok thanks for confirming on the binary size piece, that confirms my gut feeling here too. Also definitely understood about the dependency tree, that's a concern for us in Wasmtime as well!

Having to choose between std/no_std won't really compose well I think (e.g. Cargo's feature support doesn't support mutally exclusive features), but let me perhaps ask another question for this. Does your use case involve validating a WebAssembly module or just parsing it? If it's just parsing then I'm pretty sure we could easily put all the hash-map-bits and such behind a new "validate" feature or similar. That means you'd still be able to parse but wasmparser wouldn't be able to validate though.

If that'd work for your use case then that might be the best way forward?

Shnatsel commented 4 months ago

Yes, that would be excellent! I do not need validation at all, I just need to parse the high-level structure and extract specific sections. Being able to just disable the validation feature with its hashmaps and other heap allocations would be perfect.

alexcrichton commented 4 months ago

Ok sounds good, I'll work on splitting that out in the near future. And also, yes, we don't currently have semver-compatible releases, so this'll be a 207.0.0 version.

Shnatsel commented 4 months ago

Thanks a lot!

And yes, semver is a non-issue. I have anticipated that and didn't expose wasmparser types publicly in my libraries.

Shnatsel commented 4 months ago

v0.207 with this change has been released, and I've upgraded my tools to it: https://github.com/rust-secure-code/cargo-auditable/pull/150

It works great for my use case. Thanks a lot!