bytecodealliance / cargo-component

A Cargo subcommand for creating WebAssembly components based on the component model proposal.
Apache License 2.0

Build error: `error: string size out of bounds (at offset {offset})` #281

Open toadslop opened 3 months ago

toadslop commented 3 months ago

Recently I started experiencing an error when I run cargo component build. I'm sorry, but I can't figure out what's causing it or provide a minimal example. I'm hoping you all might have some idea what could be causing it.

I run this command:

cargo component build --package silex-parser

I get this output:

Finished dev [unoptimized + debuginfo] target(s) in 0.06s
error: string size out of bounds (at offset 0x19feacc9)

Interestingly, the build appears to have worked: I get the output file and everything works as expected. However, the command exits with an error code, which causes my CI/CD pipeline and build scripts to abort, so it's rather annoying.

I've dug through my code for quite a while and couldn't find anything wrong with it. I commented out everything and then added code back line by line; at a certain point, the error started happening. This makes me think it could be related to the size of the project, but I don't know enough about how cargo-component works to judge whether that could be the cause, so I thought I'd raise the issue here and see if anyone more familiar with the project has any insights.

rylev commented 3 months ago

Without seeing the exact code that's being compiled, it's difficult to say for sure. The error you're seeing occurs when reading or skipping a string in a WebAssembly module whose declared length exceeds the maximum allowed string size (100,000 bytes).
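To make the limit concrete, here is a minimal sketch of how a reader in the style of wasmparser could reject an oversized string. It assumes the standard wasm encoding of a string (an unsigned LEB128 length followed by UTF-8 bytes) and hard-codes the 100,000-byte limit mentioned above; the function names are hypothetical and this is not wasmparser's actual code. The key point is that the check happens on the declared length, before any string bytes are read.

```rust
/// String length limit quoted in this thread (100,000 bytes).
/// Hard-coded here as an assumption for illustration.
const MAX_WASM_STRING_SIZE: usize = 100_000;

/// Decodes an unsigned LEB128 `u32` starting at `pos`.
/// Returns the value and the number of bytes consumed.
fn read_u32_leb128(bytes: &[u8], pos: usize) -> Result<(u32, usize), String> {
    let mut result: u32 = 0;
    let mut shift: u32 = 0;
    let mut i = pos;
    loop {
        let byte = *bytes.get(i).ok_or("unexpected end of input")?;
        i += 1;
        if shift >= 32 {
            return Err("LEB128 value does not fit in a u32".to_string());
        }
        // Low 7 bits carry data; the high bit says whether more bytes follow.
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Ok((result, i - pos));
        }
        shift += 7;
    }
}

/// Reads a length-prefixed string at `pos`, refusing any declared length
/// above the limit before touching the string bytes themselves.
fn read_string(bytes: &[u8], pos: usize) -> Result<(&str, usize), String> {
    let (len, header) = read_u32_leb128(bytes, pos)?;
    let len = len as usize;
    if len > MAX_WASM_STRING_SIZE {
        // Message text copied from the issue; which offset to report is this sketch's choice.
        return Err(format!("string size out of bounds (at offset {:#x})", pos));
    }
    let start = pos + header;
    let data = bytes.get(start..start + len).ok_or("unexpected end of input")?;
    let s = std::str::from_utf8(data).map_err(|e| e.to_string())?;
    Ok((s, header + len))
}

fn main() {
    // A 2-byte string "hi": LEB128 length 2 followed by the UTF-8 bytes.
    let small = [0x02, b'h', b'i'];
    println!("{:?}", read_string(&small, 0));

    // A header declaring 200,000 bytes (LEB128: C0 9A 0C) is rejected
    // even though no string bytes follow it.
    let huge_header = [0xC0, 0x9A, 0x0C];
    println!("{:?}", read_string(&huge_header, 0));
}
```

Because the length is validated up front, a single oversized name anywhere in the module is enough to abort parsing, regardless of whether the rest of the binary is valid.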

If you run wasm-tools validate $PATH_TO_COMPONENT, do you see the error again?

toadslop commented 3 months ago

Hi Ryan, thanks for the idea. I ran the command that you mentioned, but nothing was output to the console at all. I assume that means that the checks were successful?

For some additional context, I'm writing a parser using a parser combinator library called chumsky. I believe that the problem starts happening when the parser types reach a certain depth of recursion, though I haven't been able to verify that yet.

toadslop commented 3 months ago

For additional context, I only run into errors when using cargo-component. If I compile to windows-msvc or directly to wasm32-wasi, I don't see any errors.

For more context, when I run wasm-tools compose on the files output by the build step that raised the error above, I get this error message:

error: failed to parse component `C:\Users\bnhei\source\silex\target/wasm32-wasi/debug/silex-parser.wasm`

Caused by:
    0: the file is not a WebAssembly component

It does seem that some step after compilation is failing.

toadslop commented 3 months ago

wasm-chumsky-type-info.txt

I dug into the .wasm file and found that the type information generated for the Chumsky parser is massive: 19.3 megabytes on its own (see attached file), far over the limit you mentioned above. Judging from what you said, I would guess that the type information is stored as a string and therefore exceeds the maximum size for a string?
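One way to see which part of the module holds that data is to walk the module's top-level sections and print their sizes, largest first. The sketch below is a hypothetical helper, not part of cargo-component or wasm-tools. It assumes the input is a plain core module (magic number plus version 1, which is what the thread reports the failed build leaves behind) and follows the core binary format: each section is an id byte, a LEB128 payload size, and the payload, with custom sections (id 0) starting with their own name.

```rust
use std::env;
use std::fs;

/// Decodes an unsigned LEB128 `u32` at `pos`; returns (value, bytes consumed).
fn read_u32_leb128(bytes: &[u8], pos: usize) -> Result<(u32, usize), String> {
    let (mut result, mut shift, mut i) = (0u32, 0u32, pos);
    loop {
        let byte = *bytes.get(i).ok_or("unexpected end of input")?;
        i += 1;
        if shift >= 32 {
            return Err("LEB128 value does not fit in a u32".to_string());
        }
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Ok((result, i - pos));
        }
        shift += 7;
    }
}

/// One top-level section of a core wasm module.
#[derive(Debug)]
struct SectionInfo {
    id: u8,               // 0 = custom section
    name: Option<String>, // custom sections carry a name, e.g. "name" or ".debug_info"
    size: usize,          // payload size in bytes
}

/// Walks the top-level sections of a core (non-component) wasm module.
fn list_sections(bytes: &[u8]) -> Result<Vec<SectionInfo>, String> {
    // Magic number "\0asm" followed by core-module version 1.
    const HEADER: [u8; 8] = [0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00];
    if !bytes.starts_with(&HEADER) {
        return Err("not a core wasm module".to_string());
    }
    let mut sections = Vec::new();
    let mut pos = 8;
    while pos < bytes.len() {
        let id = bytes[pos];
        let (size, n) = read_u32_leb128(bytes, pos + 1)?;
        let start = pos + 1 + n;
        let end = start + size as usize;
        if end > bytes.len() {
            return Err(format!("section at offset {:#x} runs past end of file", pos));
        }
        let name = if id == 0 {
            let (len, m) = read_u32_leb128(bytes, start)?;
            let name_end = start + m + len as usize;
            if name_end > end {
                return Err("custom section name runs past section end".to_string());
            }
            Some(String::from_utf8_lossy(&bytes[start + m..name_end]).into_owned())
        } else {
            None
        };
        sections.push(SectionInfo { id, name, size: size as usize });
        pos = end;
    }
    Ok(sections)
}

fn main() {
    let Some(path) = env::args().nth(1) else {
        eprintln!("usage: list-sections <module.wasm>");
        return;
    };
    let bytes = match fs::read(&path) {
        Ok(b) => b,
        Err(e) => {
            eprintln!("cannot read {path}: {e}");
            return;
        }
    };
    match list_sections(&bytes) {
        Ok(mut sections) => {
            // Largest sections first, so an oversized custom section stands out.
            sections.sort_by(|a, b| b.size.cmp(&a.size));
            for s in &sections {
                let name = s.name.as_deref().unwrap_or("-");
                println!("id={:<2} size={:>10} name={}", s.id, s.size, name);
            }
        }
        Err(e) => eprintln!("error: {e}"),
    }
}
```

Running it on a debug build versus a release build would show which custom sections grow with the Chumsky types.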

rylev commented 3 months ago

The error about the binary not being a component makes sense. It seems that cargo-component successfully produced a WebAssembly module, but when it tried to convert that module into a component, that step failed, leaving you with just a plain module.

The type info does seem like a very likely culprit. The short-term workaround is to strip that info from the binary (perhaps using wasm-tools strip) and then turn the module into a component using wasm-tools component new. I'm less sure what the actual fix for cargo-component is; it seems like this error case might need to be handled more gracefully.

toadslop commented 3 months ago

Thanks for the suggestions. I tried wasm-tools strip and it significantly reduced the file size, but when I ran wasm-tools component new I received the same error message as before: error: string size out of bounds (at offset 0x3c4ff4). When I inspected the binary, I found that the massive Chumsky parser type was still included. I tried again with the --all option and got the following error:

error: failed to encode a component from module

Caused by:
    0: failed to decode world from module
    1: module was not valid
    2: module requires an import interface named `wasi_snapshot_preview1`

I guess in this case, it stripped too much. Next I'll try the --delete option and see whether I can target just that specific section.

toadslop commented 3 months ago

By the way, this issue is a limitation of cargo-component, right? Is this something I might expect to see fixed in the near future, or is it niche enough that I should plan on relying on a workaround, or just avoid libraries that generate such large type definitions?

toadslop commented 3 months ago

To add one more update: it does seem that the offending part of the .wasm file was debug info. When I built the project in release mode, I didn't encounter any error. This suggests that an easy fix would be to make cargo-component detect and skip debug info.

peterhuene commented 3 months ago

Hi @toadslop.

By the way, this issue is a limitation of cargo-component, right?

No, this is an intentional limitation in wasmparser, the underlying library that various component model tools use to parse WebAssembly modules and components; it's a hard limitation that cargo-component cannot work around.

This suggests that an easy fix would be to make cargo-component detect and skip debug info.

wasmparser does not process debug information sections, and neither does cargo-component, so I suspect the offending wasm string is actually contained in the name custom section, which is processed both by wit-component (the library used to componentize the output from rustc) and by cargo-component as part of appending producer metadata to the output.
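If the oversized string does live in the name custom section, it would most likely be a single function name. The sketch below is a hypothetical check, not code from wit-component or cargo-component: given the payload of a "name" custom section (the bytes after the section's own name string), it scans the function-names subsection and reports entries longer than a limit. It assumes the subsection layout from the WebAssembly name-section specification (subsection id 1 holds a count followed by function-index/name pairs) and skips all other subsections whole.

```rust
/// Limit quoted earlier in the thread; an assumption for this sketch.
const MAX_WASM_STRING_SIZE: usize = 100_000;

/// Decodes an unsigned LEB128 `u32` at `pos`; returns (value, bytes consumed).
fn read_u32_leb128(bytes: &[u8], pos: usize) -> Result<(u32, usize), String> {
    let (mut result, mut shift, mut i) = (0u32, 0u32, pos);
    loop {
        let byte = *bytes.get(i).ok_or("unexpected end of input")?;
        i += 1;
        if shift >= 32 {
            return Err("LEB128 value does not fit in a u32".to_string());
        }
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Ok((result, i - pos));
        }
        shift += 7;
    }
}

/// Returns `(function index, name length)` for every entry in the
/// function-names subsection (id 1) whose name is longer than `limit`.
fn oversized_function_names(payload: &[u8], limit: usize) -> Result<Vec<(u32, usize)>, String> {
    let mut found = Vec::new();
    let mut pos = 0;
    while pos < payload.len() {
        let sub_id = payload[pos];
        let (sub_size, n) = read_u32_leb128(payload, pos + 1)?;
        let body = pos + 1 + n;
        let sub_end = body + sub_size as usize;
        if sub_end > payload.len() {
            return Err("subsection runs past end of payload".to_string());
        }
        if sub_id == 1 {
            // Function names: a count, then (function index, name) pairs.
            let (count, n) = read_u32_leb128(payload, body)?;
            let mut p = body + n;
            for _ in 0..count {
                let (index, n) = read_u32_leb128(payload, p)?;
                p += n;
                let (len, n) = read_u32_leb128(payload, p)?;
                // Skip over the name bytes without decoding them.
                p += n + len as usize;
                if p > sub_end {
                    return Err(format!("name of function {index} runs past subsection end"));
                }
                if len as usize > limit {
                    found.push((index, len as usize));
                }
            }
        }
        // Other subsections (module name, local names, ...) are skipped whole.
        pos = sub_end;
    }
    Ok(found)
}

fn main() {
    // Hand-built payload: function 0 named "ab", function 1 named "abcdef".
    let payload = [
        0x01, 0x0D, 0x02, 0x00, 0x02, b'a', b'b', 0x01, 0x06, b'a', b'b', b'c', b'd', b'e', b'f',
    ];
    // With a toy limit of 3 bytes, only function 1 is reported.
    println!("{:?}", oversized_function_names(&payload, 3));
    // Against the 100,000-byte limit, nothing in this tiny payload is oversized.
    println!("{:?}", oversized_function_names(&payload, MAX_WASM_STRING_SIZE));
}
```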

Given that cargo-component provides error context when adding a producers section, and that context is not present in the observed error message, my first guess would be that it's failing in wit-component's attempt to add its producers section.

I think the proper fix here is to add a strip option in Cargo.toml that tells cargo-component to strip the custom sections of the core module prior to componentization. Similarly, though as a separate issue, an optimize option should be added so that cargo-component passes the module through wasm-opt.
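As an illustration of what such a pre-componentization strip step might do, here is a minimal sketch (a hypothetical helper, not an existing cargo-component option or API). It copies a core module section by section, dropping custom sections whose names match a predicate, here "name" and anything starting with ".debug_". Every other section, including any component-type custom sections that wit-component reads to recover the world, is copied byte for byte, which is why no re-encoding is needed. The command name in the usage string is made up.

```rust
use std::env;
use std::fs;

/// Decodes an unsigned LEB128 `u32` at `pos`; returns (value, bytes consumed).
fn read_u32_leb128(bytes: &[u8], pos: usize) -> Result<(u32, usize), String> {
    let (mut result, mut shift, mut i) = (0u32, 0u32, pos);
    loop {
        let byte = *bytes.get(i).ok_or("unexpected end of input")?;
        i += 1;
        if shift >= 32 {
            return Err("LEB128 value does not fit in a u32".to_string());
        }
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Ok((result, i - pos));
        }
        shift += 7;
    }
}

/// Copies a core wasm module, dropping every custom section (id 0) whose
/// name satisfies `should_drop`. All other sections are copied unchanged.
fn strip_custom_sections(bytes: &[u8], should_drop: impl Fn(&str) -> bool) -> Result<Vec<u8>, String> {
    const HEADER: [u8; 8] = [0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00];
    if !bytes.starts_with(&HEADER) {
        return Err("not a core wasm module".to_string());
    }
    let mut out = HEADER.to_vec();
    let mut pos = 8;
    while pos < bytes.len() {
        let id = bytes[pos];
        let (size, n) = read_u32_leb128(bytes, pos + 1)?;
        let start = pos + 1 + n;
        let end = start + size as usize;
        if end > bytes.len() {
            return Err(format!("section at offset {:#x} runs past end of file", pos));
        }
        let remove = if id == 0 {
            let (len, m) = read_u32_leb128(bytes, start)?;
            let name_end = start + m + len as usize;
            if name_end > end {
                return Err("custom section name runs past section end".to_string());
            }
            let name = std::str::from_utf8(&bytes[start + m..name_end]).map_err(|e| e.to_string())?;
            should_drop(name)
        } else {
            false
        };
        if !remove {
            // Copy the whole section: id byte, size, and payload.
            out.extend_from_slice(&bytes[pos..end]);
        }
        pos = end;
    }
    Ok(out)
}

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 3 {
        eprintln!("usage: strip-names <input.wasm> <output.wasm>");
        return;
    }
    let result = fs::read(&args[1])
        .map_err(|e| e.to_string())
        .and_then(|bytes| {
            strip_custom_sections(&bytes, |name| name == "name" || name.starts_with(".debug_"))
        })
        .and_then(|stripped| fs::write(&args[2], stripped).map_err(|e| e.to_string()));
    if let Err(e) = result {
        eprintln!("error: {e}");
    }
}
```

Dropping the name section trades away function names in stack traces, the same cost noted for the Cargo profile settings in this thread.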

peterhuene commented 3 months ago

I haven't tried it, but suppressing the name section might correspond to one of the profile options in Cargo.toml.

A possible workaround might be adding the following to your Cargo.toml:

[profile.dev]
debug = false

or

[profile.dev]
strip = "symbols"

Obviously that will impact your ability to see useful stack traces for traps originating from the component (or otherwise being able to debug it).