confio / decode_raw

A protobuf debugging tool or protoc --decode_raw on steroids.
Apache License 2.0

Wasm/Node.js #1

Closed konsumer closed 2 years ago

konsumer commented 2 years ago

Hi, I am the author of rawproto, a library that does something similar to this tool, in javascript, but outputs JSON. I was thinking I was going to try to make a better version of my lib in rust (and compile to wasm for node) and I came across this CLI tool that does an excellent job. I love the technique of "parse an Empty in protofish, and use the unknown fields to get raw info" method. Ingenious!

As an initial test, I took proto.rs, renamed it to lib.rs, and plopped it in a fresh project, and it builds fine (I appreciate how you keep all the formatting and CLI parsing in other files, very nicely organized) so I think it will work great with a little wasm-wrapping. Is there another preferred way I should do this (like should we break that file out into its own rust lib, or is it ok for me to just plop the file into my thing?) How would you like to be credited in README, if I get this working?
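
For readers following along: the "unknown fields" trick works because the protobuf wire format carries field numbers and wire types even when no schema is known. Below is a minimal std-only sketch of that schema-less pass. It is not the code of protofish or decode_raw; `RawValue` and `parse_raw_fields` are names made up for illustration, and the deprecated group wire types (3 and 4) are simply rejected.

```rust
use std::convert::{TryFrom, TryInto};

/// One top-level field value recovered without a schema (illustrative type).
#[derive(Debug, PartialEq)]
pub enum RawValue {
    Varint(u64),
    Fixed64([u8; 8]),
    LengthDelimited(Vec<u8>),
    Fixed32([u8; 4]),
}

/// Reads a base-128 varint and returns the value plus the bytes consumed.
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // truncated, or longer than 10 bytes
}

/// Splits a buffer into (field number, raw value) pairs.
/// Returns None if the bytes are not well-formed wire format.
pub fn parse_raw_fields(mut buf: &[u8]) -> Option<Vec<(u64, RawValue)>> {
    let mut fields = Vec::new();
    while !buf.is_empty() {
        // Each field starts with a key: (field_number << 3) | wire_type.
        let (key, used) = read_varint(buf)?;
        buf = &buf[used..];
        let field_number = key >> 3;
        if field_number == 0 {
            return None; // field number 0 is not valid
        }
        let value = match key & 0x7 {
            0 => {
                let (v, used) = read_varint(buf)?;
                buf = &buf[used..];
                RawValue::Varint(v)
            }
            1 => {
                let bytes: [u8; 8] = buf.get(..8)?.try_into().ok()?;
                buf = &buf[8..];
                RawValue::Fixed64(bytes)
            }
            2 => {
                let (len, used) = read_varint(buf)?;
                buf = &buf[used..];
                let len = usize::try_from(len).ok()?;
                let payload = buf.get(..len)?.to_vec();
                buf = &buf[len..];
                RawValue::LengthDelimited(payload)
            }
            5 => {
                let bytes: [u8; 4] = buf.get(..4)?.try_into().ok()?;
                buf = &buf[4..];
                RawValue::Fixed32(bytes)
            }
            _ => return None, // groups (3, 4) and unknown wire types
        };
        fields.push((field_number, value));
    }
    Some(fields)
}
```

A length-delimited payload stays an opaque byte vector here; deciding whether it is a nested message, a string, or raw bytes is a separate, heuristic step.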

webmaster128 commented 2 years ago

Hey David, welcome and thanks for the nice message.

I'm happy to collaborate, especially since we need to investigate how to work with protobuf JSON version as well. Not sure if that would happen here though because in raw proto you can't even reliably differentiate between a nested message, a string and raw bytes. How would you then do things like date formatting without schemas?

Anyways, what we can do for sure is to export parts of this project as a library. Then we can increase test coverage and overall stability of the lower level components. I'd not promote it as a library and not guarantee lots of stability, but we could work in that direction. So if you want, you can send a PR to make this a library and a binary. I have not done this before but it should work I guess.

If you want you can start a GitHub action config that builds the library part for the wasm32-unknown-unknown target. Then you are sure no dependency is pulled in that causes trouble with Wasm later on.

How would you like to be credited in README, if I get this working?

I think it would be nice if you direct people here who are looking for a debugging tool for human interaction. This is what this project is about. Probably many of your users need both, the programmable version with JSON and the manual version. Other than that, no requirements other than what the license requires.

I love the technique of "parse an Empty in protofish, and use the unknown fields to get raw info" method. Ingenious!

💐

konsumer commented 2 years ago

I'm happy to collaborate, especially since we need to investigate how to work with protobuf JSON version as well.

Sweet!

Here is my work so far. If you are interested in experimenting there, I can add you as a maintainer.

Not sure if that would happen here though because in raw proto you can't even reliably differentiate between a nested message, a string and raw bytes. How would you then do things like date formatting without schemas?

I have a general policy of just using the lowest parsable common-denominator, and some things can be guessed, but you can tell it to force a type (like string could be buffer or string or sub-message, so it checks for utf-bounds for "auto" but you can force it). Also, I use a try/catch and try to parse it as a message, and that can sort of detect sub-messages. I also generate a sort of dummy proto (which would probably be more robust in protofish, or at least using it to parse the proto SDL). You can take that, edit it, make the field names/types more sensible (by manually analyzing the output from what you are reverse-engineering) and then run your binary back through the edited proto, and it will make more sense. It's not perfect, but it usually gets me pretty far, and allows building quick single-purpose javascript-based UI tools for playing around with the structure.
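
The "guess, but allow forcing a type" policy described above can be sketched roughly as follows. This is an illustration in Rust, not rawproto's actual implementation: `Hint`, `Guess`, and `interpret` are hypothetical names, and the order used for the automatic guess (message first, then UTF-8 text, then bytes) is an assumption.

```rust
use std::convert::TryFrom;

/// How the caller wants a length-delimited payload to be read (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Hint { Auto, Message, Text, Bytes }

/// The interpretation that was chosen (hypothetical).
#[derive(Debug, PartialEq)]
pub enum Guess<'a> {
    Message,
    Text(&'a str),
    Bytes(&'a [u8]),
}

/// Reads a varint and returns it together with the remaining bytes.
fn read_varint(buf: &[u8]) -> Option<(u64, &[u8])> {
    let mut value = 0u64;
    for (i, &b) in buf.iter().enumerate().take(10) {
        value |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, &buf[i + 1..]));
        }
    }
    None
}

/// Walks the buffer as wire format; None means it is not a well-formed message.
fn walk_message(mut buf: &[u8]) -> Option<()> {
    while !buf.is_empty() {
        let (key, rest) = read_varint(buf)?;
        if key >> 3 == 0 {
            return None;
        }
        buf = match key & 0x7 {
            0 => read_varint(rest)?.1,
            1 => rest.get(8..)?,
            2 => {
                let (len, rest) = read_varint(rest)?;
                rest.get(usize::try_from(len).ok()?..)?
            }
            5 => rest.get(4..)?,
            _ => return None,
        };
    }
    Some(())
}

fn as_message(payload: &[u8]) -> Option<Guess<'_>> {
    // An empty payload is treated as "not a message" to avoid a useless guess.
    if !payload.is_empty() && walk_message(payload).is_some() {
        Some(Guess::Message)
    } else {
        None
    }
}

fn as_text(payload: &[u8]) -> Option<Guess<'_>> {
    std::str::from_utf8(payload).ok().map(Guess::Text)
}

/// Applies the hint; a forced type returns None when the payload does not fit it.
pub fn interpret(payload: &[u8], hint: Hint) -> Option<Guess<'_>> {
    match hint {
        Hint::Message => as_message(payload),
        Hint::Text => as_text(payload),
        Hint::Bytes => Some(Guess::Bytes(payload)),
        Hint::Auto => as_message(payload)
            .or_else(|| as_text(payload))
            .or(Some(Guess::Bytes(payload))),
    }
}
```

The two bytes `08 41` show why a guess can never be fully reliable: they are both a valid message (field 1, varint 65) and valid UTF-8, so only a hint or a schema can settle which reading is intended.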

The awesome pbjs parser (and google's generated js code & parser, which is less awesome, but generally works ok) actually chokes on some test binary (which led me down this wasm path), but using the JSON you can do stuff programmatically with it (which I know is not a goal for this project, but that was my goal). It's also cool to use flat/jq/etc to find stuff. My rawproto is currently in kind of a flux state. I started documenting planned features in README and adding them to CLI, but didn't quite implement protoc's partial parsing/raw type parsing. That's when I noticed it wasn't getting all the fields in my test-data, so I started looking for another way.

I tried wrapping libprotobuf (from google) and a few different rust libs. I found that rs-protobuf, and rust in general, was much nicer to work with around wasm than libprotobuf/cpp, but it also missed the problem-fields, like the js parsers. Your CLI does fine with it, as does protoc (which uses libprotobuf).

I have working basic wasm usage of your try_parse_entries function. This is dev-console:

Screen Shot 2022-02-10 at 3 07 54 AM

You can play with it here

Seems great so far! I imagine doing more (like building fancier JSON or whatever) could be done in js-space, with a nice big flat path: value (and then type-checks for int/bytes-array) as I am using, but I am not sure. Now that I am delighting in how easy it is to do rust wasm (after some initial learning pains) I kinda want to move as much as I can into rust-space, so I am open to doing it however seems sensible, but I'm much faster with js, so it's probably going to be my go-to for prototyping ideas.

The next steps for me using this directly, I think, would be allowing a partial proto SDL (in decode_fields, instead of the Empty) so I can merge partial with raw (like protoc does) in the actual parsing stage. This might be useful for decode_raw, too, since often you know a part of the definition (made by hand, generated with rawproto, or whatever) and it's easier to read. Not sure what the output would look like though, your fieldnum-indent is pretty rad for grepping, and it would be very hard to read with non-numbers in the keys.

Anyways, what we can do for sure is to export parts of this project as a library. Then we can increase test coverage and overall stability of the lower level components. I'd not promote it as a library and not guarantee lots of stability, but we could work in that direction. So if you want, you can send a PR to make this a library and a binary. I have not done this before but it should work I guess.

That sounds great! I'm kinda new to rust, so I'm not sure of best practices, but in other ecosystems, I usually pull out the library, publish that, then ref it in the consumers (so like my node-thing and decode_raw, the CLI). Often keeping them all in a monorepo makes it easier to work with. Seems like in rust the CLI & lib can co-exist in a single crate, and it's fine, so maybe my usual isn't the best way. We could probably also include the wasm bindings in this, as well, so it's all one thing, or keep it totally separate, so you can use the lib without CLI/web, or CLI without web, etc. Maybe the library is already just protofish, though, since for the part I am using, it's basically just your clever empty proto + your path logic + protofish. Before I found your tool, I was just thinking of a very similar jq-like feature in rawproto, but the language I imagined had array-indexes (I actually like your way better).

If you want you can start a GitHub action config that builds the library part for the wasm32-unknown-unknown target. Then you are sure no dependency is pulled in that causes trouble with Wasm later on.

Sure! I might do a build/publish on my experiment project to get it ironed out, just because it's already setup to make a web-page and build the js wrapper & stuff (and has the CLI stuff stripped.) It is also all setup as just the lib, which might make things easier. I am using wasm-pack build instead of cargo directly, just because it's a bit simpler.

I think it would be nice if you direct people here who are looking for a debugging tool for human interaction. This is what this project is about. Probably many of your users need both, the programmable version with JSON and the manual version.

Sounds good! Totally agreed. I feel like it's a perfect companion to finding raw paths, and just doing any sort of protobuf reverse-engineering, in general, so definitely all the same audience.

konsumer commented 2 years ago

Update: I am building demo-site and wasm in GH action, and it works pretty well.

For this repo, I could imagine also setting these actions up:

Other ideas:

Screen Shot 2022-02-10 at 5 12 07 AM

Sidenote, I found what is maybe a bug around keeping the path-state between loads or something.

Screen Shot 2022-02-10 at 5 07 05 AM

If I add more 1's, it just expands the path (so it seems like it's not really grabbing the right thing.) It may have to do with how I am using it in js, or something in the wasm inter-comm.

Screen Shot 2022-02-10 at 5 13 46 AM

konsumer commented 2 years ago

If you want to play with the CLI on the web:

rustup target add wasm32-wasi
cargo build --target wasm32-wasi

then drag the resulting decode_raw.wasm (for a debug build, cargo puts it under target/wasm32-wasi/debug/) into window and run it:

Screen Shot 2022-02-10 at 5 45 53 AM

webmaster128 commented 2 years ago

I'll need some time to catch up with all the points, but the deal is this: The whole project cannot be compiled to wasm32-unknown-unknown because it needs environment interactions (read STDIN, write to STDOUT) which are not necessarily available to a Wasm sandbox. However, if you split a core library from the CLI, you can make sure this library compiles to wasm32-unknown-unknown.
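
A rough sketch of that split, under the assumption that the core is a pure function over bytes and only a thin shell touches the environment. `describe` and `run_cli` are hypothetical names, and the hex dump is a stand-in for the real decoding logic:

```rust
use std::io::{self, Read, Write};

/// Core "library" part: a pure function over a byte slice.
/// It never reads STDIN or writes STDOUT itself, so it does not depend on
/// what the host environment provides. (The hex dump is only a stand-in.)
pub fn describe(bytes: &[u8]) -> String {
    let hex: Vec<String> = bytes.iter().map(|b| format!("{:02x}", b)).collect();
    format!("{} bytes: {}", bytes.len(), hex.join(" "))
}

/// CLI "shell" part: all environment interaction lives here.
/// It is generic over the reader and writer so it can be exercised in tests.
pub fn run_cli<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut bytes = Vec::new();
    input.read_to_end(&mut bytes)?;
    writeln!(output, "{}", describe(&bytes))
}
```

In a binary, `main` would call `run_cli(io::stdin().lock(), io::stdout().lock())`, while a Wasm wrapper would call `describe` directly and skip the shell.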

I'm not sure if WASI is available in Node.js. Most likely it is not in browsers.

konsumer commented 2 years ago

I'm not sure if WASI is available in Node.js. Most likely it is not in browsers.

It is, in both (the output of cargo build --target wasm32-wasi will run at https://webassembly.sh/, which is the terminal in the screenshot above). WASI is pretty universal. I don't really want the WASI part, though, I just want the lib, as I am using it in my demo. I am just showing that you could allow people to try out the decode_raw CLI before installing, if they wanted to.

webmaster128 commented 2 years ago

Seems like you are adding the filter path to the keys for some reason. This problem does not appear in the CLI version.

Here you see the same binary in both tools: Screenshot 2022-02-10 at 22 48 55

konsumer commented 2 years ago

Yep, my point is it's in the way it works over wasm or my js. I double-checked my js and it seems good (it's not appending it or anything) so I will play around with it a bit more.

konsumer commented 2 years ago

That issue is weird. In rust space, I log both the [64] and str version (that I pass through your parse_select_query) and they look correct, but it seems to be outputting the result of a . query, with all the keys prefixed with the query path. It's 206 paths, instead of 7 (from my test data in screenshot).

js_path: ".1.5.7.37.1.6"
rawproto.js:195 path: [1, 5, 7, 37, 1, 6]
Screen Shot 2022-02-10 at 11 47 34 PM

I noticed in your decode, you use hardcoded &[] then after do this to filter the path:

if !entry.path.starts_with(&config.select) {
    continue;
}

So, I think the bug comes from me not understanding the path param. I should be filtering the output of try_parse_entries. I will pull stuff out of decode to make a simplified entries (that only shows selected path).
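
The filtering step on its own is just a prefix check on the path. A self-contained sketch, where `parse_path` is a hypothetical stand-in for parse_select_query and `Entry` is a simplified illustration rather than decode_raw's own type:

```rust
/// Simplified entry: a field-number path plus a rendered value (illustrative).
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub path: Vec<u64>,
    pub value: String,
}

/// Turns a query like ".1.5.7" into [1, 5, 7]; "." selects everything.
/// Returns None if any segment is not a number.
pub fn parse_path(query: &str) -> Option<Vec<u64>> {
    query
        .split('.')
        .filter(|part| !part.is_empty())
        .map(|part| part.parse().ok())
        .collect()
}

/// Keeps only the entries whose path starts with the selected prefix.
pub fn select(entries: Vec<Entry>, prefix: &[u64]) -> Vec<Entry> {
    entries
        .into_iter()
        .filter(|entry| entry.path.starts_with(prefix))
        .collect()
}
```

Parsing always starts from the root; the selection is applied afterwards to the full list of entries, so an empty prefix keeps everything.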

konsumer commented 2 years ago

Yep, this worked:

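#[wasm_bindgen]
pub fn parse_raw(bytes: &[u8], js_path: &str, js_config: &JsValue) -> JsValue {
    // Deserialize the parser config passed in from JS.
    let config = js_config.into_serde().unwrap();
    // Selection query, e.g. ".1.5.7", as a list of field numbers.
    let path = parse_select_query(js_path);
    let mut ret: Vec<Entry> = vec![];

    // Always parse from the root (&[]); the selection is applied afterwards.
    if let Some(entries) = try_parse_entries(bytes, &[], config) {
        for entry in entries {
            // Keep only entries located under the selected path.
            if entry.path.starts_with(&path) {
                ret.push(entry);
            }
        }
    }
    // Serialize the filtered entries back into a JS value.
    JsValue::from_serde(&ret).unwrap()
}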

Screen Shot 2022-02-11 at 12 04 55 AM

webmaster128 commented 2 years ago

True, the path param in try_parse_entries is something different. Let me try to make this more clear.

In the meantime you can have a look at #2, which should make all parsing and filtering functionality available as a library (see lib.rs).

webmaster128 commented 2 years ago

In https://github.com/confio/decode_raw/pull/3 I removed the path argument from the public interface. This is only needed internally for the recursion part.

konsumer commented 2 years ago

Looks great. I'd call this issue "resolved". It might make sense to expose the printing stuff (as returned string) in the lib too (for other people who want to use it from rust/wasm/etc) but that is maybe another issue. I am doing it in js-side, and it works pretty nice, and I can write & export a formatter function.

Additionally, I realized that by totally flattening to object on display, I was clobbering entries (because 2 object-items can't have same key) so I made a format that looks kinda like yours, in js that does nice indenting and stuff.

Any thoughts on node-packaging? This does what I need for raw-parsing, so I will probably remove a lot of stuff from rawproto and use it. I can publish our thin wasm-wrapper to npm as another name, if you like, and use it as a dep in rawproto, or we can just use rawproto as the module. I can also publish to wapm.

webmaster128 commented 2 years ago

Additionally, I realized that by totally flattening to object on display, I was clobbering entries (because 2 object-items can't have same key)

I think this is a fundamental problem that is tricky to get right. Due to the way protobuf works, the paths that are created here are not unique. For a repeated top level field with field number 2, you have one entry with path .2 for each element. So this cannot correctly be represented in a map where keys need to be unique without adding an additional counter. decode_raw handles this by not trying to convert to a map and instead listing multiple entries with the same path on multiple lines of the output.
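
To make the clobbering concrete, here is a small sketch with made-up helper names (`flatten_naive`, `flatten_counted`). The `#n` suffix is only one possible way to add the counter mentioned above; it is not a format used by decode_raw or rawproto.

```rust
use std::collections::BTreeMap;

/// Naive flattening into a map: a later entry with the same path
/// overwrites the earlier one, so repeated fields lose elements.
pub fn flatten_naive(entries: &[(&str, &str)]) -> BTreeMap<String, String> {
    let mut out = BTreeMap::new();
    for &(path, value) in entries {
        out.insert(path.to_string(), value.to_string());
    }
    out
}

/// Keeps every entry by appending a per-path occurrence counter to the key.
pub fn flatten_counted(entries: &[(&str, &str)]) -> BTreeMap<String, String> {
    let mut seen: BTreeMap<&str, usize> = BTreeMap::new();
    let mut out = BTreeMap::new();
    for &(path, value) in entries {
        let n = seen.entry(path).or_insert(0);
        out.insert(format!("{}#{}", path, *n), value.to_string());
        *n += 1;
    }
    out
}
```

With entries `.1 = a`, `.2 = x`, `.2 = y`, the naive map ends up with two keys and only `y` under `.2`, while the counted variant keeps all three. Printing the entries line by line, as decode_raw does, sidesteps the question entirely.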

konsumer commented 2 years ago

decode_raw handles this by not trying to convert to a map and instead listing multiple entries with the same path on multiple lines of the output.

Yep, I did the same, just outputting a string, in a loop.

webmaster128 commented 2 years ago

Seems like most of the original issue is covered. Let's follow up with more specific tickets.

webmaster128 commented 2 years ago

Any thoughts on node-packaging? This does what I need for raw-parsing, so I will probably remove a lot of stuff from rawproto and use it. I can publish our thin wasm-wrapper to npm as another name, if you like, and use it as a dep in rawproto, or we can just use rawproto as the module. I can also publish to wapm.

At this point I don't have any plans to support Wasm/npm/browser use cases. Feel free to explore those areas.

konsumer commented 2 years ago

Published here. Still need to do some testing. Eventually I may want to merge it with rawproto, but that will give it a place to live for now.