LemonPancakes / rust-kb

Knowledge Base in Rust
https://lemonpancakes.github.io/rust-kb/

Serialization #3

Open brotatotes opened 6 years ago

brotatotes commented 6 years ago

No matter how you do it, parsing is likely to be slow. So to mitigate that, you should use serde to serialize parsed KBs. (I hope I haven't discussed this before.) While you're benchmarking, you might want to benchmark serde loading against pest and nom, and see if that's worth having.

ChristopherKober commented 6 years ago

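I'm not sure we can have both serialization and our hashing symbol representation. The following documentation says that serializing Arc and Rc data does not work the way you might expect: each reference serializes its own copy of the inner value, and deserialization does not deduplicate them.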

https://serde.rs/feature-flags.html#-features-rc

When you deserialize something containing an Arc/Rc, each pointer gets its own Arc/Rc, so we can no longer compare strings based on their pointer value.
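To make the concern concrete, here is a minimal sketch using only the standard library. It assumes symbols are stored as Rc<str> (the project may use Arc or a different type), and `fresh_copies` is a hypothetical stand-in for a deserializer that allocates every occurrence separately, which is the behavior the serde documentation describes.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for deserialization: every occurrence of a symbol
/// gets its own fresh allocation, with no sharing between equal strings.
fn fresh_copies(names: &[&str]) -> Vec<Rc<str>> {
    names.iter().map(|s| Rc::<str>::from(*s)).collect()
}

fn main() {
    // Interned case: both handles point at the same allocation,
    // so a pointer comparison is enough to decide equality.
    let a: Rc<str> = Rc::from("socrates");
    let b = Rc::clone(&a);
    assert!(Rc::ptr_eq(&a, &b));

    // "Deserialized" case: the contents are equal...
    let loaded = fresh_copies(&["socrates", "socrates"]);
    assert_eq!(loaded[0], loaded[1]);
    // ...but the allocations differ, so pointer comparison reports "not equal".
    assert!(!Rc::ptr_eq(&loaded[0], &loaded[1]));
}
```

Any code path that relies on pointer comparison alone would treat the two loaded symbols as different, even though they spell the same name.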

brotatotes commented 6 years ago

Good point. I feel like we can worry about serialization later, as I think inference (#6) and hashing (#4) are more important features. Even so, if we naively implemented serialization, it should technically work, since the string interning falls back to comparing the actual strings when the pointer values do not match. Therefore we could probably do some post-processing when deserializing to de-duplicate the contents of the SymbolTable.
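A rough sketch of what that post-processing pass could look like, again using only the standard library. Everything here is an assumption for illustration: the `SymbolTable` below is a hypothetical HashSet-backed intern table, not the project's actual type, and facts are modeled as plain vectors of Rc<str> symbols.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical intern table; the project's real SymbolTable may differ.
#[derive(Default)]
struct SymbolTable {
    symbols: HashSet<Rc<str>>,
}

impl SymbolTable {
    /// Return the canonical handle for `name`, inserting it on first sight.
    fn intern(&mut self, name: &str) -> Rc<str> {
        if let Some(existing) = self.symbols.get(name) {
            return Rc::clone(existing);
        }
        let fresh: Rc<str> = Rc::from(name);
        self.symbols.insert(Rc::clone(&fresh));
        fresh
    }
}

/// Post-processing pass: replace every symbol in `facts` with its
/// canonical handle so that equal symbols share one allocation again.
fn reintern(facts: &mut [Vec<Rc<str>>], table: &mut SymbolTable) {
    for fact in facts.iter_mut() {
        for sym in fact.iter_mut() {
            let canonical = table.intern(&**sym);
            *sym = canonical;
        }
    }
}

fn main() {
    // Simulated deserialized facts: every symbol is separately allocated.
    let mut facts: Vec<Vec<Rc<str>>> = vec![
        vec![Rc::from("isa"), Rc::from("socrates"), Rc::from("human")],
        vec![Rc::from("isa"), Rc::from("plato"), Rc::from("human")],
    ];
    assert!(!Rc::ptr_eq(&facts[0][0], &facts[1][0]));

    let mut table = SymbolTable::default();
    reintern(&mut facts, &mut table);

    // After the fix-up, equal symbols are pointer-equal again.
    assert!(Rc::ptr_eq(&facts[0][0], &facts[1][0]));
    assert!(Rc::ptr_eq(&facts[0][2], &facts[1][2]));
    // Four distinct symbols: isa, socrates, human, plato.
    assert_eq!(table.symbols.len(), 4);
}
```

The pass costs one hash lookup per symbol occurrence, so it is linear in the size of the loaded KB; whether that overhead is acceptable next to the parsing time it replaces would need to be measured.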

tov commented 6 years ago

That's true—deserializing will lose the common pointers, unless you do something about it. I think that means you need to consult an intern table while deserializing. I'm not sure if that's possible to do with serde. But it's easy to deserialize and then run over the deserialized result and fix it up.

Jesse


-- Dr. Jesse A. Tov Assistant Professor of Instruction Electrical Engineering and Computer Science McCormick School of Engineering Northwestern University

http://users.eecs.northwestern.edu/~jesse/