AlienKevin / wordshk-tools

Tools for processing the open Cantonese dictionary data provided by words.hk
https://docs.rs/wordshk_tools/latest/wordshk_tools/

Good stuff #1

winston0410 opened 2 years ago

winston0410 commented 2 years ago

Not really an issue, but I really appreciate your work. Out of curiosity, what made you build a new parser combinator library instead of using an existing solution like nom?

AlienKevin commented 2 years ago

Thanks! I found libraries like nom complicated to work with at first, so I decided to write my own parser combinators modeled on elm/parser. Looking back, my parser library is more of an experiment for learning Rust and combinators. I ran some benchmarks on JSON parsing comparing nom with my library: mine was roughly 50x slower, even with both compiled with optimizations. For parsing words.hk (megabyte-scale data), though, the difference is not noticeable. wordshk-tools can do a complete parse in a few seconds when compiled with optimizations; without optimizations it takes around 10 seconds or more.
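
For anyone curious what that style looks like, here is a rough, hypothetical sketch of an elm/parser-inspired combinator in Rust (simplified for illustration; it is not the actual wordshk_tools API):

```rust
// Hypothetical sketch of an elm/parser-style combinator; not the wordshk_tools API.
// A parser is a closure from the remaining input to either the parsed value
// plus the rest of the input, or a human-readable error message.
fn token(expected: &'static str) -> impl Fn(&str) -> Result<(&str, &str), String> {
    move |input| match input.strip_prefix(expected) {
        Some(rest) => Ok((expected, rest)),
        None => Err(format!("expected `{}`, found `{}`", expected, input)),
    }
}

fn main() {
    let parse_yue = token("yue:");
    // Matches the tag and leaves the rest of the line for later parsers.
    assert_eq!(parse_yue("yue:廣東話"), Ok(("yue:", "廣東話")));
    assert!(parse_yue("eng:Cantonese").is_err());
}
```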

AlienKevin commented 2 years ago

Also, my parser library supports Unicode as a first-class concern in both parsing and error messages. You can use graphemes as the unit of parsing, so emoji, Korean script, and other writing systems can be parsed more accurately.
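
To see why graphemes matter, compare iterating by `char` with iterating by extended grapheme clusters. The sketch below uses the unicode-segmentation crate purely for illustration:

```rust
// Illustration only; requires the `unicode-segmentation` crate.
use unicode_segmentation::UnicodeSegmentation;

fn main() {
    // The family emoji is a single grapheme cluster assembled from several
    // Unicode scalar values joined by zero-width joiners.
    let text = "粵語👨‍👩‍👧한글";

    // Iterating by `char` walks scalar values and can split the emoji apart.
    println!("chars:     {}", text.chars().count());

    // Iterating by grapheme clusters matches what a reader sees as one character.
    println!("graphemes: {}", text.graphemes(true).count());
    for g in text.graphemes(true) {
        println!("{}", g);
    }
}
```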

winston0410 commented 2 years ago

> Also, my parser library supports Unicode as a first-class concern in both parsing and error messages. You can use graphemes as the unit of parsing, so emoji, Korean script, and other writing systems can be parsed more accurately.

Yes, with nom I have to use a separate Unicode crate to consume Unicode characters.
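
Roughly like this (a sketch of the idea, not exactly what I ended up with): a small nom-compatible parser that consumes one grapheme cluster via the unicode-segmentation crate.

```rust
// Sketch assuming nom 7 and the `unicode-segmentation` crate; not from this repo.
use nom::error::{Error, ErrorKind};
use nom::{Err, IResult};
use unicode_segmentation::UnicodeSegmentation;

/// Consume exactly one extended grapheme cluster from the input.
fn any_grapheme(input: &str) -> IResult<&str, &str> {
    match input.graphemes(true).next() {
        // Split by the grapheme's byte length so multi-scalar clusters
        // (emoji sequences, decomposed Hangul) stay intact.
        Some(g) => Ok((&input[g.len()..], g)),
        None => Err(Err::Error(Error::new(input, ErrorKind::Eof))),
    }
}

fn main() {
    let (rest, first) = any_grapheme("👨‍👩‍👧粵語").unwrap();
    println!("first grapheme: {first}, rest: {rest}");
}
```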