ScottSyms / RustAISe

Use the structsy and nom crates, keep non-string types in your structs, and avoid string round-trips while number crunching #1

Open omac777 opened 2 years ago

omac777 commented 2 years ago

Hello there Mr. Syms,

I really appreciate the work you put into parsing all this data and into finding a pipeline that parallelizes everything.

Your parallelization is good, but I believe you would get better results by using the rayon crate wherever possible.
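
To make the idea concrete, here is a minimal sketch of the chunk-and-combine pattern that rayon automates. It deliberately uses only `std::thread::scope` from the standard library rather than rayon itself, so it runs without extra dependencies; with rayon, the same reduction would typically be written as `values.par_iter().sum()`. The function name `parallel_sum` and the `workers` parameter are hypothetical and are not taken from RustAISe.

```rust
use std::thread;

/// Sum `values` by splitting them into roughly `workers` chunks and
/// summing each chunk on its own scoped thread.
/// (Hypothetical helper for illustration; not part of RustAISe.)
pub fn parallel_sum(values: &[u64], workers: usize) -> u64 {
    // Guard against zero workers and against a zero chunk length,
    // because `chunks(0)` would panic.
    let workers = workers.max(1);
    let chunk_len = ((values.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Spawn one thread per chunk; scoped threads may borrow `values`.
        let handles: Vec<_> = values
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();

        // Combine the partial sums from every worker.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}
```

Rayon adds work stealing and a reusable thread pool on top of this pattern, which is why it tends to be the more convenient choice in a real pipeline.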

Your parsing is good, but I believe you would be better served by keeping the data in the type that best represents it. Let me explain. Much of the parsing goes from a line in a file to a string, which is then converted to a specific numeric type such as u32, f32, or f64. Later on, you convert that value back to a string inside your structures. All this back and forth between strings and numeric types is definitely hurting your performance goals.

In fact, the nmea crate you are using does the same thing. I can appreciate that this was done to document what is being parsed and to keep it human readable, but the bottom line is that converting the data to strings everywhere is killing your performance, and the nmea crate is perhaps where you got the inspiration to convert to strings all over the place. IMHO it doesn't serve your performance goals. Rather than relying on the nmea crate to parse, I would recommend using the nom crate to parse exactly what you need without converting it to a string, and then placing it in native u32/f32/f64 fields within your PositionReport structure, which better serves your number-crunching goals.
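
As a rough sketch of "parse once into native types", the snippet below reads a record straight into numeric fields and never stores a `String`. It uses only the standard library's `str::parse` instead of nom, and it assumes a simplified, hypothetical comma-separated record of the form `mmsi,lat,lon,sog`. Real AIS sentences are more involved, and the field names here are illustrative guesses, not the actual PositionReport layout in RustAISe.

```rust
/// Hypothetical position report: every field keeps its native numeric type.
#[derive(Debug, Clone, PartialEq)]
pub struct PositionReport {
    pub mmsi: u32,
    pub latitude: f64,
    pub longitude: f64,
    pub speed_over_ground: f32,
}

/// Parse one record of the assumed form "mmsi,lat,lon,sog".
/// Returns `None` if a field is missing or is not a valid number.
pub fn parse_position(line: &str) -> Option<PositionReport> {
    // `split` yields borrowed sub-slices of `line`; nothing is copied
    // into an owned String along the way.
    let mut fields = line.trim().split(',');
    let mmsi = fields.next()?.parse::<u32>().ok()?;
    let latitude = fields.next()?.parse::<f64>().ok()?;
    let longitude = fields.next()?.parse::<f64>().ok()?;
    let speed_over_ground = fields.next()?.parse::<f32>().ok()?;
    Some(PositionReport {
        mmsi,
        latitude,
        longitude,
        speed_over_ground,
    })
}
```

Once the values are in this form, later stages can compare, filter, and aggregate them without re-parsing text. A nom-based parser would follow the same shape, with combinators replacing the `split` and `parse` calls.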

After your number crunching, by all means display your types as human-readable output. During the number crunching itself, however, don't produce any human-readable output. You may do so as a last resort for debugging, but take the debug logging out of release builds.
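
One way to keep debug output out of release builds, sketched with the standard `cfg!(debug_assertions)` flag, which Cargo enables in the default dev profile and disables with `--release`. The function `total_speed` is a hypothetical stand-in for a hot loop, not code from RustAISe.

```rust
/// Sum a slice of speeds. The per-sample log line is only active when
/// the crate is compiled with debug assertions (the default for
/// `cargo build`, off for `cargo build --release`).
/// (Hypothetical hot loop for illustration.)
pub fn total_speed(speeds: &[f32]) -> f32 {
    let mut total = 0.0_f32;
    for (index, speed) in speeds.iter().enumerate() {
        // `cfg!` expands to a compile-time `true` or `false`, so in a
        // release build this branch is dead code the optimizer can drop.
        if cfg!(debug_assertions) {
            eprintln!("sample {}: {}", index, speed);
        }
        total += *speed;
    }
    total
}
```

The numeric result is the same in both profiles; only the formatting work disappears from the optimized build.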

Lastly, you are using JSON for storing everything. Sure, it is human readable, but it hurts performance as well. Please try something like structsy instead for persisting the data for your tools. https://gist.github.com/omac777/dc8dbd9b4fe574e7e01ad6bf9a8ee9d7#file-tide_hello_structsy-rs-L10 My example doesn't use any non-string types within the structure, but the idea is that your PositionReport structure could replace the MyUser structure. The result would be better performance when reading and writing your data to a file, while keeping it in specific non-string types that better represent your data.
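
To show what storing native types instead of JSON text can look like, here is a small sketch that writes a record as fixed-width little-endian bytes and reads it back. This is not the structsy API; it uses only `std::io` and the standard `to_le_bytes`/`from_le_bytes` conversions. The `StoredPosition` type, its fields, and the byte layout are assumptions made for illustration.

```rust
use std::io::{self, Read, Write};

/// Hypothetical record persisted in binary form instead of as JSON text.
#[derive(Debug, Clone, PartialEq)]
pub struct StoredPosition {
    pub mmsi: u32,
    pub latitude: f64,
    pub longitude: f64,
}

/// Bytes per record in this assumed layout: u32 + f64 + f64.
pub const RECORD_LEN: usize = 4 + 8 + 8;

impl StoredPosition {
    /// Write the fields as little-endian bytes, with no text formatting.
    pub fn write_to<W: Write>(&self, out: &mut W) -> io::Result<()> {
        out.write_all(&self.mmsi.to_le_bytes())?;
        out.write_all(&self.latitude.to_le_bytes())?;
        out.write_all(&self.longitude.to_le_bytes())?;
        Ok(())
    }

    /// Read one record back in the same field order.
    pub fn read_from<R: Read>(input: &mut R) -> io::Result<Self> {
        let mut four = [0u8; 4];
        let mut eight = [0u8; 8];

        input.read_exact(&mut four)?;
        let mmsi = u32::from_le_bytes(four);

        input.read_exact(&mut eight)?;
        let latitude = f64::from_le_bytes(eight);

        input.read_exact(&mut eight)?;
        let longitude = f64::from_le_bytes(eight);

        Ok(StoredPosition { mmsi, latitude, longitude })
    }
}
```

Because `write_to` and `read_from` are generic over `Write` and `Read`, the same code works against a `Vec<u8>`, a `File`, or a `BufWriter`. An embedded store such as structsy would additionally handle indexing and transactions, which this sketch does not attempt.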

Rewriting nmea to use nom and rewriting the layers above it in your main would take some effort, but it wouldn't be that complex, and the performance payback would definitely be worth it.

omac777 commented 2 years ago

I also noticed you haven't been using async file I/O APIs. Try replacing all your open/close/read calls with their async/io_uring-based equivalents and you should see some I/O improvements there as well. https://github.com/async-rs/async-std/blob/c6b2128ccd0e836d112086b7e13741155ce6a17b/examples/print-file.rs

ScottSyms commented 2 years ago

This is awesome, thanks for the feedback!