connorskees / grass

A Sass compiler written purely in Rust
https://docs.rs/grass/
MIT License
499 stars 38 forks source link

Add `from_path` and `from_string` variants that don't buffer output #86

Closed hsanzg closed 8 months ago

hsanzg commented 1 year ago

Makes the Serializer API take an instance of std::io::Write where the serialized contents are written, instead of placing them in an internal buffer that is then converted to a String.

Moves the ASCII-detection code to the visitor API, where we can check for non-ASCII characters only in the relevant places: strings, selectors and units. This also means that the finish method no longer performs a memory copy of the entire serialized output when a non-ASCII character is detected.

Other miscellaneous changes include removing some allocations in the visit_calculation, write_comment and visit_stmt (in particular, the code in charge of writing a KeyframesRuleSet) methods.

The current APIs create a buffer, pass it to a new serializer instance, and finally interpret the buffer's contents as a UTF-8 string. The new public compiler functions write_from_path and write_from_string work exactly in the same manner, but take an (ideally buffered) writer as their last argument.

This passes all tests run by executing cargo test on the main directory, but I'm not sure if I missed another test suite present in the repo. My computer is far too slow to run any benchmarks, but I suspect this changeset should speed up the serialization stage considerably.

connorskees commented 1 year ago

Thank you for this change and opening this PR. I'm realizing now that the serialization code could use some more work, and this is an interesting step towards fixing some of that up. I quite like the @keyframes change.

Do you have a particular use-case that relies on being able to write to an arbitrary dyn Write vs getting back a String? I assume this, alongside the is_ascii changes have been done for performance?

For the is_ascii change, my immediate assumption is that the final is_ascii check over the entire string is actually faster than doing many smaller checks, because the final is_ascii should get compiled down to a very fast SIMD loop over a large number of characters at once vs lots of function calls and tree walking in the visitor. Additionally, the is_ascii check (and subsequent memcpy in the case of non-ascii output) shouldn't be a large part of the total execution time. I would be surprised if it showed up in a profiler at all for any input, though I also don't think I've ever profiled Sass code that does have non-ascii output. At the very least, in the case of bootstrap and other large libraries, the serialization step barely shows up during profiling iirc.

For the dyn Write change, I'm wary of adding new functionality to the public API and some of the code complexity that comes with supporting this. Similar to the above, I'd be surprised if writing to an intermediate buffer String before writing to a file shows up in the profile at all for any input. If you have a particular use case or a profile that demonstrates the performance win, I'd be happy to add such functionality.

Even if your computer is slow, you might still be able to run benchmarks and profiles. For benchmarking, I usually use hyperfine to benchmark the CLI and often ad hoc usage of std::time::Instant to print out how long particular sections of code take to run.

If you find your computer is still too noisy to benchmark reliably, I do all of my benchmarking and profiling on dedicated virtual machines which I SSH into. The pricing is quite reasonable and they're very easy to set up and tear down.

For profiling I primarily use perf. I'm less familiar with good and free tools to use on Windows, but I think some people use DTrace. cargo flamegraph is also a great tool that builds on top of perf and can show you a different visualization from perf report.

connorskees commented 8 months ago

Hello,

Thank you again for your contribution. I am closing out this PR as it seems inactive, but please feel free to re-open at any time. If you're able to expand on your use case for accepting a dyn Writer, I'd be interested to hear more.