ezrosent / frawk

an efficient awk-like language
Apache License 2.0

Can the executable size be made smaller? #113

[Open] 9ao9ai9ar opened this issue 3 weeks ago

9ao9ai9ar commented 3 weeks ago
$ ls -lHSh /usr/bin/{awk,grep,mawk,rg} ~/.local/bin/frawk
-rwxr-xr-x. 1 user user  43M Sep 24 09:19 /home/user/.local/bin/frawk
-rwxr-xr-x. 1 root root 4.3M May 23 08:00 /usr/bin/rg
-rwxr-xr-x. 1 root root 747K Jan 24  2024 /usr/bin/awk
-rwxr-xr-x. 1 root root 175K Jan 21  2024 /usr/bin/mawk
-rwxr-xr-x. 1 root root 167K Jan 24  2024 /usr/bin/grep

frawk, even when fully stripped, is still an order of magnitude larger than ripgrep, a comparable Rust CLI program that is itself an order of magnitude larger than traditional UNIX CLI tools. The large size makes the program look bloated, all the more so because my benchmarks show it running slower than mawk by a noticeable margin.
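To put numbers on the "order of magnitude" comparison, here is a minimal POSIX shell sketch that prints the integer size ratio of two files. `size_ratio` is a hypothetical helper written for this comparison, not part of frawk or coreutils; it relies only on `wc -c`, so it should work on both GNU and BSD systems.

```shell
# Hypothetical helper: print how many times larger file $1 is than file $2,
# using integer division (a result of 10 means "about an order of magnitude").
# The second file must be non-empty, or the division fails.
size_ratio() {
    big=$(wc -c < "$1")    # size of the first file in bytes
    small=$(wc -c < "$2")  # size of the second file in bytes
    echo $(( big / small ))
}

# Example: compare binaries from the listing above.
# size_ratio ~/.local/bin/frawk /usr/bin/rg
# size_ratio /usr/bin/rg /usr/bin/awk
```

Working from the (rounded) `ls -h` sizes in the listing, frawk is about 10 times the size of rg, while rg is roughly 6 times the size of /usr/bin/awk and about 25 to 26 times the size of mawk and grep.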

ghuls commented 23 hours ago

Compiling without LLVM probably makes it a lot smaller. My frawk binary is 9.4M stripped and 17M unstripped.

I rarely see mawk being much faster than frawk (except for https://github.com/ezrosent/frawk/issues/98), although I guess it is possible if you test on small files, since the overhead of compiling the awk script in frawk might then account for most of the runtime. Do you have some example scripts?
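One way to check whether fixed per-run overhead (such as frawk compiling the script) dominates on small inputs is to time the same program on empty input, where almost no data is processed. This is a minimal sketch that assumes GNU `date` (for the `%N` nanoseconds field); `avg_ms` is a hypothetical helper, not part of either tool.

```shell
# Hypothetical helper: run a command N times and print the average
# wall-clock time per run in whole milliseconds. Requires GNU date (%N).
avg_ms() {
    runs=$1
    shift
    start=$(date +%s%N)
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" > /dev/null   # discard output; only the elapsed time matters
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo $(( (end - start) / runs / 1000000 ))
}

# Fixed cost per invocation: the program sees no input records at all.
# avg_ms 20 mawk '{ n++ } END { print n }' < /dev/null
# avg_ms 20 frawk '{ n++ } END { print n }' < /dev/null
```

If the empty-input gap between the two, multiplied by the number of invocations in a benchmark, is close to the total difference observed, the slowdown is likely startup and compilation cost rather than per-record speed.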

9ao9ai9ar commented 22 hours ago

How do I build frawk without LLVM? Like this?

cargo +nightly install --path . --no-default-features

I've used frawk with all optimization levels and backends in the scripts in this repo (invitation sent). With the exception of rg3.sh, which runs faster with frawk, but only on modern hardware (SSD rather than HDD), every version is consistently at least a second or two slower with frawk than with mawk, so the margin is definitely noticeable. (To compare the results, you could either define a function mawk in benchmark.sh that calls frawk, overriding mawk in place, or make a copy of each script in solutions that uses frawk instead of mawk/awk.) I haven't tried the comparison with the LumbrasGigaBase dataset, though, where the files are far fewer in number and much larger in size.
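The in-place override suggested above can be sketched as a shell function that shadows `mawk`. `AWK_IMPL` is a hypothetical variable introduced here to choose the implementation; the `command` builtin skips function lookup, so the function does not recurse even if `AWK_IMPL=mawk`. This assumes the scripts invoke `mawk` by name (not as `/usr/bin/mawk`) and pass only options that both implementations accept.

```shell
# Shadow mawk with a function that forwards all arguments to the
# implementation named in AWK_IMPL (hypothetical variable; defaults to frawk).
mawk() {
    # `command` bypasses shell functions, so this runs the real executable.
    command "${AWK_IMPL:-frawk}" "$@"
}

# A function only affects the current shell. If benchmark.sh starts each
# script as a separate bash process, bash can pass the function along via:
# export -f mawk
```

Defining the function in benchmark.sh avoids maintaining a second copy of every script, at the cost of depending on how the scripts are launched.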

ghuls commented 16 hours ago
# Without LLVM, but with other recommended defaults
$ cargo +nightly install --path . --no-default-features --features use_jemalloc,allow_avx2,unstable