emersion / mrsh

A minimal POSIX shell
MIT License
492 stars 35 forks source link

Performance significantly worse than dash #156

Open ghost opened 4 years ago

ghost commented 4 years ago

I don't know if this is a real issue or just because the project isn't really mature yet, but because afaik there is no irc server where I could talk about this, I wanted to create this issue: When comparing the run times of this test code in mrsh and in dash:

#!/bin/sh
echo "Test starting..."
if test -e "123"; then
    echo hi
else
    echo hello
fi
head -c 50 /dev/urandom
printf "\n"
cat << EOF
123
456
789
EOF

i=0
while [ $i -le 100 ]; do
    printf "%s hello world\n" $i
    i=$((i+1))
done

I get those time results: dash: dash test.sh 0.00s user 0.01s system 95% cpu 0.007 total mrsh: mrsh test.sh 0.15s user 0.06s system 102% cpu 0.205 total

That's a very significant amount of ~30 times the amount that dash takes to run the script, and when running it a few times, I got similar results. You can even humanly feel the script in mrsh to take some time while in dash, it basically completes instantly.

emersion commented 4 years ago

I bet this is because dash implements some commands like echo, test, printf as built-ins while mrsh forks and execs them.

ghost commented 4 years ago

You are right, when replacing them with absolute paths, the speed of dash is about the same as the mrsh speed. Are those as builtin planned or are they out-of-scope?

emersion commented 4 years ago

That's a good question. I'm not sure yet. It's a little bit annoying to ship implementations for standard POSIX utilities on mrsh, but fork/exec is really slow.

ddevault commented 4 years ago

The performance is comparible in more realistic scenarios - this one is pretty contrived. But, ultimately, I think that if your shell's performance is the bottleneck in your system, then you're probably doing something wrong anyway.

ghost commented 4 years ago

I agree that this one is pretty contrived, but testing and printf/echoing is something that is done pretty often in shell scripts. It may not be noticeable for shell scripts running from cronjobs or as services, but for interactive scripts, it could very well be.

ddevault commented 4 years ago

I'll believe it when I see it. Let's see if anyone complains about performance during typical use, rather than while running benchmarks.

ghost commented 4 years ago

It was not while running benchmarks when I noticed this, but rather when running scripts normally, the benchmark here is mostly for illustrative purposes.

ddevault commented 4 years ago

Can you share the specific scripts which had noticable performance options, so we can optimize for their bottlenecks rather than for the benchmark?

ghost commented 4 years ago

One script I had this issue with I don't want to share here, but the other one is this one, but it wasn't as significant there, just about ~2 times the time that dash takes.

ghost commented 4 years ago

Also, even for the system, the shell is pretty surely not the bottleneck on the system, but one of many. If every application thinks like that, it sums up and you get a pretty perceptible difference compared to when every application was better optimized.

ddevault commented 4 years ago

Also, even for the system, the shell is pretty surely not the bottleneck on the system, but one of many. If every application thinks like that, it sums up and you get a pretty perceptible difference compared to when every application was better optimized.

Not necessarily. Simplicity always beats performance for anything which isn't a bottleneck.

ghost commented 4 years ago

Well that makes sense. But I wouldn't call the shell "no bottleneck at all". I'd say it's not a big one, but as I said, multiple of those can make a pretty significant difference, while one doesn't look like it does.

ddevault commented 4 years ago

https://raw.githubusercontent.com/jonhoo/configs/master/bins/bin/cert

This is a bash script

ghost commented 4 years ago

Oh my bad, I forgot that I had edited that to use posix sh, that one is mine:

#!/bin/sh
issuers=$(echo \
        | openssl s_client -servername "$1" -connect "$1:443" -showcerts 2>/dev/null \
        | grep -E '^  *i:' \
        | sed 's/^ *i:/issuer= /')

echo "$issuers" | while read -r issuer; do
        cn=$(echo "$issuer" | sed 's/.*\(CN\|OU\) *= *//')
        echo "==> For issuer $cn"

        for f in /etc/ca-certificates/trust-source/blacklist/*.pem; do
                [ -f "$f" ] || continue;
                i=$(openssl x509 -noout -issuer < "$f" 2>/dev/null)
                fcn=$(echo "$i" | sed 's/.*\(CN\|OU\) *= *//')
                #echo "'$cn' vs '$fcn'"
                if [ "$cn" = "$fcn" ]; then
                        echo "   -> Blacklist match on $(basename "${f%.pem}")"
                        echo "  $issuer"
                        echo "  $i"
                        break
                fi
        done
done
concatime commented 4 years ago

How about using vfork when possible?

emersion commented 4 years ago

POSIX.1-2008 removes the specification of vfork().

concatime commented 4 years ago

Didn’t know! At least posix_spawn is POSIX.

CONFORMING TO POSIX.1-2001, POSIX.1-2008.

emersion commented 4 years ago

I seriously doubt this will improve performance though. Interested in benchmarks in any case.

ddevault commented 4 years ago

posix_spawn is also a really bad API and I would rather not keep it around