dotnet / command-line-api

Command line parsing, invocation, and rendering of terminal output.
https://github.com/dotnet/command-line-api/wiki
MIT License
3.4k stars 381 forks source link

Lazy initialization of subcommands #1956

Closed KalleOlaviNiemitalo closed 1 year ago

KalleOlaviNiemitalo commented 1 year ago

Is there a way to save startup time by not initializing subcommands other than the one chosen by the user? The parser surely needs the name and aliases of every subcommand, but I hope the following could be skipped:

OTOH, I have not measured how much time these actually cost.

Might help with https://github.com/dotnet/command-line-api/issues/1008 as well.

jonsequitur commented 1 year ago

@adamsitnik could say more about whether this was a performance issue. I suspect it's not a bottleneck at this point.

That said, lazy initialization is used in dotnet new because the leaf nodes of the parser are added based on user-installed templates. The overall experience needed to be dynamic. @vlada-shubina could say more about this.

adamsitnik commented 1 year ago

Our startup scenario (https://github.com/adamsitnik/commandline-perf/tree/update) is mostly dominated by JITing and type loading (I've verified that with a profiler).

If we take a look at the following benchmark results we can see that:

Method Args Mean Error StdDev Median Ratio RatioSD
Empty --bool -s test 65.06 ms 0.409 ms 1.205 ms 64.84 ms 0.74 0.02
EmptyAOT --bool -s test 13.01 ms 0.139 ms 0.411 ms 12.94 ms 0.15 0.00
Corefxlab --bool -s test 84.97 ms 0.791 ms 2.333 ms 84.66 ms 0.97 0.03
SystemCommandLine2021 --bool -s test 117.71 ms 0.624 ms 1.841 ms 117.39 ms 1.34 0.02
SystemCommandLine2022 --bool -s test 99.46 ms 0.375 ms 1.107 ms 99.15 ms 1.14 0.02
SystemCommandLineNow --bool -s test 87.58 ms 0.332 ms 0.979 ms 87.46 ms 1.00 0.00
SystemCommandLineNowR2R --bool -s test 79.16 ms 0.341 ms 1.006 ms 79.17 ms 0.90 0.02
SystemCommandLineNowAOT --bool -s test 14.88 ms 0.170 ms 0.503 ms 14.85 ms 0.17 0.01

That is why I've been pushing so hard on removing all redundant types.

I don't expect us to gain a lot by making more things lazy. Our next perf goal is to include a precompiled version of S.CL in .NET SDK, so everybody would use a pre-compiled version out of the box.

Having said that, I am going to close the issue. Thank you for your feedback @KalleOlaviNiemitalo !

KalleOlaviNiemitalo commented 1 year ago

The benchmark apps don't have any subcommands, do they? So they don't show how much time would be spent constructing the subcommands and their options and arguments. The descriptions of options and arguments are not localised either so the benchmark does not include a resource lookup cost that could be skipped when a different subcommand is used, or when help is not requested.

adamsitnik commented 1 year ago

The benchmark apps don't have any subcommands, do they? So they don't show how much time would be spent constructing the subcommands and their options and arguments.

All the major allocations were already removed and those which remained are lazy (collections are allocated when needed, not up-front).

The benchmark apps don't have any subcommands, do they? So they don't show how much time would be spent constructing the subcommands and their options and arguments.

The resource lookup should be cheap.

@KalleOlaviNiemitalo I encourage you to take one of your real life apps and profile it and share the findings. I always happy to improve perf, I just don't believe that the parts that you have mentioned are major perf bottlenecks.