Closed miqm closed 3 years ago
I did a little backward digging, and I can see performance issue on 0.4.228 but was ok on 0.4.227, so the root cause is somewhere in the feeb313 commit
Yet another observation - disabling linter in bicepconfig.json suppressed performance issue. But I lost linter.
Thanks for pointing out the commit which introduced the problem. I did a few benchmarks with the new & old code, and can see that the new code is definitely slower, by ~3-4x. It seems surprising that this causes such a drastic performance issue however. I'll continue to investigate.
old code - running over 10k random strings (each 200 chars): 156ms, 153ms, 159ms new code - running over 10k random strings (each 200 chars): 634ms, 646ms, 548ms
Could you also benchmark with linter disabled? I feel since introducing it in 0.4 vscode performance dropped dramatically. Perhaps we should move it to some separate thread or run it few seconds after last keystroke?
Could you also benchmark with linter disabled? I feel since introducing it in 0.4 vscode performance dropped dramatically. Perhaps we should move it to some separate thread or run it few seconds after last keystroke?
Sorry should have been more specific in my comment - I'm just benchmarking the particular linting function here for the comparison. At the moment I'm just looking for an optimization we can make to bring us back in-line with the previous performance.
Long-term:
Yet another observation - disabling linter in bicepconfig.json suppressed performance issue. But I lost linter.
@miqm Does leaving the linter on but disabling just the no-hardcoded-env-urls
rule help?
- something unique to your scenario
I feel my scenario might be quite common as bicep advances in popularity and especially when introducing remote modules. We probably need to think about some caching mechanisms to not build entire syntax tree on each keystroke. Even in visual studio when you work on c# project changes in files made outside editor are reflected only after compilation. Perhaps we need to think of something similar.
Yet another observation - disabling linter in bicepconfig.json suppressed performance issue. But I lost linter.
@miqm Does leaving the linter on but disabling just the
no-hardcoded-env-urls
rule help?
I didn't know I can do that :) will try on Monday.
I didn't know I can do that :) will try on Monday.
Yeah, you can just set the level to off
.
- something unique to your scenario
I feel my scenario might be quite common as bicep advances in popularity and especially when introducing remote modules.
Yeah, we definitely need to fix it.
We probably need to think about some caching mechanisms to not build entire syntax tree on each keystroke.
We need to figure out where the bottleneck is exactly, though. We do reuse unmodified syntax trees already. (On TypeScript side, they didn't need to do incremental parsing until files over 40k lines.)
Even in visual studio when you work on c# project changes in files made outside editor are reflected only after compilation. Perhaps we need to think of something similar.
Yeah, they have an approach where they batch up changes over 3 seconds and then trigger a compilation asynchronously. Parsing itself will have to happen as changes come in because we are using semantic highlights, but potentially other stages of the compiler could be deferred.
Did some benchmarking of my own using a modified version of @anthony-c-martin's sample. Collected 3 samples of avg runtime produced by the logic below for each scenario:
no-hardcoded-env-urls rule scenario |
Sample 1 | Sample 2 | Sample 3 |
---|---|---|---|
Unmodified rule | 1964 ms | 2073 ms | 1913 ms |
No-op Configure() and AnalyzeInternal() |
1020 ms | 995 ms | 972 ms |
No-op Configure() |
1009 ms | 991 ms | 1010 ms |
It appears the AnalyzerInternal()
method of the no-hardcoded-env-urls
rule is responsible for about HALF of the overall execution time.
[TestMethod]
// https://github.com/azure/bicep/issues/3636
public void Test_Issue3636()
{
var stringLength = 200;
var lineCount = 10000;
Random random = new Random();
string RandomString(int length)
{
const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
return new string(Enumerable.Repeat(chars, length)
#pragma warning disable CA5394 // Do not use insecure randomness
.Select(s => s[random.Next(s.Length)]).ToArray());
#pragma warning restore CA5394 // Do not use insecure randomness
}
var file = "";
for (var i = 0; i < lineCount; i++)
{
file += $"output test{i} string = '{RandomString(stringLength)}'\n";
}
// just warm up - throw away the results
CompilationHelper.Compile(file).Should().NotHaveAnyDiagnostics();
const int samplesCount = 5;
var samples = new List<long>(samplesCount);
for(int i = 0; i < samplesCount; i++)
{
var sw = Stopwatch.StartNew();
CompilationHelper.Compile(file).Should().NotHaveAnyDiagnostics();
sw.Stop();
samples.Add(sw.ElapsedMilliseconds);
this.TestContext.WriteLine($"Sample {i} = {sw.ElapsedMilliseconds} ms");
}
var avg = samples.Average();
this.TestContext.WriteLine($"Avg = {avg} ms");
}
cc @jfleisher
@miqm can you try out the build from main
to see if the issue has been addressed?
@miqm can you try out the build from
main
to see if the issue has been addressed?
I tried on my project and perhaps a slightly better, but still I need to wait for 2-3 seconds for completions to appear and there's visible delay when typing symbols.
All in all the fix didn't solve the problem.
@majastrz are we going to reopen this issue or open another one?
Let's use this one since it's not fixed.
any update on this?
I downloaded the JSON template from the correlationId you gave me. Unfortunately, I can't take it out of our secure environment and Bicep doesn't work there due to an unrelated issue (so I can't just decompile it). The next best bet is to replicate the module reference patterns that I see in the template to see if I can repro the problem.
Shared some info offline on capturing perf traces on OSX:
Steps should be as follows:
- Install dotnet trace (https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-trace)
- Reload your VSCode window ("Developer: Reload Window" command in VSCode)
- Open the .bicep file you use to repro the problem, and wait for Bicep language service to start
- Locate the language server process by running
ps -ef | grep bicepLanguageServer/Bicep.LangServer.dll
- note down the process Id in the second column of the output. On my Mac, it looks like this:501 850 641 0 7:28PM ?? 0:18.81 /Users/ant/Library/Application Support/Code/User/globalStorage/ms-dotnettools.vscode-dotnet-runtime/.dotnet/5.0.8/dotnet /Users/ant/.vscode/extensions/ms-azuretools.vscode-bicep-0.4.451/bicepLanguageServer/Bicep.LangServer.dll
- Run
dotnet trace collect -p <processId> -o bicep_0.4.451_perf.nettrace
to start the tracing - where<processId>
is the process Id from the previous step- Run through the steps to repro the issue
- Wait for any delayed diagnostics to catch up
- Stop the tracing by pressing Ctrl+C
It would be really useful to also see a trace following the same actions with the previous good version of Bicep (0.4.63) for comparison.
@kopendrex could you or Norbert try to gather diagnostics?
I managed to repro this myself on my own branch by creating SUPERMODULE.bicep: https://github.com/Azure/bicep/commit/1276ff7e69f95517feff1f7a5a23853fb8feb1cf
I did some profiling on various versions of Bicep, and could see that between v0.4.63 & v0.4.451 there was a LOT of CPU time spent in System.Text.RegularExpressions.MatchCollection.get_Count()
, which occurs inside the no-hardcoded-env-urls
linter analyzer which @miqm already pinpointed as the cause:
Here I've tested replacing the regular expressions with basic string search, and the results are looking pretty promising - speed is more in line with that of the v0.4.63 build: https://github.com/Azure/bicep/actions/runs/1072176564 - built from branch https://github.com/Azure/bicep/commit/1276ff7e69f95517feff1f7a5a23853fb8feb1cf.
I haven't yet replicated the full functionality as this was just a prototype, but I don't anticipate the additional functionality changing the performance characteristics.
v0.4.63 (previous good build):
v0.4.451 (bad build):
0.4.507-g1276ff7e69 (my custom build):
I was wondering why regex has such bad performance and I came across this article: https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices
Note this:
Because this regular expression was developed solely by considering the format of input to be matched, it fails to take account of input that does not match the pattern. This, in turn, can allow unconstrained input that nearly matches the regular expression pattern to significantly degrade performance
Bottom line: use regex when you expect to have match most of the time. Here we expect not to have it, so regex will perform poorly.
Bottom line: use regex when you expect to have match most of the time. Here we expect not to have it, so regex will perform poorly.
This felt counter-intuitive to me, so I did some more experimentation: https://dotnetfiddle.net/oQ63UU.
I think the article you linked above is specifically talking about backtracking potentially introducing performance issues, not regular expressions in general. It appears that our use of look-behind ((?<=\.|'|{|}|\s|^|/)
) is the culprit here, which I suspect is having a similar unexpected impact as backtracking.
If I had to guess, the current regular expression is first testing every single character in the string for the look-behind (which has quite a few different match conditions, and will turn up quite a few false-positives), before moving on to performing the actual string match, whereas the simplified regex and the basic string search can dive straight into the string match.
Bicep version 0.4.412
Describe the bug hints and reacting to changes in bicep extension are very slow when project has multiple nested modules.
To Reproduce Steps to reproduce the behaviour: Make a bicep project with lots of modules, observe VSCode extension performance.
Additional context Similar behaviour was in #3045