dotnet / msbuild

The Microsoft Build Engine (MSBuild) is the build platform for .NET and Visual Studio.
https://docs.microsoft.com/visualstudio/msbuild/msbuild
MIT License
5.22k stars 1.35k forks source link

Remove operation should not match against its globs #2329

Open cdmihai opened 7 years ago

cdmihai commented 7 years ago

Remove operations can reference values, globs, and other items. Its implementation is split in two parts

However, when the actual Remove operation executes, it still looks at its globs and does Regex matching, in addition to matching its potential values and referenced items. This is unnecessary, since the globs got propagated to past operations. Not matching the globs would also be a test for the lookahead optimization :)

ladipro commented 3 years ago

Building a project like this one:

<Project>
  <ItemGroup>
    <Test Include="somefile.vb" />
    <Test Include="**/*.cs" />
    <Test Remove="**/somefile.*" />
  </ItemGroup>

  <Target Name="Build">
    <Message Text="@(Test, ', ')" />
  </Target>
</Project>

I see FileMatcher invoked only once - with filespec of "**/*.cs" and one exclude filespec of "**/somefile.*". The redundant matching against the Remove pattern must have been eliminated since the issue was filed. Opportunistically closing, @cdmihai please reopen if I misunderstood.

cdmihai commented 3 years ago

@ladipro My bad, I didn't link the code where it happens: https://github.com/dotnet/msbuild/blob/13522d2466ae1634177e2a6a40fefaedff95139c/src/Build/Evaluation/LazyItemEvaluator.RemoveOperation.cs#L54 Remove operations should not match against their globs, because their globs flow backward to the previous include operations. So matching against the globs is wasted work. If the sdk has glob removes that match against many items, I think this should lead to some good measurable perf gains.

ladipro commented 3 years ago

@cdmihai thank you for following up! Makes perfect sense now.

JaynieBai commented 1 year ago

@AR-May

If the sdk has glob removes that match against many items, I think this should lead to some good measurable perf gains.

Does that mean we could write one function for example as follow to remove the match item from the include items. public void RemoveAll(string glob) // Glob is remove glob string { Regex regex = new Regex(glob, RegexOptions.IgnoreCase); _listBuilder.RemoveAll(x=>regex.IsMatch(x.Item.EvaluatedIncludeEscaped)); }

AR-May commented 1 year ago

I took a look at example from @ladipro and the code around includes and removals. It seems to me that future removes are propagated only to includes with globes (which is obviously done for better perf). It is an optimization in the FileMacther. If you include the item or the item group, the future removes are not checked and excluded. Since we are unable to track from which include the item was added this indeed could lead to double match check for remove globes. And we also could not just remove matching against globes in the remove, it would lead to bug. We might want to try applying the future removes on all other types of include together with the proposed change, similar as to how it is done for including with globes.

@JaynieBai I am not sure I understood your idea for the change. However, I suggested some analysis of my own above. Please tell me if you need any more comments on this. Note: the order of includes and removals matters, if we go for changing includes, we will need to exclude from there only future removals.