Closed MarkKharitonov closed 3 years ago
@MarkKharitonov does the issue reproduce with a standard and unmodified project or did you start hitting it after manually tweaking the .csproj?
The csproj is standard SDK style project, but it references a NuGet package that generates a source file. I have reduced it to a bare minimum reproduction for your convenience. Does it answer your question?
Thank you, yes, I understand that what you posted is a minimum repro. Still wondering how common this is, i.e. is it happening with all/most projects that generate source files (designers?) or is the NuGet package you mentioned more of a corner case.
I do not know anything about how designers work with the SDK style projects. The case is pretty clear - the Hash function is sensitive to the order of items in the Compile collection, which is a problem when items are generated. When a file is generated for the very first time its order will be determined by the moment in time when the generator runs. But after the file has already been generated, its order is determined by the moment when the Compile items are evaluated. The two points in time are different and produce different ordering thus producing different Hash values. Or am I missing anything?
[Perf. Triage] Thank you for reporting this issue. We wonder how severe it actually is. Could you please describe how painful is the problem for you? How often do you hit this scenario? Would adding the generated files under the source control help?
Guys, putting generated files in the source control is not a mainstream tactic. Some do it, some do not. It is not painful for me, because I understand the root cause and know how to workaround it. But agree with me, that the workaround is not clear at all to those who do not know how msbuild works here.
All I suggest is - eliminate this nuisance. This change has low regression risk, so what is the problem? We spend more time talking about it than it would take to fix it.
[Perf Triage] This seems to be a good suggestion. However, we need to fix this in _GenerateCompileDependencyCache, not in the Hash task. Otherwise it would be a breaking change.
Why would this be a breaking change? Is the Hash task used for anything else? Just curious.
Any user of MSBuild might use the Hash task for their targets. Even if it is not specified that the order of entries matters in the task description, they might still rely on the current implementation, which is order-sensetive.
OK, in this case I would have added a flag to the task. This way you also improve the service to those other users of the Hash task, because now they have a choice they did not have in the past.
Note that you have already been there with the case sensitivity. At first this hash was case sensitive and OMG - it wrecked havoc on our builds just because somebody made a mistake in the case (easy to do on Windows). You fixed it in exactly the same way as I am proposing - added a flag.
So why chose a different path now? Just add IgnoreOrder flag. Consistent and easy.
If there are no objections I have just sent a PR to fix this. Hope it is OK.
I would like to revive this thread. What we have today:
The compiler folks give priority to the mathematical correction of the compiler and this is totally understandable. What I do not understand is why we cannot provide a way to teams to ignore the order should they purposefully desire to?
The second build recompilation may seem insignificant, but it is when you consider developers working in Visual Studio and sometimes noticing full rebuild without a reason. Not all of them would figure out that it only happens if the previous build was from scratch. The prevailing sentiment would be - our build environment is unreliable and causes devs to waste time building. I want to address it at least in our organization. Give me a way to request to ignore the order of the items in the Hash task. So just passing a flag is not enough. Let us have two flags. No way someone sets it accidentally. Let us give them ugly names. We could modify the binary logger to add a special line emphasizing the order was turned off.
After all, we do not turn off the order of the item processing by the compiler here. We just make the compilation NOT start if the order in the collection changes. For SDK style projects it only happens with auto generated files, because other than that the order is determined by the file system helper msbuild uses, which is unaffected. For non SDK style projects, the order is already in flux for non trivial projects where git merges affect it. And, of course, it would be changed once more the moment the project is migrated to SDK style. You do not suggest to keep the order of the Compile items when migrating to SDK, do you? So the real life effect of allowing us to ignore the order of items when hashing is what?
We could make it conditional on C# projects only.
Anyway, please reconsider.
What I do not understand is why we cannot provide a way to teams to ignore the order should they purposefully desire to?
We don't want to make it easy to make your build flaky. You can do this today by extending your build to sort items if you really want to.
When using build time NuGet packages that generate C# files in SDK style projects, the auto generated files are inserted into the Compile item group at different positions when comparing the very first build (actually generating these files) and subsequent builds (when the files already exist)
This should not be the case. Can you give an example where this happens? It sounds like a bug in the generation package.
I'm going to close this issue based on the discussion in the PR, but hopefully we can still help with the specific spurious-rebuild case you're seeing.
This should not be the case. Can you give an example where this happens? It sounds like a bug in the generation package.
How is this not the case? Think about it. SDK style takes all the files using glob */.cs. If the generated files already on the disk, they will be taken by the glob and will be surrounded by other files, already existing on the disk. E.g.:
Suppose you have a generation target that runs later, but before CoreCompile. It may add 3.g.cs to the <Compile>
group, but because group cannot have duplicate items, this addition does not contribute anything. The group remains as described above.
Now imagine running the build for the very first time. The glob */.cs produces the following list:
And now runs the generation target which adds 3.g.cs. The final result:
As you can see the order is different.
If you need real examples, take https://specflow.org/. Our Test Engineers use it extensively. It generates .feature.cs files from .feature files. I am sure there are gazillion other tools that generate CS code on the fly.
Now someone (but not you, because you know the answer) may ask - how come this does not happen with the files msbuild targets generate for us? Like AssemblyInfo.cs or framework moniker files? The answer is - they are generated inside the obj folder which is not covered by the glob (it is by my example, but the real code exclude obj). So the code generated by msbuild itself does not suffer from this issue.
But out there it is different - tools generate code into the source directory all the time.
And why ignore my second point? The non SDK style projects. We have some with hundreds of files. Developers touch this project files concurrently and sometimes when merging file order is changed. No one cares. It is impossible to care when the project has hundreds of files and as many developers. What developers do care is redundant compilations that happen every time they clean. First build is long - as expected. But then the second one may be long too and devs just do not understand why. Well, this is one reason why.
Now someone (but not you, because you know the answer) may ask - how come this does not happen with the files msbuild targets generate for us? Like AssemblyInfo.cs or framework moniker files? The answer is - they are generated inside the obj folder which is not covered by the glob (it is by my example, but the real code exclude obj). So the code generated by msbuild itself does not suffer from this issue.
But out there it is different - tools generate code into the source directory all the time.
This is a bug in the generator. Because of the behavior you're describing, it is wrong to generate files into the source directory, or at minimum they should be removed before they're added post-generation to avoid the reordering problem.
This is a bug in the generator. Because of the behavior you're describing, it is wrong to generate files into the source directory, or at minimum they should be removed before they're added post-generation to avoid the reordering problem.
Is it documented anywhere? The SpecFlowTest guys do not create the source files in obj. The generator we use surely does not too, because the source code it generates is something developers want to debug. It is weird to surface a C# source file from the obj directory.
And you have not addressed my claim about large the non SDK style projects in large organization. Is it really the expectation that developers maintain stable order of the files listed there? In a big R&D organization? Not on this Earth.
Issue Description
When generating source files in BeforeBuild phase extra care must be exercised to avoid redundant compilations. This extra effort (and awareness in the first place) could be spared if the Hash task was sorting the given list of items before computing the hash.
Steps to Reproduce
3 files are needed:
Program.cs
Generated.g.cs.tmpl
HelloWorld.csproj
Now let us delete the generated file and build twice:
Note the last modified timestamps of the binary:
Analysis
The root cause - the generated file is found inside the Compile item group at different locations after each build. Subsequent builds are good.
To workaround this issue we need to make sure the generated file is inserted at the same position in the list. Possible workaround:
Workaround
The code looks weird and may be easily "fixed" by somebody who does not understand the rationale behind it.
By modifying the Hash task this nuisance will be eliminated.