dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
18.96k stars 4.02k forks source link

MSBuild properties to analyzer config break in presence of newlines #43970

Open chsienki opened 4 years ago

chsienki commented 4 years ago

We transform MSBuild properties to global analyzer configs via the GenerateMSBuildAnalyzerConfig task.

In MSBuild it's permissible for properites to contain newlines, which transfers into the resulting config, breaking downstream parsing.

For example:

<PropertyGroup>
  <PropWithNewLine>this
is 
a
valid = property
  </PropWithNewLine>
</PropertyGroup>

would result in

is_global = true
msbuild_property.PropWithNewLine = this
is 
a
valid = property

This is true for semicolons too. These are common in msbuild as its the default combine character for a list

For example:

<ItemGroup>
  <Item1 Include="a" />
  <Item1 Include="b" />
  <Item1 Include="c" />
</ItemGroup>
<PropertyGroup>
  <Item1Combined>@(Item1)</Item1Combined>
</PropertyGroup>

Will result in

build_property.Item1Combined = a;b;c

In editorconfig ; is the comment character, meaning only a will be passed to the consumer.

chsienki commented 4 years ago

I think the simplest solution would be to transform the values via something well-known, and special case transforming them back inside the compiler.

My current thinking is to base64encode the values in the config file, and special case build_{property,metadata} parsing to decode them.

AArnott commented 3 years ago

base64 will be a significant obfuscation, which will make some build failure investigations much harder (like the one that led me to file a dupe bug of this in the first place). If instead you applied URI-encoding to any .editorconfig-reserved character as msbuild itself does to xml-reserved chars, then most values will appear in plaintext, and only those characters that require special treatment will be obfuscated.

Sergio0694 commented 1 year ago

Just hit this in #64917. The only workaround is splitting with , and keeping everything on one line, which is extremely clunky to say the least unfortunately. Is this issue on the working set and/or does it have an approximate milestone? Would you accept contributions for this? 🙂

viceroypenguin commented 1 year ago

I have found an alternative (though not pretty). Use \uxxxx type character encodings in the string and Regex.Unescape() to decode them. This is how I'm encoding semicolons and hashes in a property that I need to include in my source generator. Simpler than trying to use alternative characters and translating them in the source generator. Doesn't help with the multi-line stuff though, since it would still have to be in a single line with \u000a separators.

Youssef1313 commented 1 year ago

The EditorConfig spec was updated so that ; and # are only comments when they are the first character of a line. I think the compiler should adhere to the updated spec, and then we are done partially done with the issue. No special encoding/escaping needed.

https://spec.editorconfig.org/#no-inline-comments

Specification was updated in https://github.com/editorconfig/specification/pull/31

Also related to https://github.com/dotnet/roslyn/issues/44596

https://github.com/dotnet/roslyn/pull/51625 will handle the ; and # cases.

The case of multiline still needs work.

aetos382 commented 10 months ago

This is a problem caused by the implementation of AnalyzerConfig.Parse method.

This method assumes that analyzer configuration is given by an EditorConfig style file. The EditorConfig format is line-oriented and does not assume that properties have multi-line values. However, analyzer configurations can also be specified in project files, in which case properties can have multi-line values.

The MSBuild documentation states that property values can also be given in XML format, but if they are multi-line, they will not be passed to analyzer correctly.

The EditorConfig specification cannot be controlled by Microsoft or .NET community. I would prefer that AnalyzerConfig.Parse use its own file format that is upward compatible with EditorConfig, rather than updating the EditorConfig specification.

Note: This does not mean that we can write analyzer configurations in the new file format. The compiler internally converts analyzer configurations from project files into the EditorConfig format and then parses it. The first step would be to update the intermediate format used in this process.

As a workaround for the moment, analyzer configurations with multi-line values could be given by AdditionalFiles for that purpose, rather than through project files.

0xced commented 1 month ago

I just stumbled on this while trying to improve EmbeddedResourceGenerator by using the embedded resource LogicalName metadata.

As already mentioned in this issue, the generated editorconfig is problematic and the last part (;.txt) will be discarded when passed to the source generator.

[~/EmbeddedResourceGenerator/EmbeddedResourceAccessGenerator.Tests/2InvalidChars-\;.txt]
build_metadata.EmbeddedResource.LogicalName = EmbeddedResourceAccessGenerator.Tests.2InvalidChars-;.txt

Is there any chance that this problem will ever be solved?