Open benvillalobos opened 2 years ago
Tagging subscribers to this area: @buyaa-n, @krwq See info in area-owners.md if you want to be subscribed.
Author: | BenVillalobos |
---|---|
Assignees: | - |
Labels: | `area-System.Xml`, `untriaged` |
Milestone: | - |
@BenVillalobos can you provide a minimal repro? Ideally something which creates a file on the fly and uses \uXYZV
notation for non-english characters
string path = @"Ⅻㄨㄩ 啊阿鼾齄丂丄狚狛狜狝﨨﨩ˊˋ˙–⿻〇㐀㐁䶴䶵.xml";
string content = @"<root></root>";
File.WriteAllText(path, content);
using XmlReader reader = XmlReader.Create(path);
and getting no exception.
What's the exception you're getting?
You'll need to include U1[]U2[]U3[]
as part of the path. Adding that and using your code in an empty console proj caused the repro for me.
@BenVillalobos what is the exception you're seeing?
from the e-mail conversation this might be blocking msbuild team so changing milestone
OH, apologies. The title for this issue was incorrect. See this comment for details: https://github.com/dotnet/msbuild/pull/6863#issuecomment-922089397
Exception gets thrown on this line https://github.com/dotnet/msbuild/blob/main/src/Tasks/XslTransformation.cs#L176
In summary, when transforming an xslt doc who's path contains the unicode mess (Ⅻㄨㄩ 啊阿鼾齄丂丄狚狛狜狝﨨﨩ˊˋ˙–⿻〇㐀㐁䶴䶵U1[]U2[]U3[]
), you'll hit an XslTransformException if that xslt calls the document
function to find another xml file next to it.
Also this isn't a blocking issue. We worked around one aspect of the issue which was loading the xslt doc in the first place. But we also see the error when trying to load another document while transforming the first.
@BenVillalobos any chance you could create a project with a tiny repro?
Steps to reproduce:
Download zip, create a folder next to proj1.csproj called Ⅻㄨㄩ 啊阿鼾齄丂丄狚狛狜狝﨨﨩ˊˋ˙–⿻〇㐀㐁䶴䶵U1[]U2[]U3[]
. Drag transform.xslt and b.xml into it.
Note: If you have the latest and greatest VS, it probably already has this change. So msbuild <pathtoproj1>.csproj
on a dev command prompt should suffice. If not, do the steps below:
git clone https://github.com/dotnet/msbuild
cd msbuild
build.cmd /p:CreateBootstrap=true
cd artifacts\bin\bootstrap\net472\MSBuild\Current\Bin
msbuild.exe <pathtoproj1.csproj>
You should get the error error MSB3703: Unable to execute transformation. An error occurred while loading document 'b.xml'. See InnerException for a complete description of the error.
. After you've confirmed the repro, debug into it to see the inner exception.
Open up MSBuild.Dev.slnf in VS.
On same command line:
set MSBUILDDEBUGONSTART=1
msbuild.exe
Place a breakpoint in the line I mentioned the exception gets thrown (2 comments above)
IMO this seems to be caused by unexpected behaviour in System.Uri
: specifically, a PUA character is not roundtripped through a file:///
URI for new Uri(x).LocalPath
.
This test program tests the Uri
constructor with various characters that have different UTF-8 encoded lengths; the last is a PUA character:
foreach (var name in new[] { "a", "\u00A1", "\u0800", "\U00010000", "\uE000" })
{
var filePath = $@"C:\Temp\{name}.txt";
var uri = new Uri(filePath);
Console.WriteLine("{0}\t{1}\t{2}\t{3}", uri.LocalPath == filePath, filePath, uri.AbsoluteUri, uri.LocalPath);
}
The output shows that the file path does round-trip for everything except the PUA character, which is double-encoded in the URI.
True C:\Temp\a.txt file:///C:/Temp/a.txt C:\Temp\a.txt
True C:\Temp\¡.txt file:///C:/Temp/%C2%A1.txt C:\Temp\¡.txt
True C:\Temp\ࠀ.txt file:///C:/Temp/%E0%A0%80.txt C:\Temp\ࠀ.txt
True C:\Temp\𐀀.txt file:///C:/Temp/%F0%90%80%80.txt C:\Temp\𐀀.txt
False C:\Temp\.txt file:///C:/Temp/%25EE%2580%2580.txt C:\Temp\%EE%80%80.txt
I haven't consulted the URI RFC to determine whether this would be considered correct. You don't get the same failure to round-trip for $@"https://example.com/{name}.txt"
so I suspect not.
If System.Uri
can't be changed in this instance (for back-compat concerns), then XmlUriResolver
and/or XmlDownloadManager
might need to work around the double-encoding?
I wonder if System.Uri
parsing for file paths goes through the IRI code path, which explicitly excludes PUA characters (as per https://datatracker.ietf.org/doc/html/rfc3987#section-2.2): https://github.com/dotnet/runtime/blob/c00b06826ca4d333ef69d51e523ef7bd309b8631/src/libraries/System.Private.Uri/src/System/IriHelper.cs#L16-L22
A fix would be to change this line of code as follows: https://github.com/dotnet/runtime/blob/c00b06826ca4d333ef69d51e523ef7bd309b8631/src/libraries/System.Private.Xml/src/System/Xml/XmlResolver.cs#L43
uri = new Uri(Path.GetFullPath(relativeUri!), new UriCreationOptions { DangerousDisablePathAndQueryCanonicalization = true });
However, I don't know the security implications of this, given the property name.
I guess, since you can pass your own XmlResolver possibly you can just derive and apply workaround in the new type. We should still figure out if this is XML or Uri issue. The Dangerous*
setting doesn't sound like a good idea but I'm not sure what are the exact implications of that or why it was created.
@karelz do we have any Uri experts who could help with the questions above?
Description
Original issue: https://github.com/dotnet/msbuild/issues/6847 MSBuild's PR: https://github.com/dotnet/msbuild/pull/6863
Initializing an XmlReader usingXmlReader.Create(<path-to-file>)
where the path contains characters from the unicode private use area causes issues at file load. Part of the path that's the culprit:Ⅻㄨㄩ 啊阿鼾齄丂丄狚狛狜狝﨨﨩ˊˋ˙–⿻〇㐀㐁䶴䶵U1[]U2[]U3[]
The workaround is to call XmlReader.Create using a StreamReader. However, when MSBuild is calling itsXslTransform
task on a file in this type of path, and in the xsl file it calls theDocument
function on a file that also exists in a path like this, it will break. See details here: https://github.com/dotnet/msbuild/pull/6863#issuecomment-922089397This comment explains the issue. You hit an exception when calling the document function in an xslt file when that xslt file is under a path that contains private area unicode characters.
Xslt file example:
Place that next to a
b.xml
in a path that containsⅫㄨㄩ 啊阿鼾齄丂丄狚狛狜狝﨨﨩ˊˋ˙–⿻〇㐀㐁䶴䶵U1[]U2[]U3[]
.Configuration
Full framework & net core 6.0.100-preview.7.21379.14
Regression?
No
Other information
See original issue / MSBuild PR for in-depth notes.