dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

HttpClient Coding problem #23761

Closed Petermarcu closed 4 years ago

Petermarcu commented 6 years ago

@tianmaoyu commented on Mon Jul 31 2017

HttpClient Coding problem

Problems with Chinese characters

General

    // Repro: serialize a multipart/form-data body whose file name contains Chinese characters.
    var client = new HttpClient();
    var multipartContent = new MultipartFormDataContent("-------boundary");
    var fileStream2 = File.Open(@"H:\开源的项目\新建文本文档.txt", FileMode.Open);
    var streamContent2 = new StreamContent(fileStream2);
    multipartContent.Add(streamContent2, "files", "新建文本文档.txt");
    // Copy the body into memory so the generated Content-Disposition header can be inspected.
    var output = new MemoryStream();
    await multipartContent.CopyToAsync(output);
    output.Seek(0, SeekOrigin.Begin);
    string result = new StreamReader(output).ReadToEnd();
    return result;

result string:

filename="=?utf-8?B?5paw5bu65paH5pys5paH5qGjLnR4dA==?="
filename*=utf-8''%E6%96%B0%E5%BB%BA%E6%96%87%E6%9C%AC%E6%96%87%E6%A1%A3.txt
---------boundary
Content-Disposition: form-data; name=files; filename="=?utf-8?B?5paw5bu65paH5pys5paH5qGjLnR4dA==?="; filename*=utf-8''%E6%96%B0%E5%BB%BA%E6%96%87%E6%9C%AC%E6%96%87%E6%A1%A3.txt

������
---------boundary--

@karelz commented on Mon Jul 31 2017

What is the key problem in your repro? Is it the Unicode file name? Is it the content of the file? Can you please clarify what is expected vs. current output?

@tianmaoyu commented on Mon Jul 31 2017

I want to use HttpClient to upload files. When the file name contains Chinese characters, it comes out garbled. I want to be able to upload files with Chinese names correctly. For example, I expect filename="新建文本文档.txt" instead of filename="=?utf-8?B?5paw5bu65paH5pys5paH5qGjLnR4dA==?="

@tianmaoyu commented on Mon Jul 31 2017

I want the content headers to be encoded with UTF-8, and their length calculated from that encoding, instead of using DefaultHttpEncoding (Encoding.GetEncoding(28591)). The relevant source code is in System.Net.Http.HttpRuleParser:

    internal static readonly Encoding DefaultHttpEncoding = Encoding.GetEncoding(28591);

[EDIT] Changed code formatting for readability by @karelz
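
For readers unfamiliar with code page 28591 (ISO-8859-1, Latin-1), here is a minimal sketch of why it cannot carry a Chinese file name directly. The class and method names are hypothetical, and the sketch only exercises System.Text.Encoding; it does not reproduce what HttpClient does internally.

```csharp
using System;
using System.Text;

static class Latin1HeaderDemo
{
    // Encodes the text with ISO-8859-1 (code page 28591) and decodes it again.
    // Characters outside Latin-1 are lost in the process.
    public static string RoundTripLatin1(string text)
    {
        Encoding latin1 = Encoding.GetEncoding(28591);
        return latin1.GetString(latin1.GetBytes(text));
    }

    static void Main()
    {
        string original = "新建文本文档.txt";

        // False: the Chinese characters do not survive the round trip.
        Console.WriteLine(RoundTripLatin1(original) == original);

        // True: plain ASCII names are unaffected.
        Console.WriteLine(RoundTripLatin1("report.txt") == "report.txt");
    }
}
```

This is why a non-ASCII name has to be wrapped in some ASCII-safe encoding before it can be written with the default header encoding.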

davidsh commented 6 years ago

We also need to see whether the coding pattern above works on .NET Framework. I'm not sure what the HTTP RFCs say about multipart content using Unicode file names.

Priya91 commented 6 years ago

The RFC https://tools.ietf.org/html/rfc6266 states that when both filename and filename* are present, filename* should be given precedence and used to retrieve the information. You should be able to decode the value based on those rules.

Also, this program gives the same output on .NET Framework and .NET Core:

    using System;
    using System.IO;
    using System.Net.Http;
    using System.Threading.Tasks;

    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(TestMethod().GetAwaiter().GetResult());
        }

        // Serializes a multipart/form-data body with a Chinese file name and returns it as text.
        public static async Task<string> TestMethod()
        {
            var client = new HttpClient();
            var multipartContent = new MultipartFormDataContent("-------boundary");
            var fileStream2 = File.Open(@"E:\开源的项目新建文本文档.txt", FileMode.Open);
            var streamContent2 = new StreamContent(fileStream2);
            multipartContent.Add(streamContent2, "files", "开源的项目新建文本文档.txt");
            // Copy the body into memory so the generated headers can be printed.
            var output = new MemoryStream();
            await multipartContent.CopyToAsync(output);
            output.Seek(0, SeekOrigin.Begin);
            string result = new StreamReader(output).ReadToEnd();
            return result;
        }
    }
---------boundary
Content-Disposition: form-data; name=files; filename="=?utf-8?B?5byA5rqQ55qE6aG555uu5paw5bu65paH5pys5paH5qGjLnR4dA==?="; filename*=utf-8''%E5%BC%80%E6%BA%90%E7%9A%84%E9%A1%B9%E7%9B%AE%E6%96%B0%E5%BB%BA%E6%96%87%E6%9C%AC%E6%96%87%E6%A1%A3.txt

hello world.
---------boundary--

on .NET Core

---------boundary
Content-Disposition: form-data; name=files; filename="=?utf-8?B?5byA5rqQ55qE6aG555uu5paw5bu65paH5pys5paH5qGjLnR4dA==?="; filename*=utf-8''%E5%BC%80%E6%BA%90%E7%9A%84%E9%A1%B9%E7%9B%AE%E6%96%B0%E5%BB%BA%E6%96%87%E6%9C%AC%E6%96%87%E6%A1%A3.txt

hello world.
---------boundary--

Priya91 commented 6 years ago

This is by-design.

karelz commented 6 years ago

@caesar1995 you just looked at similar parts of the RFCs. Can you please check if there is anything here for us to do?

caesar-chen commented 6 years ago

There is nothing left for us to do here.

If the file name on the sender's operating system is not US-ASCII, both .NET Core and .NET Framework emit two parameters:

  1. an "encoded-word" (stored in the filename parameter), and
  2. the string encoded using the method of RFC 2231 (stored in the filename* parameter, which I think is what you are looking for).

As Priya mentioned with an example above:

    The RFC https://tools.ietf.org/html/rfc6266 states that when both filename and filename* are present, filename* should be given precedence and used to retrieve the information. You should be able to decode the value based on those rules.

If you have a repro showing that the information cannot be retrieved from filename*, I can help look into it.
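
As an illustration of the two forms, here is a minimal sketch that decodes both values from the original report by hand. The class and method names are hypothetical; the sketch handles only Base64 ("B") encoded-words and assumes the filename* value is UTF-8, which is what Uri.UnescapeDataString decodes.

```csharp
using System;
using System.Text;

static class FileNameDecoding
{
    // Decodes a Base64 encoded-word of the form =?charset?B?payload?=
    public static string DecodeEncodedWord(string value)
    {
        // Expected pieces after splitting on '?': "=", charset, "B", payload, "="
        string[] parts = value.Split('?');
        if (parts.Length != 5 || parts[0] != "=" || parts[4] != "=" ||
            !parts[2].Equals("B", StringComparison.OrdinalIgnoreCase))
        {
            throw new FormatException("Not a Base64 encoded-word.");
        }
        byte[] bytes = Convert.FromBase64String(parts[3]);
        return Encoding.GetEncoding(parts[1]).GetString(bytes);
    }

    // Decodes an extended value of the form charset'language'percent-encoded-text.
    // Assumption: the charset is UTF-8, so it is not inspected here.
    public static string DecodeExtendedValue(string value)
    {
        string[] parts = value.Split('\'');
        if (parts.Length != 3)
        {
            throw new FormatException("Not an extended parameter value.");
        }
        return Uri.UnescapeDataString(parts[2]);
    }

    static void Main()
    {
        string fromFileName = DecodeEncodedWord(
            "=?utf-8?B?5paw5bu65paH5pys5paH5qGjLnR4dA==?=");
        string fromFileNameStar = DecodeExtendedValue(
            "utf-8''%E6%96%B0%E5%BB%BA%E6%96%87%E6%9C%AC%E6%96%87%E6%A1%A3.txt");

        Console.WriteLine(fromFileName);                     // 新建文本文档.txt
        Console.WriteLine(fromFileName == fromFileNameStar); // True
    }
}
```

Both parameters carry the same file name; they differ only in how the non-ASCII bytes are wrapped.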

caesar-chen commented 6 years ago

    There's no issue with filename*.

Glad it's working. It's unclear to me what your previous complaint was about, because we did implement the encoding in RFC 5987 and store the encoded value in filename*.

    "encoded-word" should never be encoded in filename.

I agree, that is clear from RFC 6266 (apologies that we didn't update our product earlier). It is probably due to historical or app-compat reasons in .NET Framework.

    It is "filename" field where you send garbage.

Is there a specific reason you need this field when filename* is present? To quote RFC 6266: when both "filename" and "filename*" are present in a single header field value, recipients SHOULD pick "filename*" and ignore "filename".

Thanks for your input! We will try our best to make .NET Core better.
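
On the receiving side, the "pick filename* and ignore filename" rule could look roughly like the sketch below. It assumes the recipient is also written in .NET and relies on System.Net.Http.Headers.ContentDispositionHeaderValue to parse the header; the FileNamePicker class is a hypothetical helper, not something from this thread.

```csharp
using System;
using System.Net.Http.Headers;

static class FileNamePicker
{
    // Prefers filename* when it is present and falls back to filename otherwise.
    public static string PickFileName(string headerValue)
    {
        ContentDispositionHeaderValue parsed =
            ContentDispositionHeaderValue.Parse(headerValue);
        return parsed.FileNameStar ?? parsed.FileName;
    }

    static void Main()
    {
        // Header value taken from the output in the original report.
        string header =
            "form-data; name=files; " +
            "filename=\"=?utf-8?B?5paw5bu65paH5pys5paH5qGjLnR4dA==?=\"; " +
            "filename*=utf-8''%E6%96%B0%E5%BB%BA%E6%96%87%E6%9C%AC%E6%96%87%E6%A1%A3.txt";

        Console.WriteLine(PickFileName(header));
    }
}
```

Servers written on other stacks would need their own parser, which is where the interoperability questions in this thread come from.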

caesar-chen commented 6 years ago

Sorry to hear that :(. If you believe there is a bug in .NET Core and you have a small working repro (before we start on a fix, we need to understand the impact of the issue), please don't hesitate to open an issue here. We can help look into it and answer questions to save developers' time.

essen commented 5 years ago

I think there's some confusion with regard to the content-disposition header.

https://tools.ietf.org/html/rfc6266 defines the content-disposition header for use in HTTP. This is not the same header as the one used with multipart, as can be seen in the introduction:

      Note: This document does not apply to Content-Disposition header
      fields appearing in payload bodies transmitted over HTTP, such as
      when using the media type "multipart/form-data" ([RFC2388]).

There is a specific RFC for multipart/form-data, and it has a note about the use of the filename* parameter here https://tools.ietf.org/html/rfc7578#section-4.2 which says:

   NOTE: The encoding method described in [RFC5987], which would add a
   "filename*" parameter to the Content-Disposition header field, MUST
   NOT be used.

Finally, both RFC5987 and its most recent revision https://tools.ietf.org/html/rfc8187#section-1 say in their introduction:

      Note: This encoding does not apply to message payloads transmitted
      over HTTP, such as when using the media type "multipart/form-data"
      ([RFC7578]).

Ultimately it's up to you, but I think it's making interoperability more difficult than it should be.

karelz commented 5 years ago

I admit I have a hard time following all the RFC blurbs. Let's be practical: is there a repro that shows things working incorrectly?

essen commented 5 years ago

I wouldn't say things work incorrectly in your code as far as I can tell (beyond sending a filename* that really shouldn't be there according to the RFC). Just wanted to do a follow-up after I closed a related ticket on my end (as an HTTP server developer).

The issue originally reported to me was about how the filename value is encoded differently depending on the implementation, as documented in the RFC (https://tools.ietf.org/html/rfc7578#section-5.1.3). Your code uses the encoded-word method, which I don't support (but users can decode it themselves, so no biggie).

It's a historical encoding that has been removed from the specification, as can be seen in the first paragraph of https://tools.ietf.org/html/rfc7578#appendix-A, but that doesn't make it wrong or incorrect per se.

Cheers.

idilov commented 5 years ago

Please reconsider adding support for UTF-8 encoded headers. I'd like to point back to the original post:

    I want to encode the header with UTF-8 ... instead of using DefaultHttpEncoding
    The source code is: System.Net.Http.HttpRuleParser
    internal static readonly Encoding DefaultHttpEncoding = Encoding.GetEncoding(28591);

I also want to second all the comments posted by @imgrey.

We should not act as lawyers. In summary, you pointed out that HttpClient complies with RFCs and that "encoded-word" format is used by-design. All that is true but why did you close the issue? In reality, browsers can upload files and HttpClient cannot.

Example: some servers don't respect the filename* (star) override. Yes, they are non-compliant, BUT they can decode UTF-8 headers. Specifically, I am unable to upload files with Cyrillic names to a Java-based server using .NET Core and HttpClient. Their own Angular client, however, works fine.

People complain about this and use terrifying workarounds:

Please allow the developers to choose the encoding for HTTP headers.
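
To make the request concrete, here is a minimal sketch of the byte-level idea behind one kind of workaround: the UTF-8 bytes of the name are stored one per char, so that a Latin-1 (code page 28591) writer emits them unchanged. Whether this matches the workarounds referred to above is an assumption, the names are hypothetical, and the step of placing the string into an actual header is deliberately left out.

```csharp
using System;
using System.Linq;
using System.Text;

static class Utf8HeaderSketch
{
    // Stores each UTF-8 byte of the text in its own char (values 0 to 255).
    // The result looks like mojibake as a string, but a Latin-1 encoder
    // turns it back into exactly the original UTF-8 bytes.
    public static string ToLatin1Carrier(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return new string(utf8.Select(b => (char)b).ToArray());
    }

    static void Main()
    {
        string fileName = "файл.txt";
        string carrier = ToLatin1Carrier(fileName);

        // What a Latin-1 header writer would put on the wire.
        byte[] onTheWire = Encoding.GetEncoding(28591).GetBytes(carrier);

        // True: the wire bytes are the UTF-8 encoding of the name.
        Console.WriteLine(onTheWire.SequenceEqual(Encoding.UTF8.GetBytes(fileName)));

        // True: a server that reads headers as UTF-8 recovers the name.
        Console.WriteLine(Encoding.UTF8.GetString(onTheWire) == fileName);
    }
}
```

The trick depends on the header being written with a one-byte-per-char encoding, which is why an explicit, supported way to choose the header encoding would be preferable.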

karelz commented 5 years ago

@idilov please note that we do not monitor or track closed issues. You wonder why it was closed: see above, where it says "by design". It is fine to disagree and provide counterpoints.

Your reply is very end-to-end (which is great), but not really easy to understand. I believe it also asks for API changes, which is an entirely different ask than the original post. I would recommend filing a new issue and describing the problem from scratch, with samples, a list of a couple of servers that support what you say, etc. Don't assume any previous knowledge on the part of the people who will read it. That will be the best way to get some traction on this. Thanks!