apache / jmeter

Apache JMeter open-source load testing tool for analyzing and measuring the performance of a variety of services
https://jmeter.apache.org/
Apache License 2.0
7.97k stars 2.02k forks source link

fix: The HTTP Header of `multipart/form-data` no longer includes `charset`. #6251

Open dongfangtianyu opened 1 month ago

dongfangtianyu commented 1 month ago

Description

close #6250 .

Motivation and Context

On some web server implementations, including charset in the request header Content-Type of multipart/form-data can result in parsing errors of the boundary, leading to a failure in sending form content.

This is inconsistent with the behavior in JMeter 5.6.2 and also does not comply with RFC.

Perhaps fixing this in the httpclient would be a more elegant choice, but:

How Has This Been Tested?

  1. A unittest case
  2. runGUI then e2e test

Screenshots (if appropriate):

After : remove charset image

Types of changes

Checklist:

dongfangtianyu commented 1 month ago

It seems that the fachbook homepage doesn't return 200, but redirects to the login page. I'm not sure if we need to modify the test case ResponseDecompression.jmx.

Debug info:

curl -v https://www.facebook.com
*   Trying 31.13.70.36:443...
* Connected to www.facebook.com (31.13.70.36) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.2 (IN), TLS header, Certificate Status (22):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS header, Finished (20):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.2 (OUT), TLS header, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_CHACHA20_POLY1305_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: C=US; ST=California; L=Menlo Park; O=Meta Platforms, Inc.; CN=*.facebook.com
*  start date: Dec 26 00:00:00 2023 GMT
*  expire date: Mar 25 23:59:59 2024 GMT
*  subjectAltName: host "www.facebook.com" matched cert's "*.facebook.com"
*  issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 High Assurance Server CA
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* Using Stream ID: 1 (easy handle 0x563b8d507b60)
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
> GET / HTTP/2
> Host: www.facebook.com
> user-agent: curl/7.81.0
> accept: */*
>
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
* TLSv1.2 (IN), TLS header, Supplemental data (23):
< HTTP/2 302
< set-cookie: fr=0m99cpMhrD8XneYOy..Bl99ys..AAA.0.0.Bl99ys.AWVSv6D5HHg; expires=Sun, 16-Jun-2024 06:18:20 GMT; Max-Age=7776000; path=/; domain=.facebook.com; secure; httponly
< location: https://www.facebook.com/login/?next=https%3A%2F%2Fwww.facebook.com%2F
vlsi commented 1 month ago

I believe the core of the issue is that Apache Http Client adds the excessive encoding header: https://github.com/apache/httpcomponents-client/blob/54900db4653d7f207477e6ee40135b88e9bcf832/httpmime/src/main/java/org/apache/http/entity/mime/MultipartEntityBuilder.java#L215-L217

However, it does use the provided in multipartEntityBuilder.setCharset(charset); charset in https://github.com/apache/httpcomponents-client/blob/54900db4653d7f207477e6ee40135b88e9bcf832/httpmime/src/main/java/org/apache/http/entity/mime/MultipartEntityBuilder.java#L228 => https://github.com/apache/httpcomponents-client/blob/54900db4653d7f207477e6ee40135b88e9bcf832/httpmime/src/main/java/org/apache/http/entity/mime/HttpBrowserCompatibleMultipart.java#L69

How about fixing the issue in HTTP client instead?

As a temporary workaround, we could clone MultipartEntityBuilder into JMeter's codebase and remove the offending paramsList.add(new BasicNameValuePair("charset", charsetCopy.name()));, however, the removal of setCharset does not sound right to me as it would cause using ASCII via MIME.DEFAULT_CHARSET which is just wrong.

dongfangtianyu commented 1 month ago

the core of the issue is that Apache Http Client

I completely agree;

How about fixing the issue in HTTP client instead?

Perhaps the issues in the Http Client should be fixed within the Http Client itself, and the changes are substantial.

he removal of setCharset does not sound right to me as it would cause using ASCII via MIME.DEFAULT_CHARSET which is just wrong.

I'm not sure where the mistake lies in doing so, could you provide an example? Thank you.


If I understand correctly, the actual effect of MultipartEntityBuilder.setCharset is only on the HTTP header and does not affect the processing of the HTTP body.

For multipart/form-data, its request content encoding is unrelated to the request headers and instead resides within the various parts of the HTTP Body.

image

These UTF-8 texts work fine. image

So, according to RFC 7578, perhaps HttpClient shouldn't provide the setCharsetmethod? Or perhaps JMeter shouldn't invoke setCharset?

It turns out, before HttpClient handles the Charset issue properly, JMeter just needs to refrain from calling the setCharset method to avoid adding charset=utf-8 to the request header (although HttpClient remains at the core of the issue). This doesn't affect the correctness of the HTTP body content (there might be other impacts not mentioned, feel free to point them out).

vlsi commented 1 month ago

Or perhaps JMeter shouldn't invoke setCharset?

As I said earlier, MultipartEntityBuilder.setCharset was needed there for HttpBrowserCompatibleMultipart.java to work, as there's no other option to configure charset for HttpBrowserCompatibleMultipart mode 🤷

dongfangtianyu commented 1 month ago

Thank you, I understand your concern now.

If the setCharset method is not called, HttpBrowserCompatibleMultipart receives charsetCopy as null.

Furthermore, HttpBrowserCompatibleMultipart.charset == 'US-ASCII'.

Is this ? image

However, from the results, it still works fine:

There is no charset='US-ASCII' in the HTTP header. image

The content in the HTTP body is also UTF-8. image

I don't know the reasons behind it, nor do I know if it's stable.

Thank you again for everything you have done!