dgtlmoon / changedetection.io

The best and simplest free open source web page change detection, website watcher, restock monitor and notification service. Designed for simplicity: monitor which websites had a text change, for free. Also covers website defacement monitoring and price change notification.
https://changedetection.io
Apache License 2.0

HTTP "POST" request with UTF-8 non latin [feature] #1315

Closed Churator closed 2 months ago

Churator commented 1 year ago

I'm trying to send a POST request whose body contains UTF-8 characters. It fails because latin-1 is used, and I couldn't find where to change that.

'latin-1' codec can't encode characters in position 57-63: Body ('בדיקה') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

dgtlmoon commented 1 year ago

Can you paste the full HTTP request settings here?

Churator commented 1 year ago

Sure

URL: https://www.bezeq.co.il/umbraco/api/FormWebApi/CheckAddress

Method: POST

Data: {"CityId":"1111","StreetId":"1111","House":"11111","Street":"בדיקה","City":"בדיקה","Entrance":""}
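For reference, a JSON body like the one above can be made safe to send in two ways. The following is a minimal sketch using only the standard library; the variable names are illustrative and it does not reflect how changedetection.io builds the request internally.

```python
import json

# Payload copied from the request settings above.
payload = {
    "CityId": "1111",
    "StreetId": "1111",
    "House": "11111",
    "Street": "בדיקה",
    "City": "בדיקה",
    "Entrance": "",
}

# Option 1: keep the Hebrew text as-is and encode the body to UTF-8 bytes.
# Bytes are sent unchanged, so no Latin-1 encoding step is involved.
utf8_body = json.dumps(payload, ensure_ascii=False).encode("utf-8")

# Option 2: let json escape non-ASCII characters as \uXXXX sequences.
# The result is a pure-ASCII string, which is also valid Latin-1.
ascii_body = json.dumps(payload)

# Both forms decode to the same object on the receiving side.
assert json.loads(utf8_body.decode("utf-8")) == json.loads(ascii_body) == payload
```

Option 2 only applies to JSON bodies; a plain-text body has no escape syntax and would need option 1.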

dgtlmoon commented 1 year ago

thanks, I can confirm this one.

leiless commented 3 months ago

I'm having the same issue here.

$ docker exec -it changedetection_io_app_1 bash
$ python3 -c "import requests; r = requests.post('http://httpbin.org/post', data='你好')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/usr/local/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "/usr/local/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/urllib3/connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/urllib3/connection.py", line 244, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/local/lib/python3.10/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.10/http/client.py", line 1328, in _send_request
    body = _encode(body, 'body')
  File "/usr/local/lib/python3.10/http/client.py", line 166, in _encode
    raise UnicodeEncodeError(
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: Body ('你好') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

https://stackoverflow.com/questions/55887958/what-is-the-default-encoding-when-python-requests-post-data-is-string-type/56120372#56120372

--

If body is a string, it is encoded as ISO-8859-1, the default for HTTP. If it is a bytes-like object, the bytes are sent as is. If it is a file object, the contents of the file is sent; this file object should support at least the read() method.

ISO-8859-1 is also known as latin-1.

https://docs.python.org/3/library/http.client.html#http.client.HTTPConnection.request
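The documented rule can be reproduced without any network access. This is a small sketch, not the actual `http.client` code; `encode_like_http_client` is a hypothetical helper written only to mirror the behaviour described in the quoted documentation.

```python
def encode_like_http_client(body):
    """Mimic the documented rule: str is encoded as ISO-8859-1, bytes pass through."""
    if isinstance(body, str):
        return body.encode("iso-8859-1")
    return body


text = "你好"

# A str body with characters outside Latin-1 cannot be encoded.
try:
    encode_like_http_client(text)
    failed = False
except UnicodeEncodeError:
    failed = True

# The same text, pre-encoded to UTF-8 bytes, is passed through untouched.
wire_bytes = encode_like_http_client(text.encode("utf-8"))

print(failed)      # True
print(wire_bytes)  # b'\xe4\xbd\xa0\xe5\xa5\xbd'
```

Text that happens to fit in Latin-1 (for example "café") does not raise, which is why the problem only shows up with scripts such as Hebrew or Chinese.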

leiless commented 3 months ago

Possible solution

https://github.com/dgtlmoon/changedetection.io/blob/0.45.24/changedetectionio/content_fetchers/requests.py#L49

        r = requests.request(method=request_method,
-                            data=request_body,
+                            data=request_body.encode('utf-8') if isinstance(request_body, str) else request_body,
                             url=url,
                             headers=request_headers,
                             timeout=timeout,
                             proxies=proxies,
                             verify=False)
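
The idea in the diff above can be pulled into a small helper so it can be checked in isolation. This is a sketch under assumptions: `encode_request_body` is a hypothetical name, not a function in changedetection.io, and UTF-8 is assumed to be what the target server expects.

```python
def encode_request_body(request_body, encoding="utf-8"):
    """Return bytes for str bodies; leave bytes and None untouched.

    Hypothetical helper: the name and the default encoding are assumptions.
    """
    if isinstance(request_body, str):
        return request_body.encode(encoding)
    return request_body


# str bodies become UTF-8 bytes, so no Latin-1 encoding happens later.
encoded = encode_request_body("你好")

# Bodies that are already bytes, or absent, are passed through as-is.
unchanged = encode_request_body(b"already-bytes")
missing = encode_request_body(None)
```

The result would then be passed as `data=` in place of the raw string. Whether the server decodes the body as UTF-8 may also depend on the `Content-Type` header that is sent, which this sketch does not set.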

dgtlmoon commented 3 months ago

@leiless isn't this a duplicate of https://github.com/dgtlmoon/changedetection.io/issues/2309 ?

If you are using JSON with posts://, make sure you use | tojson when building your JSON message. This should escape anything non-ASCII and bypass this error. For example, it will turn the smiley ツ into \u30c4.
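
The escaping described here can be illustrated with the standard library. This sketch uses `json.dumps` as a stand-in for the Jinja2 `tojson` filter, on the assumption that the filter escapes non-ASCII characters the same way; it does not run the template engine itself.

```python
import json

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True).
escaped = json.dumps("ツ")
print(escaped)  # "\u30c4"

# The escaped form is plain ASCII, so it also encodes cleanly as Latin-1.
latin1_bytes = escaped.encode("latin-1")

# A JSON parser on the receiving side restores the original character.
restored = json.loads(escaped)
print(restored == "ツ")  # True
```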

leiless commented 3 months ago

@dgtlmoon No, it's not. I'm using the Basic fast Plaintext/HTTP Client to POST a body (Chinese characters encoded in UTF-8).

https://github.com/dgtlmoon/changedetection.io/issues/2309 is about delivering notifications with UTF-8 characters.