deedy5 / primp

🪞PRIMP (Python Requests IMPersonate). The fastest Python HTTP client that can impersonate web browsers
MIT License
89 stars, 7 forks

[Encoding] Improve the encoding detection algorithm #22

Closed: deedy5 closed this issue 4 months ago

deedy5 commented 4 months ago

Algorithm for response.encoding:

  1. Look for the charset in the 'Content-Type' header; if not found:
  2. Look for the charset inside the HTML (<meta charset="..."> tag); if not found:
  3. Fall back to UTF-8.
  4. After decoding, update the encoding based on the detected encoding.
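The four steps above can be sketched in plain Python as a fallback chain. This is a minimal illustration, not primp's actual code: the names `decode_body`, `charset_from_headers` and `charset_from_html` are hypothetical. It also rests on three assumptions. First, step 4 is read as "the decoder may override the chosen encoding", modeled here only for a byte-order mark at the start of the body. Second, the `<meta>` scan is limited to the first 1024 bytes. Third, an unrecognized charset label counts as "not found".

```python
import codecs
import re
from typing import Mapping, Optional, Tuple

# Sketch only: these helper names are hypothetical, not primp's API.
_HEADER_CHARSET = re.compile(r'charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)
_META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def _normalize(label: str) -> Optional[str]:
    """Map a charset label to Python's canonical codec name, or None if unknown."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


def charset_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Step 1: charset parameter of Content-Type (header name matched case-insensitively)."""
    for name, value in headers.items():
        if name.lower() == "content-type":
            match = _HEADER_CHARSET.search(value)
            if match:
                return _normalize(match.group(1))
    return None


def charset_from_html(body: bytes, limit: int = 1024) -> Optional[str]:
    """Step 2: charset from a <meta> tag in the first `limit` bytes (assumed window)."""
    match = _META_CHARSET.search(body[:limit])
    if match:
        return _normalize(match.group(1).decode("ascii"))
    return None


def decode_body(headers: Mapping[str, str], body: bytes) -> Tuple[str, str]:
    """Return (decoded text, encoding actually used)."""
    # Steps 1-3: header charset, then <meta> charset, then UTF-8.
    encoding = charset_from_headers(headers) or charset_from_html(body) or "utf-8"
    # Step 4 (assumed meaning): the decoder may override that choice. Here only a
    # byte-order mark does, and the reported encoding is updated to match it.
    for bom, bom_encoding in _BOMS:
        if body.startswith(bom):
            encoding, body = bom_encoding, body[len(bom):]
            break
    return body.decode(encoding, errors="replace"), encoding
```

For example, `decode_body({"Content-Type": "text/html"}, b'<meta charset="windows-1252">')` finds no charset in the header, so it falls through to step 2 and reports `cp1252`. An empty header mapping combined with a body that has no `<meta>` tag reaches step 3 and decodes as UTF-8. Because each step returns `None` when it finds nothing usable, the `or` chain maps directly onto the "if not found" wording of the list.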