Ousret/charset_normalizer (charset-normalizer)
### [`v3.3.2`](https://togithub.com/Ousret/charset_normalizer/blob/HEAD/CHANGELOG.md#332-2023-10-31)
[Compare Source](https://togithub.com/Ousret/charset_normalizer/compare/3.3.1...3.3.2)
##### Fixed
- Unintentional memory usage regression when using large payloads that match several encodings ([#376](https://togithub.com/Ousret/charset_normalizer/issues/376))
- Regression on some detection cases showcased in the documentation ([#371](https://togithub.com/Ousret/charset_normalizer/issues/371))
##### Added
- Noise (md) probe that identifies malformed Arabic representation due to the presence of letters in isolated form (credit to my wife)
### [`v3.3.1`](https://togithub.com/Ousret/charset_normalizer/blob/HEAD/CHANGELOG.md#331-2023-10-22)
[Compare Source](https://togithub.com/Ousret/charset_normalizer/compare/3.3.0...3.3.1)
##### Changed
- Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
- Improved the general detection reliability based on reports from the community
### [`v3.3.0`](https://togithub.com/Ousret/charset_normalizer/blob/HEAD/CHANGELOG.md#330-2023-09-30)
[Compare Source](https://togithub.com/Ousret/charset_normalizer/compare/3.2.0...3.3.0)
##### Added
- Allow executing the CLI (i.e. `normalizer`) through `python -m charset_normalizer.cli` or `python -m charset_normalizer`
- Support for 9 forgotten encodings that are supported by Python but unlisted in `encodings.aliases` as they have no alias ([#323](https://togithub.com/Ousret/charset_normalizer/issues/323))
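
To illustrate how a codec can be supported by Python yet be missing from the alias table, here is a minimal standard-library sketch. The helper name `codecs_without_alias` is hypothetical and not part of charset_normalizer. It lists modules shipped in the `encodings` package that load as codecs but that no alias points to. The result varies by Python version and also includes special-purpose codecs, so it is not the library's list of 9 encodings.

```python
import codecs
import encodings
import pkgutil
from encodings.aliases import aliases
from typing import List


def codecs_without_alias() -> List[str]:
    """Return codec modules in `encodings` that no alias entry points to."""
    alias_targets = set(aliases.values())
    found = []
    for module in pkgutil.iter_modules(encodings.__path__):
        name = module.name
        if name in alias_targets:
            continue  # reachable through at least one alias
        try:
            # Helper modules (e.g. `aliases`) and codecs unavailable on
            # this platform are rejected with LookupError.
            codecs.lookup(name)
        except LookupError:
            continue
        found.append(name)
    return sorted(found)


if __name__ == "__main__":
    print(codecs_without_alias())
```

Any tool that builds its list of candidate encodings only from `aliases.values()` would never try the names this function returns.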
##### Removed
- (internal) Redundant utils.is_ascii function and unused function is_private_use_only
- (internal) charset_normalizer.assets is moved inside charset_normalizer.constant
##### Changed
- (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
- Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.7
##### Fixed
- Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in `__lt__` ([#350](https://togithub.com/Ousret/charset_normalizer/issues/350))
### [`v3.2.0`](https://togithub.com/Ousret/charset_normalizer/blob/HEAD/CHANGELOG.md#320-2023-06-07)
[Compare Source](https://togithub.com/Ousret/charset_normalizer/compare/3.1.0...3.2.0)
##### Changed
- Type hint for function `from_path` no longer enforces `PathLike` as its first argument
- Minor improvement over the global detection reliability
##### Added
- Introduce function `is_binary`, which relies on the main detection capabilities and is optimized to detect binaries
- Propagate the `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp`, allowing deeper control over the detection (default True)
- Explicit support for Python 3.12
##### Fixed
- Edge-case detection failure where a file contains a very long camel-cased word (Issue [#289](https://togithub.com/Ousret/charset_normalizer/issues/289))
### [`v3.1.0`](https://togithub.com/Ousret/charset_normalizer/blob/HEAD/CHANGELOG.md#310-2023-03-06)
[Compare Source](https://togithub.com/Ousret/charset_normalizer/compare/3.0.1...3.1.0)
##### Added
- Argument `should_rename_legacy` for the legacy function `detect`, which now also disregards any unknown arguments without raising errors (PR [#262](https://togithub.com/Ousret/charset_normalizer/issues/262))
##### Removed
- Support for Python 3.6 (PR [#260](https://togithub.com/Ousret/charset_normalizer/issues/260))
##### Changed
- Optional speedup provided by mypy/c 1.0.1
### [`v3.0.1`](https://togithub.com/Ousret/charset_normalizer/blob/HEAD/CHANGELOG.md#301-2022-11-18)
[Compare Source](https://togithub.com/Ousret/charset_normalizer/compare/3.0.0...3.0.1)
##### Fixed
- Multi-bytes cutter/chunk generator did not always cut correctly (PR [#233](https://togithub.com/Ousret/charset_normalizer/issues/233))
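
As background on why cutting multi-byte payloads is error-prone, the sketch below uses only the standard library and is not charset_normalizer's chunk generator. The helper names `naive_chunks` and `decode_chunks` are hypothetical. It shows that a fixed-size cut can split a UTF-8 character, and that an incremental decoder carries the partial bytes over to the next chunk.

```python
import codecs
from typing import List


def naive_chunks(data: bytes, size: int) -> List[bytes]:
    """Cut a payload at fixed byte offsets, ignoring character boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def decode_chunks(data: bytes, size: int, encoding: str = "utf-8") -> List[str]:
    """Decode fixed-size chunks while buffering incomplete byte sequences."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = [decoder.decode(chunk) for chunk in naive_chunks(data, size)]
    # final=True raises if the payload ends in the middle of a character.
    pieces.append(decoder.decode(b"", final=True))
    return pieces


if __name__ == "__main__":
    payload = "naïve café".encode("utf-8")
    first = naive_chunks(payload, 3)[0]  # ends with half of "ï"
    try:
        first.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("naive cut failed:", exc.reason)
    print("".join(decode_chunks(payload, 3)))
```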
##### Changed
- Speedup provided by mypy/c 0.990 on Python >= 3.7
### [`v3.0.0`](https://togithub.com/Ousret/charset_normalizer/blob/HEAD/CHANGELOG.md#300-2022-10-20)
[Compare Source](https://togithub.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0)
##### Added
- Extend the capability of `explain=True`: when `cp_isolation` contains at most two entries (minimum one), the details of the mess-detector results are logged
- Support for alternative language frequency set in charset_normalizer.assets.FREQUENCIES
- Add parameter `language_threshold` in `from_bytes`, `from_path` and `from_fp` to adjust the minimum expected coherence ratio
- `normalizer --version` now specifies whether the current version provides the extra speedup (meaning the mypyc-compiled whl)
##### Changed
- Build with static metadata using 'build' frontend
- Make the language detection stricter
- Optional: Module `md.py` can be compiled using Mypyc to provide a speedup of up to 4x over v2.1
##### Fixed
- CLI with option `--normalize` fails when using full paths for files
- TooManyAccentuatedPlugin induces false positives in the mess detection when too few alpha characters have been fed to it
- Sphinx warnings when generating the documentation
##### Removed
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
- Breaking: Methods `first()` and `best()` from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (is unreliable/conflict with ASCII)
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function `normalize`
- Breaking: Properties `chaos_secondary_pass`, `coherence_non_latin` and `w_counter` from CharsetMatch
- Support for the backport `unicodedata2`
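
The UTF-7 entry above can be illustrated with the standard library alone. This is a minimal sketch rather than the library's detection logic, and the helper name `decode_both` is hypothetical. Plain ASCII bytes decode identically under both codecs, and a payload containing a `+...-` sequence is valid in both while yielding different text, so without a signature the bytes alone cannot settle which encoding was meant.

```python
from typing import Tuple


def decode_both(payload: bytes) -> Tuple[str, str]:
    """Decode the same bytes as ASCII and as UTF-7."""
    return payload.decode("ascii"), payload.decode("utf_7")


if __name__ == "__main__":
    # Pure ASCII text: both codecs agree, so "UTF-7" adds no information.
    as_ascii, as_utf7 = decode_both(b"Hello, world")
    print(as_ascii == as_utf7)

    # "+AOk-" is the UTF-7 shifted form of U+00E9, yet it is also plain ASCII.
    as_ascii, as_utf7 = decode_both(b"caf+AOk-")
    print(as_ascii)
    print(as_utf7)
```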
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
[ ] If you want to rebase/retry this PR, check this box
This PR has been generated by Mend Renovate. View repository job log here.
This PR contains the following updates:
charset-normalizer: `==2.1.1` -> `==3.3.2`