-
Many APIs need to check whether their input is already in NFC, and in the majority of cases that check can be done quickly.
Ref: http://www.unicode.org/reports/tr15/#Detecting_Normalization_Forms
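
As a rough illustration of the fast path described in that report, here is a minimal Python sketch. The helper name `is_nfc` is hypothetical; `str.isascii()` needs Python 3.7+ and `unicodedata.is_normalized()` needs Python 3.8+.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Normalization Form C."""
    # ASCII-only strings are always in NFC, so the common case
    # can be answered without consulting the Unicode database.
    if text.isascii():
        return True
    # For everything else, defer to the standard library check.
    return unicodedata.is_normalized("NFC", text)


print(is_nfc("plain ascii"))   # True
print(is_nfc("\u00e9"))        # True: precomposed e with acute
print(is_nfc("e\u0301"))       # False: e followed by a combining acute
```

An API that only needs to reject or repair non-NFC input can call the check first and run `unicodedata.normalize("NFC", text)` only when it returns `False`.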
-
Ulrich:
>The SentencePiece tokenizer should probably be trained with a custom normalization table (see the SentencePiece documentation) that removes soft hyphens in addition to the existing normaliza…
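
The intended effect can be sketched in plain Python. This is not the SentencePiece normalization-table mechanism itself, only an illustration of the preprocessing result; the helper name `strip_soft_hyphens` and the choice of NFKC as the default form are assumptions.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN, an invisible hyphenation hint


def strip_soft_hyphens(text: str, form: str = "NFKC") -> str:
    """Drop soft hyphens, then apply a Unicode normalization form."""
    # Normalization alone does not remove U+00AD, so it is stripped explicitly.
    without_shy = text.replace(SOFT_HYPHEN, "")
    return unicodedata.normalize(form, without_shy)


print(strip_soft_hyphens("co\u00adoperate"))  # cooperate
```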
-
https://wicg.github.io/native-file-system/#dom-filesystemhandle-name is a USVString and can only represent a sequence of Unicode scalar values. But file systems don't respect those rules:
- On Linux …
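
A small Python sketch of the mismatch, assuming the Linux case is about file names being arbitrary byte sequences rather than valid UTF-8. The file name and the helper `is_scalar_value_string` are made up for illustration.

```python
# A byte sequence that is a legal Linux file name but not valid UTF-8.
raw_name = b"report-\xff.txt"

# Python smuggles the undecodable byte through as a lone surrogate.
decoded = raw_name.decode("utf-8", "surrogateescape")


def is_scalar_value_string(text: str) -> bool:
    """True if text contains no surrogate code points (U+D800..U+DFFF)."""
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


print(is_scalar_value_string("report.txt"))  # True
print(is_scalar_value_string(decoded))       # False: cannot be a USVString

# The original bytes are still recoverable with the same error handler.
assert decoded.encode("utf-8", "surrogateescape") == raw_name
```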
-
Per @petermr's suggestion in https://github.com/jsvine/pdfplumber/discussions/904#discussioncomment-6149469, I think it's a good idea to add such a parameter/option, using `unicodedata.normalize(...)`…
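
A minimal sketch of what such an opt-in option might look like. The function name `maybe_normalize` and the parameter name `form` are hypothetical and are not part of pdfplumber's API.

```python
import unicodedata
from typing import Optional

VALID_FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def maybe_normalize(text: str, form: Optional[str] = None) -> str:
    """Return text unchanged by default, or normalized when a form is given."""
    if form is None:
        # Opt-in behaviour: existing callers see no change.
        return text
    if form not in VALID_FORMS:
        raise ValueError(f"unknown normalization form: {form!r}")
    return unicodedata.normalize(form, text)


print(maybe_normalize("\ufb01le"))          # unchanged, still contains the fi ligature
print(maybe_normalize("\ufb01le", "NFKC"))  # file
```

Defaulting to `None` keeps the extracted text byte-for-byte identical unless the caller asks for normalization.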
-
### Describe the proposed feature
`HocrTransform.normalize_text` normalizes text using the NFKC[^1] compatibility algorithm.
https://github.com/ocrmypdf/OCRmyPDF/blob/6895c2d70fa03ec4d57e779110e07…
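
For context, this sketch compares what NFC and NFKC do to a few characters. It is independent of OCRmyPDF's code; the helper `compare_forms` and the sample strings are chosen only for illustration.

```python
import unicodedata


def compare_forms(text: str) -> tuple:
    """Return the (NFC, NFKC) normalizations of text."""
    return (
        unicodedata.normalize("NFC", text),
        unicodedata.normalize("NFKC", text),
    )


# NFC keeps compatibility characters; NFKC folds them to plain equivalents.
print(compare_forms("\ufb01"))   # ligature fi: kept by NFC, becomes 'fi' under NFKC
print(compare_forms("\u00b2"))   # superscript two: kept by NFC, becomes '2' under NFKC
print(compare_forms("e\u0301"))  # both forms compose this to a single U+00E9
```

The compatibility mapping is lossy: once a ligature or superscript has been folded, the original character cannot be recovered from the normalized text.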
-
```
Passlib currently takes in whatever Unicode sequence is offered, and hashes it.
However, there are Unicode normalization issues, non-printing code points (e.g. SHY)
that should be discarded, and many …
```
-
`(´・ω・`)` is incorrectly normalized - it results in the following:
![image](https://user-images.githubusercontent.com/12946050/84580844-3bbe6d00-addb-11ea-875b-22ef8767fb52.png)
For some reason it …
-
There seems to be a grapheme handling ambiguity for strings containing the "windows-style newline" `\r\n`?
Since `\r\n` is treated as a single grapheme by the Unicode segmentation crate, the highli…
-
I've just learned about #891 and I'm excited to see that the TOML specification is improving Unicode support.
Do I understand correctly that this changeset makes no recommendations for implementers whe…
-
Today the petition was spammed yet again with hateful language: about 2.5K petition items were added, apparently by the same person. They circumvented validation using Unicode characters.
Perhaps th…