-
**Describe the bug**
I was testing search with Japanese documents, but after I added a Kanji-only document, the Japanese documents I had added earlier no longer appeared in the search results.
**To Reproduce**
…
-
See: https://github.com/pierrelegall/whatlangex
#130
-
![image](https://user-images.githubusercontent.com/55869557/174332717-2196986f-fd6d-4bdd-8f98-19b7b29dc798.png)
-
Whatlang introduced new languages and scripts in a newer version.
We should upgrade our dependency to the latest version.
-
-
Make the alphabet score calculation implemented in https://github.com/greyblake/whatlang-rs/pull/108
generic and reuse the same implementation for Cyrillic.
Ensure there is sufficient unit test coverage…
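
As a rough illustration of what a generic alphabet score could look like, here is a minimal Python sketch. It is not the implementation from the linked PR: the `ALPHABETS` table, the `alphabet_scores` function, and the scoring rule (share of letters found in each language's alphabet) are assumptions made for this example only.

```python
from collections import Counter

# Hypothetical alphabet table for illustration; not whatlang's real data.
ALPHABETS = {
    "rus": "абвгдеёжзийклмнопрстуфхцчшщъыьэюя",
    "ukr": "абвгґдеєжзиіїйклмнопрстуфхцчшщьюя",
    "bul": "абвгдежзийклмнопрстуфхцчшщъьюя",
}


def alphabet_scores(text, alphabets):
    """Return, per language, the share of letters in `text` that belong
    to that language's alphabet (assumed scoring rule, 0.0 to 1.0)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        # No letters at all: nothing to score.
        return {lang: 0.0 for lang in alphabets}
    counts = Counter(letters)
    total = len(letters)
    scores = {}
    for lang, alphabet in alphabets.items():
        allowed = set(alphabet)
        matched = sum(n for ch, n in counts.items() if ch in allowed)
        scores[lang] = matched / total
    return scores


if __name__ == "__main__":
    # "їжак" contains "ї", which only the Ukrainian table above includes.
    print(alphabet_scores("їжак", ALPHABETS))
```

Because the function takes the alphabet table as a parameter, the same code path can serve Latin, Cyrillic, or any other script, which also means one set of unit tests can cover all of them.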
-
In progress by the community here:
- https://github.com/meilisearch/tokenizer/pull/49
- https://github.com/meilisearch/tokenizer/pull/70
If there is no answer from the community, the work should …
-
Looking only at character sets cannot distinguish between languages that share an alphabet.
For example, "Je m'appelle Sunghyun" and "I am Sunghyun" cannot be told apart by their character sets alone.
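
To make the limitation concrete, here is a small Python sketch of a script-only check. The helper `scripts_in` is hypothetical, and using the first word of each character's Unicode name as a stand-in for its script is a simplification chosen for this example, not how any particular tokenizer does it.

```python
import unicodedata


def scripts_in(text):
    """Return the set of (approximate) scripts used by the letters in `text`.

    Assumption: the first word of a character's Unicode name, such as
    "LATIN" or "HIRAGANA", is a good enough proxy for its script here.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }


if __name__ == "__main__":
    french = scripts_in("Je m'appelle Sunghyun")
    english = scripts_in("I am Sunghyun")
    # Both sentences use only Latin letters, so the script sets are equal
    # and the check cannot separate French from English.
    print(french, english, french == english)
```

Both sentences yield the same script set, so any detector that stops at this step has nothing left to separate them with; some further signal (for instance, letter or word statistics) would be needed.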
-
Today, Charabia automatically detects the language of the provided text and chooses the best tokenization pipeline accordingly.
#### Drawback
Sometimes the detection is not accurate, mainly when th…
-
First of all - thank you for sharing this useful library with the world!
I was wondering if you'd be open to adding version tags on git/GitHub along with releases of new versions on PyPI - this wou…