-
For example: h009.h05.1.6.0
-
From https://github.com/microsoft/vscode/issues/75355
## Details
From @kiranjulapalli:
I spent some more time and here are the steps to repro:
Open a vscode window
-> New file
-> ctrl+shift+…
-
Hello! I'm currently handling a dataset where the `histories` column might initially be empty, especially for users who are accessing the system for the first time.
Given this context, I'm seeking …
-
I haven't yet tested how tokenization handles Chinese/Japanese characters. Some special handling is required since these languages don't put spaces between words.
It should be relati…
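
To illustrate the kind of special handling mentioned above, here is a minimal sketch, not this project's actual tokenizer. It assumes a simple fallback in which each CJK character becomes its own token while other text is still split on whitespace. The name `tokenize` and the chosen Unicode ranges (CJK Unified Ideographs, Hiragana, Katakana) are illustrative assumptions.

```python
import re

# Assumed ranges: CJK Unified Ideographs (U+4E00-U+9FFF) and
# Hiragana/Katakana (U+3040-U+30FF). Real coverage would need more blocks.
_CJK = r"\u4e00-\u9fff\u3040-\u30ff"

# Match either a single CJK character, or a run of characters that are
# neither whitespace nor CJK.
_TOKEN_RE = re.compile(rf"[{_CJK}]|[^\s{_CJK}]+")


def tokenize(text):
    """Hypothetical helper: whitespace tokens plus one token per CJK character."""
    return _TOKEN_RE.findall(text)


print(tokenize("hello 世界 foo"))
```

Per-character splitting is only a crude baseline; dictionary-based or statistical word segmentation would likely give better search quality for these languages.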
-
The README states that pandas 1.2.2 or newer will work, but newer versions fail with the error:
File "pandas/_libs/parsers.pyx", line 805, in pandas._libs.parsers.TextReader.read_low_…
-
**Describe the problem to be solved**
Thai sentences don't have spaces between words. Spaces usually appear only between sentences, which might result in fewer search results being displayed than wha…
-
BPO | [9969](https://bugs.python.org/issue9969)
--- | :---
Nosy | @ncoghlan, @vstinner, @voidspace, @meadori, @takluyver, @vadmium
Files | [issue9969.patch](https://bugs.python.org/file23099/issue9969…
-
Hello, very nice work!
I am currently using paradedb for bm25 pg search, and I found this repo via Google while checking whether there is an alternative implementation.
You say that creating bm25 index from ta…
-
Support tokenizing all tokens as listed in the standard from a read line.
https://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_10
Out of scope:
here doc reading and to…
-
It seems that javalang converts Unicode escapes back to their raw form (as pointed out in issue #58) in the `pre_tokenize` method before tokenizing.
I don't get why this replacement is necessary (`pre_to…