-
Hi,
I think there is a bug in the Java lexer (modules/com.threecrickets.jygments/src/com/threecrickets/jygments/contrib/JavaLexer.json)
To reproduce:
1. See the attached sample Java file
2…
-
Compiler::Lexer 0.19
```
$ perl -Mblib t/recursive_tokenize.t
*** glibc detected *** perl: malloc(): memory corruption (fast): 0x0a7759d8 ***
```
-
Possible easy solution for #2935 and #2945
The reason we forked `html5lib` to make `html5lib-modern` was that there is no new replacement for `html5lib` that provides the same XML-based HTML-tok…
-
Looking at the TF tokenizer code: the input is a sentence or a single char, and it returns a single sentence or a single char.
The torch tokenizer likewise takes a sentence or a single char as input, but returns a list for the sentence or a list for the single char.
The important question is: if the input single char is itself an UNK-type character, PyTorch's tokenizer.tokenize(char) surprisingly returns empty instead of [UNK]? Curious why PyTorch does it this way; this…
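The two policies described above can be contrasted with a minimal stdlib sketch (this is NOT the actual TF or PyTorch tokenizer code; the vocabulary and function names are made up for illustration):

```python
# Two policies a tokenizer can take for an out-of-vocabulary (OOV) char:
# silently drop it (result can be empty), or map it to an explicit [UNK].

VOCAB = {"a", "b", "c"}  # toy vocabulary, purely illustrative

def tokenize_drop_oov(text):
    """Drops characters not in the vocabulary; an OOV char yields []."""
    return [ch for ch in text if ch in VOCAB]

def tokenize_map_oov(text):
    """Maps characters not in the vocabulary to an explicit [UNK] token."""
    return [ch if ch in VOCAB else "[UNK]" for ch in text]

print(tokenize_drop_oov("\u00a0"))  # [] -- the OOV char silently vanishes
print(tokenize_map_oov("\u00a0"))   # ['[UNK]'] -- the OOV char is kept
```

The surprise in the report above corresponds to the first policy: a lone OOV character produces an empty token list rather than `[UNK]`.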
-
Would be good to document `from_array`'s use of `tokenize` and what that means in terms of calls to `__dask_tokenize__` and `normalize_token`. In particular how do these influence the `name` used in g…
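The idea behind such documentation could be illustrated with a rough sketch (this is NOT dask's implementation; `toy_tokenize` is a hypothetical stand-in): conceptually, `tokenize` reduces arbitrary inputs to a deterministic hash, and that hash becomes the suffix of the graph key `name`, so equal inputs yield equal names.

```python
import hashlib
import pickle

def toy_tokenize(*args):
    """Deterministic content hash of the arguments -- a sketch of the
    idea behind dask's tokenize, not its real implementation."""
    h = hashlib.sha256()
    for a in args:
        h.update(pickle.dumps(a))  # hash a stable byte serialization
    return h.hexdigest()[:16]

# Equal inputs produce equal tokens, so a derived key `name` is stable:
name = "array-" + toy_tokenize([1, 2, 3], (4, 5))
assert name == "array-" + toy_tokenize([1, 2, 3], (4, 5))
```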
-
### Description of the issue
**Description:** When running the [Jujutsu](https://github.com/martinvonz/jj) test suite, I'm seeing a flaky test I would like to skip. I tried using the following comman…
-
Hello! It's me again. (2117)
I'm here to report a linting error in one of Python's main files: tokenize.py.
The error is shown below:
Thonny has somehow escaped the single quote! This gigantic ac…
-
I tried
```
from nltk.tokenize import word_tokenize
a="g, a, b, c, 123, g32,12 123121 {1}"
word_tokenize(a)
```
**Output I am getting:**
['g', ',', 'a', ',', 'b', ',', 'c', ',', '123', ',', 'g…
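For reference, the kind of punctuation splitting seen in that output can be approximated with a plain regex (a rough sketch, NOT NLTK's actual Treebank tokenizer, which has many more rules):

```python
import re

def rough_tokenize(text):
    """Very rough approximation of punctuation splitting: keeps runs of
    word characters together and emits each punctuation mark as its own
    token. Illustration only; not NLTK's word_tokenize."""
    return re.findall(r"\w+|[^\w\s]", text)

print(rough_tokenize("g, a, b"))   # ['g', ',', 'a', ',', 'b']
print(rough_tokenize("g32,12"))    # ['g32', ',', '12']
```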
-
Hi @benoitgaudou ! :wave:
Since https://github.com/gama-platform/gama/commit/7f328e8d1183cfa9098187dd9a1bc1090ee1b011, @AlexisDrogoul has removed the `tokenize` function from StringUtils.
Therefore…
-
Thanks for your work!!
As shown in example.py, the caption is in tensor format. So do I need to create my own transformer-like model to convert a text-format caption into a tensor?