-
I may have misunderstood the intent of the section under **Advanced Usage / Tokenizers** (https://textblob.readthedocs.org/en/dev/advanced_usage.html#advanced), but I cannot get my passed-in tokenizer…
-
### Describe the issue
I compared GPU inference of a native PyTorch Helsinki-NLP/opus-mt-fr-en model against the ONNX model optimized with the Optimum library.
When load testing the mode…
-
```
python3 /opt/NeMo/scripts/checkpoint_converters/convert_llava_hf_to_nemo.py \
--input_name_or_path llava-hf/llava-1.5-7b-hf \
--output_path /workspace/checkpoints/llava-7b.nemo \
--tok…
-
We should add a real treebuilder, and a lol-html-like API on top of it, to this crate. As part of that, we need to move the tokenizer into a submodule and rework the README.
-
After digging into a few issues while building out some tests, I learned that the tokenizer was working differently than I expected.
After a quick look at the docs, it looks like `natural` supports a …
-
## Summary
The macro support for phrases needs some improvement.
Most importantly, phrases need a proper regular-expression-based tokenizer/scanner.
The scanner should automatically identify valid,…
-
I'm getting the following exception when the HTML element contains the XMLNS attribute (in an XHTML document):
```
Unhandled Exception: System.ArgumentException: The namespace declaration attribu…
-
As in other libraries, detokenization is a desirable feature; see, for example, https://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.tokenizer.detokenizing.
Do you have plans on supporting th…
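
To make the request concrete, here is a minimal sketch of what rule-based detokenization could look like. It is not the API of this library or of OpenNLP; the function name `detokenize` and the two rule sets are hypothetical, chosen only for illustration, and cover just a few English punctuation cases.

```python
from typing import Iterable

# Hypothetical rule sets (assumptions, not taken from any library):
# tokens that attach to the previous token, and tokens that attach to the next one.
NO_SPACE_BEFORE = {".", ",", "!", "?", ";", ":", ")", "]", "}", "'s", "n't"}
NO_SPACE_AFTER = {"(", "[", "{"}


def detokenize(tokens: Iterable[str]) -> str:
    """Join tokens back into a string, dropping spaces around punctuation."""
    out = []
    for tok in tokens:
        # Insert a space unless this token attaches leftward
        # or the previous token attaches rightward.
        if out and tok not in NO_SPACE_BEFORE and out[-1] not in NO_SPACE_AFTER:
            out.append(" ")
        out.append(tok)
    return "".join(out)


print(detokenize(["Hello", ",", "world", "(", "again", ")", "!"]))
```

A rule table like this is lossy: it cannot recover the original spacing in every case (quotes and hyphens are ambiguous), which is why detokenizers usually take language-specific rules as input.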
-
**Please Describe The Problem To Be Solved**
(Replace This Text: Please present a concise description of the problem to be addressed by this feature request. Please be clear what parts of the problem…
-
**Summary**
Currently, the generated axtree content for retrieved websites consumes a huge number of tokens and is costly.
Maybe the combination of Playwright and BeautifulSoup below can save tokens, cost an…