-
This post relates to the effort to harmonize the Ancient Greek treebanks, as per [Issue 7](https://github.com/unipv-larl/UD4HL/issues/7).
One of the first issues to solve is tokenization itself. Th…
-
Here is a list of investigations that arose out of https://github.com/elastic/elasticsearch/pull/82870:
- How should "strip_accents" in BERT-style WordPiece tokenization treat umlauts and diaereses? https://github.…
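For context, the reference BERT tokenizer (Google's original `tokenization.py`, which Hugging Face's `BasicTokenizer` follows) strips accents by decomposing the text to NFD and then dropping every nonspacing mark (Unicode category `Mn`). The sketch below is a minimal re-implementation of that rule for illustration only. It is not the Elasticsearch code under discussion. One consequence worth noting: Unicode encodes the German umlaut and the diaeresis as the same combining character (U+0308), so a rule of this kind cannot tell them apart.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """BERT-style accent stripping: NFD-decompose, then drop nonspacing marks (category "Mn")."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


print(strip_accents("Müller"))  # Muller  (umlaut: u + U+0308)
print(strip_accents("naïve"))   # naive   (diaeresis: i + U+0308, same mark)
print(strip_accents("ß"))       # ß       (no canonical decomposition, unchanged)
```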
-
Hi! First of all, thanks for your work on this library!
We're in the first stages of integrating Adyen and so far have a working version for the web. We wanted to integrate this now als…
-
Hi,
I have written a custom converter that converts JSON properties into the objects specified in a reloadable interface.
My JSON converter:
```java
public class JsonConvertor implements Converter{
…
```
-
femtoGPT will need more sophisticated tokenization methods. A SentencePiece model reader might be a good addition.
-
D:\asdasd\AI\GPT-SoVITS-Server-main\GPT-SoVITS-Server-main>python server.py
DirectML is available; DirectML will be used to accelerate inference.
Device name: NVIDIA GeForce GTX 1650
Traceback (most recent call last):
File "D:\asdasd\AI\GPT-…
-
Similar to servo/servo#1009
The first step is speculative parsing that runs concurrently with script execution, [similar to what Gecko does](https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading).
-
There seem to be some inconsistencies in the original tokenization as well as in the gold standard. I found them mainly in sports results.
In the train set, for example, the original text looks like: "Na de 2-0 overwin…
-
Traceback (most recent call last):
File "/mnt/d/ai/RLHF/test.py", line 3, in <module>
tokenizer = AutoTokenizer.from_pretrained("/mnt/d/ai/pretrain_models/pangu", trust_remote_code=True)
File "/hom…
-
I'm working on an NER project and trying to use BERT. BERT uses WordPiece tokenization, which means a single word may be split into several pieces. For NER, then, how do I find the corresponding class label fo…
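One common approach (used, for example, in Hugging Face's token-classification examples) is to give each word's label to its first sub-token and to mark the remaining pieces, along with special tokens such as `[CLS]` and `[SEP]`, with `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so they do not contribute to the loss. The sketch below assumes the WordPiece splits are already known; the example splits, the label set, and the helper names `word_ids_from_pieces` and `align_labels` are hypothetical and for illustration only.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def word_ids_from_pieces(pieces_per_word: List[List[str]]) -> List[Optional[int]]:
    """Flatten per-word WordPiece splits into a word-index list; None marks [CLS]/[SEP]."""
    ids: List[Optional[int]] = [None]  # [CLS]
    for word_index, pieces in enumerate(pieces_per_word):
        ids.extend([word_index] * len(pieces))
    ids.append(None)  # [SEP]
    return ids


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Give each word's label to its first sub-token; mask continuations and special tokens."""
    aligned: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None:            # special token: ignored by the loss
            aligned.append(IGNORE_INDEX)
        elif wid != previous:      # first piece of a new word carries the word's label
            aligned.append(word_labels[wid])
        else:                      # "##" continuation piece: ignored by the loss
            aligned.append(IGNORE_INDEX)
        previous = wid
    return aligned


label2id = {"O": 0, "B-PER": 1, "B-LOC": 2}
# Hypothetical splits; the real pieces depend on the model's vocabulary.
pieces = [["Johan", "##son"], ["lives"], ["in"], ["Rey", "##k", "##javik"]]
word_labels = [label2id[tag] for tag in ["B-PER", "O", "O", "B-LOC"]]

print(align_labels(word_ids_from_pieces(pieces), word_labels))
# [-100, 1, -100, 0, 0, 2, -100, -100, -100]
```

With a Hugging Face fast tokenizer, `BatchEncoding.word_ids()` should return a list of the same shape directly, so only `align_labels` would be needed. At inference time the mapping works in reverse: read the prediction at each word's first sub-token.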