subword-segmentation Search Results

152 results for subword-segmentation

Helsinki-NLP/OPUS-MT-train #46

Request for EN-PL model

It would be great to see EN-PL model!

djstrong updated 11 months ago
8

mozilla/translations #469

OpusTrainer can produce incorrect alignments, breaking stude…

I've found at least one bug in the implementation: https://github.com/hplt-project/OpusTrainer/issues/53

gregtatum updated 7 months ago
2

NVIDIA/Megatron-LM #600

[BUG] Unittests for NLP require data on internal CI machines…

**Describe the bug** A lot of unit tests in NLP collection (over 10) require correct version of ``/home/TestData`` folder (from internal CI machines) to be present to run successfully. **This make…

okuchaiev updated 11 months ago
1

google/sentencepiece #917

Skipping numbers in tokenization

@taku910 Hi Team, how do I stop the model from tokenizing numbers? I tried `split_by_number=False` and `split_by_digit=False`, but numbers are still being tokenized into multiple digits. Example I…

kpriyankavya updated 1 year ago
1

huggingface/transformers #26254

Access to pre_tokenizer for PreTrainedTokenizer

### Feature request Give access to setting a `pre_tokenizer` for a `transformers.PreTrainedTokenizer`, similar to how this works for `PreTrainedTokenizerFast`. ### Motivation As far as I un…

GitMew updated 1 year ago
4

PyThaiNLP/pythainlp #769

bug: IndexError when romanizing

### Description When running `pythainlp.romanize("ไกรฤกษ์ โชติวุฒิวินิจ")`, throws an `IndexError` ### Expected results Should return something like Krairiksh Chotiwutwinit (approximately) …

maddyobrienjones updated 1 year ago
7

huggingface/transformers #24612

ValueError: An instance of tokenizer class BioGptTokenizer c…

### System Info ValueError: An instance of tokenizer class BioGptTokenizer cannot be converted in a Fast tokenizer instance. No converter was found. I am using microsoft/biogpt for token classifi…

TekeshwarHirwani updated 1 year ago
8

alexpovel/betterletter #33

Rewrite (in Rust)

_Obviously_ has to be in Rust, as we desperately need to be trendy. Jokes aside, it'd be a good opportunity to enhance the tool further: - [x] single binary is nice, users don't need to install a P…

alexpovel updated 1 year ago
6

drprojects/DeepViewAgg #28

The problem about AttributeError: 'int' object has no attrib…

I encountered the following problem while training the 3D point cloud model: ```bash [2023-07-20 08:48:30,841][torch_points3d.datasets.base_dataset][INFO] - Available stage selection datasets: ['…

Tommydied updated 1 year ago
8

ggerganov/llama.cpp #167

Differences with the llama tokenizer

In this case the llama.cpp and the llama tokenizers produce different output: ``` main: prompt: 'This is 🦙.cpp' main: number of tokens in prompt = 10 1 -> '' 4013 -> 'This' 338 -> ' …

slaren updated 1 year ago
19
