-
This is the same issue that I mentioned to
Unlike the standard analyzer, the nori analyzer removes the decimal point.
The nori tokenizer removes the "." character by default.
In this case, it is difficult to inde…
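To make the difference concrete, here is a minimal sketch (the host, the sample text, and a locally installed analysis-nori plugin are assumptions, not part of this report) that compares the two analyzers via the `_analyze` API:

```python
# Compare how the standard and nori analyzers tokenize text with a decimal point.
# Assumes a local Elasticsearch node at localhost:9200 with the analysis-nori plugin.
import requests

ES = "http://localhost:9200"

def analyze(analyzer: str, text: str):
    resp = requests.post(f"{ES}/_analyze", json={"analyzer": analyzer, "text": text})
    resp.raise_for_status()
    return [t["token"] for t in resp.json()["tokens"]]

print(analyze("standard", "버전 3.14"))  # the standard analyzer keeps "3.14" as one token
print(analyze("nori", "버전 3.14"))      # nori splits on ".", so the decimal point is lost
```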
-
There is a dictionary similar to IPADIC, but for Korean, called mecab-ko-dic.
It is available under an Apache license here:
https://bitbucket.org/eunjeon/mecab-ko-dic
This dictionary was built with MeC…
-
[GPTQ](https://arxiv.org/abs/2210.17323) is currently the SOTA one-shot quantization method for LLMs.
GPTQ supports remarkably low 3-bit and 4-bit weight quantization, and it can be applied to LLaMa.
…
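For anyone who wants to try it, here is a minimal sketch of one way to apply 4-bit GPTQ to a LLaMa checkpoint, assuming the Hugging Face transformers `GPTQConfig` integration (with optimum and auto-gptq installed); the model id and calibration dataset below are placeholders, not something this post prescribes:

```python
# Minimal GPTQ quantization sketch (assumed setup: transformers with GPTQConfig
# support plus the optimum and auto-gptq packages; the model id is a placeholder).
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "huggyllama/llama-7b"  # placeholder LLaMa checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit weight quantization; GPTQ also supports bits=3.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,  # weights are quantized layer by layer on load
)
```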
-
The error is
`RuntimeError: split_with_sizes expects split_sizes to sum exactly to 1564 (input tensor's size at dimension 0), but got split_sizes=[21, 9, 18, 24, 27, 18, 36, 16, 38, 14, 24, 39, 7, 6,…
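For context, the constraint behind the error can be reproduced in isolation; the tensor shape and size lists below are made up for illustration and are not from my model:

```python
# torch.split with a list of sizes calls split_with_sizes, which requires the
# sizes to sum exactly to the tensor's length along the split dimension.
import torch

x = torch.randn(1564, 8)

sizes_bad = [21, 9, 18, 24]   # sums to 72, not 1564 -> RuntimeError
sizes_ok = [1000, 500, 64]    # sums to 1564 -> works

try:
    torch.split(x, sizes_bad, dim=0)
except RuntimeError as e:
    print(e)  # "split_with_sizes expects split_sizes to sum exactly to 1564 ..."

chunks = torch.split(x, sizes_ok, dim=0)
print([c.shape[0] for c in chunks])  # [1000, 500, 64]
```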
-
Since #9594 the Korean tokenizer groups characters of unknown words if they belong to the same script or an inherited one. This is ok for inputs like Мoscow (with a Cyrillic М and the rest in Latin) b…
-
### System Info
ubuntu 18.04
python 3.6, 3.9
transformers 1.18.0
### Who can help?
@patrickvonplaten, @anton-l
### Information
- [X] The official example scripts
- [ ] My own modifie…
-
For Nori, the Korean analyzer, there is a Korean dictionary named mecab-ko-dic, which is available under an Apache license here:
https://bitbucket.org/eunjeon/mecab-ko-dic
The dictionary hasn't been updated in Nori although it has some upd…
-
Hi, I want to apply pyABSA to Korean data; what should I modify?
Do I just need to modify the configuration file after labeling the dataset? (https://github.com/yangheng95/ABSADatasets)
Should I s…
-
There is a rare case, which we (Amazon Product Search) found in our tests, that causes an AssertionError in the backtrace step of JapaneseTokenizer.
If there is a text span of length 1024 (determined b…
-
Hi, and thanks for the awesome repo. Did you try any other tokenization strategies (SentencePiece, WordPiece, or BPE)? I see you use character-level tokenization, which is nice but probably doesn't mak…
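For comparison, here is a minimal sketch of training a subword tokenizer with the sentencepiece library; the corpus path, vocab size, and model prefix are placeholders, not values from this repo:

```python
# Train a small BPE tokenizer with SentencePiece as an alternative to
# character-level tokenization (all paths and sizes are placeholders).
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="corpus.txt",      # one sentence per line
    model_prefix="spm_bpe",  # writes spm_bpe.model / spm_bpe.vocab
    vocab_size=8000,
    model_type="bpe",        # "unigram" is the other common choice
)

sp = spm.SentencePieceProcessor(model_file="spm_bpe.model")
print(sp.encode("Hello world", out_type=str))
```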