-
## Checklist
- [x] I have verified that the issue exists against the `main` branch of AllenNLP.
- [x] I have read the relevant section in the [contribution guide](https://github.com/allenai/al…
-
Thanks for KELIP! It would be very interesting if KELIP could be used with diffusion models in Korean, the way CLIP is.
I tried [CLIP Guided Diffusion](https://colab.research.google.com/drive/12a_W…
-
https://github.com/huggingface/transformers/blob/96881729ce83cfc8e5fa04c903ee4296ad17cfbb/src/transformers/models/bert/tokenization_bert.py#L117
Recently, I have been using BERT to train an NER model for Chinese…
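For context on Chinese input: BERT's basic tokenizer, when `tokenize_chinese_chars` is enabled, puts whitespace around every CJK ideograph, so each Chinese character becomes its own token before WordPiece runs. The sketch below reimplements only that splitting step and does not claim to reproduce the linked code line. It does not call `transformers`, and the helper names `is_cjk_ideograph` and `split_cjk` are hypothetical.

```python
from typing import List


def is_cjk_ideograph(cp: int) -> bool:
    """Return True if the code point lies in a CJK ideograph block."""
    return (
        0x4E00 <= cp <= 0x9FFF        # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF     # Extension A
        or 0x20000 <= cp <= 0x2A6DF   # Extension B
        or 0x2A700 <= cp <= 0x2B73F   # Extension C
        or 0x2B740 <= cp <= 0x2B81F   # Extension D
        or 0x2B820 <= cp <= 0x2CEAF   # Extension E
        or 0xF900 <= cp <= 0xFAFF     # Compatibility Ideographs
        or 0x2F800 <= cp <= 0x2FA1F   # Compatibility Ideographs Supplement
    )


def split_cjk(text: str) -> List[str]:
    """Pad each CJK ideograph with spaces, then split on whitespace."""
    padded = []
    for ch in text:
        if is_cjk_ideograph(ord(ch)):
            padded.append(f" {ch} ")  # isolate the ideograph
        else:
            padded.append(ch)         # keep other characters as-is
    return "".join(padded).split()
```

For example, `split_cjk("我爱北京Beijing")` returns `['我', '爱', '北', '京', 'Beijing']`. For NER this means character-level labels line up with the CJK tokens, while non-CJK spans such as `Beijing` may still be split further into WordPiece subwords.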
-
# Bug Report
## Environment
Zola version: 0.16.0
`rustc` version: 1.58.0
Cargo version: 1.58.0
## Expected Behavior
Zola 0.16.0 compiles with the `indexing-ja` feature
## Current Behavi…
-
When I read the paper and looked at the model architecture figure, I understood that DPR and internet search could be used together.
However, while looking through the code, I came up with this question:
"Is it possib…
-
I wrote an Analyzer for Apache Lucene that analyzes Chinese sentences. It's called "imdict-chinese-analyzer", and the project on Google Code is here: http://code.google.com/p/imdict-chinese-ana…
-
I'm using the embeddings from the example at https://nlp.johnsnowlabs.com/2020/09/23/labse.html, and the output vectors are close to, but not equal to, the original vectors from https://tfhub.dev/google/LaBSE/1. Why?
How o…
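One way to quantify "close but not equal" is to compute the cosine similarity and the maximum element-wise difference between the two embeddings produced for the same sentence. Below is a minimal pure-Python sketch. The function name `compare_vectors` is hypothetical, and the sketch leaves out how the vectors are obtained from Spark NLP and from TF Hub.

```python
import math
from typing import Sequence, Tuple


def compare_vectors(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (cosine similarity, max absolute element-wise difference)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = dot / (norm_a * norm_b)
    # Largest per-dimension deviation between the two embeddings
    max_abs_diff = max(abs(x - y) for x, y in zip(a, b))
    return cosine, max_abs_diff
```

A cosine similarity very close to 1 combined with small element-wise differences may indicate numerical or preprocessing differences rather than a different model, but that is only one possible explanation.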
-
Since https://issues.apache.org/jira/browse/LUCENE-8548, the Korean tokenizer groups characters of unknown words if they belong to the same script or an inherited one. This is OK for inputs like Мoscow…
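The grouping rule described above can be approximated outside Lucene to see how a word splits. This is a minimal Python sketch and not Lucene's implementation. Python's standard library has no Unicode script property, so the sketch assumes that the first word of a character's Unicode name (such as `LATIN`, `CYRILLIC`, or `HANGUL`) can stand in for its script. It also treats non-spacing combining marks as the "inherited" script. The names `script_of` and `group_by_script` are hypothetical.

```python
import unicodedata
from typing import List, Optional


def script_of(ch: str) -> str:
    """Approximate a character's script (hypothetical heuristic)."""
    if unicodedata.category(ch) == "Mn":
        return "INHERITED"  # combining marks inherit the preceding script
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def group_by_script(word: str) -> List[str]:
    """Split a word into runs of one script; inherited chars join the previous run."""
    groups: List[str] = []
    current: Optional[str] = None
    for ch in word:
        s = script_of(ch)
        if groups and (s == "INHERITED" or s == current):
            groups[-1] += ch      # extend the current run
        else:
            groups.append(ch)     # start a new run
            current = s
    return groups
```

For example, `group_by_script("\u041Coscow")` (Cyrillic `М` followed by Latin `oscow`) returns `["М", "oscow"]`, while `group_by_script("서울Seoul")` returns `["서울", "Seoul"]`.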
-
Hi, I'm developing a tokenizer for Korean.
Since my project is to build a language model with SRILM's `ngram`, the tokenizer plays a very important role.
I couldn't experiment because of the l…
-
I think the Nori tokenizer has an issue.
I don’t understand why “Longest-Matching” does NOT work in the Nori tokenizer via config mode (config mode:
Here is an example for explaining what is longe…
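For reference, generic "longest matching" means that at each position the tokenizer takes the longest dictionary entry that starts there. The sketch below is a minimal greedy version of that idea. It is a generic illustration and not Nori's implementation. The function name `longest_match_segment` and the sample dictionary are hypothetical.

```python
from typing import List, Set


def longest_match_segment(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy left-to-right longest-match segmentation.

    At each position, take the longest dictionary entry starting there;
    if none matches, emit a single character and move on.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: a single character
        # Try the longest candidates first, down to length 2
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        tokens.append(match)
        i += len(match)
    return tokens
```

With the dictionary `{"삼성", "삼성전자", "전자", "제품"}`, `longest_match_segment("삼성전자제품", ...)` yields `["삼성전자", "제품"]`. The longer entry `삼성전자` wins over `삼성` + `전자`.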