-
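Why doesn't Fasttext use pretrained word vectors? And why isn't its vocabulary built from word-segmented text, instead of being built one token at a time?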
-
I'm currently using both `Hunspell` and SymSpell as my main spelling correction systems. Both work OK, and SymSpell works great (quality, performance, etc.). That said, I have a question about Norvig pr…
-
What kind of data were used for training LLaMa 2?
-
```python
from soynlp.vectorizer import sent_to_word_contexts_matrix

x, idx2vocab = sent_to_word_contexts_matrix(
    corpus,
    windows=3,
    min_tf=10,
    tokenizer=tokenizer, # (default)…
```
-
What are the current best practices for converting OntoNotes 5.0 to UD format?
I didn't find any documentation or issues about this; sorry if it has already been asked.
I used [this](https://universaldep…
-
Hello! Could you provide several data files referenced in the source code?
1. Line 190: string n_w_string = "../../../word2vec-41/word_vector_" + ss.str() + ".txt";
2. Line 986: FILE\* f3 = fopen("../data/word2id.txt","r");
3. Line 987: FILE\* f4 = fopen("../data/entityWords.t…
-
### Description
Run `python app.py`, then:
Traceback (most recent call last):
File "/hy-tmp/kotaemon/app.py", line 13, in <module>
from ktem.main import App # noqa
File "/hy-tmp/kotaemon/libs/…
-
#### Problem description
I'm trying to build my documentation with `uv`, which also installs gensim as a transitive dependency.
#### Steps/code/corpus to reproduce
`uv pip install gensim`
but i…
-
### Metadata
- Authors: Wenlin Chen, David Grangier and Michael Auli
- Organization: Washington University and Facebook AI Research
- Conference: ACL 2016
- Paper: https://arxiv.org/pdf/1512.04906…
-
Using public instances.
2024.6.23+9a9ca307f
**How To Reproduce**
Search for anything in English, for example: "Searxng issue tracker"
The search results page shows language as: "Auto-detect (en-…