-
I've built the package on Windows using GnuWin.
My first issue was with the pre-trained models on the GitHub page: when I tried to use them to encode sentences, I got an assertion failed …
-
In automated subject indexing (and multi-label classification in general), the distribution of assigned concepts often follows Zipf's law. In our experience this leads to algorithms having low precisio…
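Under Zipf's law, the frequency of the concept at rank r is roughly proportional to 1/r^s, with s close to 1. The short sketch below shows how skewed that is. It assumes s = 1 and a hypothetical vocabulary of 1,000 concepts, and computes each label's share of all assignments. With these assumptions the 10 most frequent labels take roughly 39% of the assignments, while the 500 rarest labels together get less than 10%.

```python
from typing import List


def zipf_shares(num_labels: int, s: float = 1.0) -> List[float]:
    """Share of all assignments going to each label, ordered by rank (rank 1 first)."""
    # Zipf weight for rank r is 1 / r**s; normalise so the shares sum to 1
    weights = [1.0 / rank**s for rank in range(1, num_labels + 1)]
    total = sum(weights)
    return [w / total for w in weights]


shares = zipf_shares(1000)  # hypothetical vocabulary of 1,000 concepts
head = sum(shares[:10])     # mass held by the 10 most frequent labels
tail = sum(shares[500:])    # mass held by the 500 rarest labels
print(f"top 10 labels: {head:.1%}, bottom 500 labels: {tail:.1%}")
```

A classifier trained on such data sees the tail labels very rarely, which is one reason per-label metrics for rare concepts tend to be poor.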
-
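NLP subtasks: word segmentation, named entity recognition (NER), text classification, similarity detection, machine translation, summarization systems, event extraction, part-of-speech tagging, syntactic parsing, coreference resolution, semantic parsing.
Public opinion monitoring system: text classification, keyword (key phrase) extraction, entity recognition, time extraction, text clustering, similarity detection, text summarization.
Small models: fasttext
Pre-training + fine-tuning paradigm
Data too sparse, or the domain too niche: rules + parsing
Differentiate by data domain; for medical text, customize…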
-
#### Problem description
I am trying to load a wiki dump and extract articles for word2vec training. This works for recent dumps, but it fails for older ones (e.g., the 2010 dump).
##…
-
I want to get the vectors of multiple words. For instance, I have a list `["hello", "fast", "text"]`, but the method `model.get_word_vector()` (unsupervised) only returns one word's vector. How can I get the three …
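A simple approach is to call `get_word_vector` once for each word in the list. The sketch below is minimal and assumes a model object that exposes `get_word_vector(word)`, as the fastText Python bindings do. The helper name `get_word_vectors` and the `WordVectorModel` protocol are hypothetical.

```python
from typing import Dict, Iterable, Protocol, Sequence


class WordVectorModel(Protocol):
    # Any object with a fastText-style get_word_vector(word) method
    def get_word_vector(self, word: str) -> Sequence[float]: ...


def get_word_vectors(
    model: WordVectorModel, words: Iterable[str]
) -> Dict[str, Sequence[float]]:
    """Look up each word and return a word -> vector mapping (input order kept)."""
    return {word: model.get_word_vector(word) for word in words}
```

With a real model you would write `model = fasttext.load_model(path)` and then `get_word_vectors(model, ["hello", "fast", "text"])`. fastText returns each vector as a NumPy array, so `numpy.stack(list(vectors.values()))` turns the result into a single matrix. Duplicate words in the input collapse to one entry.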
-
Mainly introductory section; points to have
-
[orjson](url) is [several times faster](https://github.com/ijl/orjson#performance) than `ujson` or `json` from the standard library, and it can largely serve as a drop-in replacement (note that `orjson.dumps` returns `bytes` rather than `str`).
https://github.com/hplt-project…
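`orjson.dumps` returns `bytes`. A thin wrapper can put it behind a `str`-returning interface and fall back to the standard library when orjson is not installed. In this minimal sketch, the wrapper names `dumps` and `loads` are our own. The fallback sets `separators` and `ensure_ascii` so that, for simple inputs, it produces the same compact UTF-8 text as orjson.

```python
from typing import Any, Union

try:
    import orjson  # third-party: pip install orjson

    def dumps(obj: Any) -> str:
        # orjson emits compact UTF-8 bytes; decode to keep a str-returning API
        return orjson.dumps(obj).decode("utf-8")

    def loads(data: Union[str, bytes]) -> Any:
        return orjson.loads(data)

except ImportError:
    import json

    def dumps(obj: Any) -> str:
        # Compact separators and raw UTF-8 to approximate orjson's output
        return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)

    def loads(data: Union[str, bytes]) -> Any:
        return json.loads(data)
```

The rest of the code base then calls only `dumps`/`loads`, so you can benchmark both paths without touching the call sites.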
-
An internal user reported a stall during `.Fit()` of the word embedding transform.
On first use, the word embedding transform downloads the word embedding model from the CDN.
To test:
1. …
-
Hi, great work! Are there any plans to release the fasttext classifier used in the paper?
-
I successfully launched the Ray cluster.
Then I ran
```
ray attach
cd dcnlp
export PYTHONPATH=$(pwd)
screen -S processing
python3 ray_processing/process.py \
--source_ref_paths exp_data/datasets…