-
From e.g. https://github.com/google/oss-fuzz/runs/5309713893?check_suite_focus=true
```
=================================== FAILURES ===================================
______________ CoverageRep…
```
-
Hello,
I've been trying to use the DL4J word2vec Spark implementation in our project. The current sample data is a million lines of text extracted from Wikipedia. While I was able to successful…
-
Getting this issue when updating corpus. Tried `ebooks auth` to get a new token, but no dice.
```
> ebooks archive markbao corpus/markbao.json
Currently 3248 tweets for markbao
/home/mark/.rvm/gems/r…
```
-
#### Issue Description
I use WordVectorSerializer.readWord2VecModel to read a model that was saved by WordVectorSerializer.writeWord2VecModel, and I get an exception.
WordVectorSerializer.readWord2…
-
Is there a way to retrain the SyntaxNet POS tagger model with a new dataset?
-
-
```
python train_gpu.py --corpus_info_path=G:/XLNetData/tftest/corpus_info.json --record_info_dir="G:/XLNetData/tftest/tfrecords" --model_dir="" --train_batch_size=8 --seq_len=128 --reuse_len=64 --mem_len…
```
-
Hi,
I encountered a strange problem when reproducing the baseline using BM25 as the retrieval method.
Firstly, I used the dataset `wiki-18.jsonl`, which is downloaded from your [huggingface da…
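For context, a minimal pure-Python sketch of Okapi BM25 scoring. This is the standard textbook formula, not the repository's actual retrieval code; the parameter defaults `k1=1.5` and `b=0.75` are common conventions and are assumptions here:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each whitespace-tokenized doc against the query with Okapi BM25."""
    N = len(docs)
    tokenized = [d.split() for d in docs]
    avgdl = sum(len(d) for d in tokenized) / N
    # document frequency of each term across the corpus
    df = Counter()
    for d in tokenized:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for t in query.split():
            if t not in tf:
                continue
            # the "+1" inside the log keeps the IDF non-negative
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(s)
    return scores
```

Differences in tokenization, stopword handling, or the exact IDF variant are typical reasons a reproduced BM25 baseline drifts from reported numbers.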
-
In #20892, I re-opened an issue, only to have @gopherbot re-close it. This is an issue to follow-up on what happened there. cc @andybons @kevinburke
-
I'm trying to write a Python script that computes the probabilities and backoffs similarly to `kenLM`.
The goal is to reproduce the same outputs, given the same corpus.
However, no matter how muc…
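As a starting point, a minimal sketch of the "probability + backoff" structure that an ARPA-style n-gram file encodes. Note this uses plain maximum-likelihood estimates with a naive fallback; kenLM itself applies modified Kneser-Ney smoothing with discounting and interpolation, so these numbers will not match its output:

```python
import math
from collections import Counter

def train_bigram_mle(corpus):
    """Maximum-likelihood unigram and bigram log10 probabilities.

    log10 is used because that is the base the ARPA format stores.
    This is a sketch, NOT kenLM's modified Kneser-Ney estimator.
    """
    tokens = corpus.split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)
    uni_logp = {w: math.log10(c / total) for w, c in unigrams.items()}
    bi_logp = {
        (a, b): math.log10(c / unigrams[a]) for (a, b), c in bigrams.items()
    }
    return uni_logp, bi_logp

def score(uni_logp, bi_logp, a, b):
    """Score the bigram (a, b), backing off to the unigram when unseen.

    A real ARPA model would add the backoff weight of the context `a`
    here; this sketch omits it for clarity.
    """
    if (a, b) in bi_logp:
        return bi_logp[(a, b)]
    return uni_logp.get(b, -99.0)
```

The usual source of mismatch with kenLM is the missing discounting: without subtracting mass from seen n-grams, there is nothing left over for the backoff distribution, so MLE probabilities are systematically higher than kenLM's.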