-
Hi,
I understand that there are reasons why we only want to do indexing once, since there are corpus-level statistics that need to be calculated.
But is there any way to index a huge batch of do…
-
Change the DB update script to use the new YAML files in https://github.com/Helsinki-NLP/OPUS.
For example:
* https://github.com/Helsinki-NLP/OPUS/blob/main/corpus/RF/v1/info.yaml
* https://github…
-
**Is your feature request related to a problem? Please describe.**
In the old Sentence Collector, we could see how many sentences are waiting for us to approve/reject. Now, we don't know how many are…
-
**Is your feature request related to a problem? Please describe.**
Text-corpus generation is the most important and troublesome part of the dataset, and many language communities are failing to extend…
-
We've set up ClusterFuzz on GCP and ran a few fuzz jobs, but the Testcases, Corpora, and Fuzzer Statistics pages are empty.
I've checked logs and there are no errors when accessing these pages.
Confirmed cor…
-
**Debugging checklist**
[ ] Have you updated to latest MFA version? Yes
[ ] Have you tried rerunning the command with the `--clean` flag? Yes
**Describe the issue**
A clear and concise descrip…
wwdok updated
2 years ago
-
**Is your feature request related to a problem? Please describe.**
Information overflow is ubiquitous in email communication, indicated by the Inbox and other folders containing more messages than ca…
-
@VictorDenisov you brought up that using the expected content of each column could benefit the column finding algorithm. It would be nice if the column content was customizable, so this program could…
-
This is a valid stream created in KSQL from existing JSON data - note the column `3ALPHA`:
```
ksql> DESCRIBE CORPUS_RAW;
Name : CORPUS_RAW
Field | Type
----------------…
```
rmoff updated
5 years ago
-
### Description
Tangentially related to: https://github.com/apache/lucene/issues/13158
But I have observed that as the corpus reaches a fairly large size, the actual quantiles aren't changing mu…