-
The n-gram concatenation function `c_ngrams()`, used to preprocess the corpus, takes roughly 30 minutes to run on my computer, which is very slow. Can it be made faster?
-
MoreLikeThis is something we use massively. I am confident there is a way to replicate this with Rubix ML (and even have a learning system), but I am wondering about a suggested approach.
Related l…
-
```
# Google Services
60.199.175.53 google.com
60.199.175.88
203.208.36.18 www.google.com
203.208.36.18 www.google.com.hk
203.208.36.18 www.l.google.com
203.208.36.18 www2.l.google.com
203.208.…
-
I need a way to discover n-grams that commonly occur together with other n-grams.
i.e. how would I discover "tion"?
e.g.
"ion"
"tio"
nation => nat ati tio ion
vacation => vac aca cat ati tio ion
station…
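
One possible way to approach this, offered only as a rough sketch: split each word into overlapping character trigrams, then count how often each pair of adjacent trigrams appears. Adjacent trigrams share two characters, so merging a pair yields a 4-gram, and frequently seen merges such as "tio" + "ion" surface "tion". The function names `char_ngrams` and `merged_ngram_counts` are made up for this illustration and do not come from any library.

```python
from collections import Counter


def char_ngrams(word, n=3):
    """Return the overlapping character n-grams of a word, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def merged_ngram_counts(words, n=3):
    """Count the (n+1)-grams formed by merging each pair of adjacent n-grams."""
    counts = Counter()
    for word in words:
        grams = char_ngrams(word, n)
        for left, right in zip(grams, grams[1:]):
            # Adjacent n-grams overlap in n-1 characters,
            # so merging them appends exactly one character.
            counts[left + right[-1]] += 1
    return counts


words = ["nation", "vacation", "station"]
counts = merged_ngram_counts(words)

# Merged n-grams seen in more than one word are candidates for longer units.
frequent = [gram for gram, c in counts.items() if c > 1]
print(frequent)
```

Repeating the merge on the resulting 4-grams would grow longer candidates in the same way. A frequency threshold alone is a crude filter; an association score over the pair counts may work better on a real corpus.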
-
A log of things we try.
Robot_1 vs Robot_2 comparisons using different models.
Trying:
```
time vw -d ~/NA12878_V2.5_Robot_1.open_w16.hhga.gz --binary --passes 20 -q ha --ngram a5 -c -b 26 -f ~/ng…
-
I would like to run an OpenAlex query for papers (works), filtered to a list of specific journals. I can fetch the info for `entity = sources` with no problem:
``` r
library(openalexR)
jo…
-
Appreciate your efforts on this excellent work!
I found that an n-gram encoding step runs before hashing in minhash_spark.py. If the length of a doc is below the min_length, then it wil…