-
The way the language ID is created is error-prone: it accepts any value found in the MARC data. Perhaps the values should first be validated against ISO 639-2?
e.g.
http://id.loc.gov/vocabulary/iso639-2/deut…
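
A minimal sketch of the validation suggested above, assuming the language value has already been pulled out of the MARC record as a plain string. `KNOWN_CODES` is a tiny illustrative subset rather than the full ISO 639-2 list, and `language_uri` is a hypothetical helper name, not something from the existing code:

```python
from typing import Optional

# Base URI taken from the example above.
BASE_URI = "http://id.loc.gov/vocabulary/iso639-2/"

# Illustrative subset only; a real check would load the complete ISO 639-2 code list.
KNOWN_CODES = frozenset({"eng", "fre", "ger", "spa"})


def language_uri(raw_value: str) -> Optional[str]:
    """Return a language URI for a recognized code, or None for anything else."""
    code = raw_value.strip().lower()
    if code not in KNOWN_CODES:
        return None
    return BASE_URI + code
```

Returning `None` instead of building a URI from an unknown value lets the caller decide whether to skip the field or log the record for review.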
-
OOV is our second big enemy. In the best case, it makes the context scorer harmlessly useless. Is being useless good?
The Soundex and Double Metaphone matching methods work against OOV, but they provide quite low …
-
I'm sorry in advance for (1) not following the issue template and (2) using incorrect language when referring to SSSOM concepts. I am new to this and still getting familiar with the correct terminology et…
-
Hello,
I wonder if there is some way to count OOVs in my data. I want to evaluate how well the fastText model covers my data. And how can I get the words that actually exist in the model? Can I ignor…
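
One possible way to measure this, sketched with a plain Python set standing in for the model's word list. With the official fastText Python bindings the set could be built from `model.get_words()`; the helper name `oov_report` and the toy data are assumptions for illustration only:

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def oov_report(tokens: Iterable[str], vocabulary: Set[str]) -> Tuple[float, Counter]:
    """Return (coverage, oov_counts) for the tokens against the vocabulary."""
    total = 0
    oov_counts: Counter = Counter()
    for token in tokens:
        total += 1
        if token not in vocabulary:
            oov_counts[token] += 1
    covered = total - sum(oov_counts.values())
    coverage = covered / total if total else 0.0
    return coverage, oov_counts


# Toy data; with fastText the vocabulary could be built as set(model.get_words()).
vocabulary = {"the", "cat", "sat"}
tokens = "the cat sat on the mat".split()
coverage, oov_counts = oov_report(tokens, vocabulary)
```

Here "OOV" means only that a token is missing from the model's word list; a subword model can still return a vector for such a token.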
-
Is it possible to use an external vocabulary, e.g. CMU's ARPAbet, with Gentle? It would be useful to be able to add OOV words...
-
## Description
The current 404.html code for the CC Search Portal contains commented-out sections. Keeping unused code clutters the file. This can make it harder for future developers to rea…
-
I have a similar issue to #41. `meta-data.json` is not created.
Ubuntu 24.04.1 LTS
`pip show qlever`:
```
Name: qlever
Version: 0.5.6
Summary: Script for using the QLever SPARQL engine.
Home-…
-
# Background
UCO gives sets of strings that suggest values to use for certain properties. UCO has called this the "Semi-open vocabulary" design pattern: While certain strings should be used, strin…
-
- self-labeling of content
- moderator-labeling of content
- annotations vs hashtag-style labels
- having followable actors that apply labels to content (this is very likely annotations, not labels…
-
Since the advantage of the subword model is that new words can be built from pre-trained characters, I wonder how I can create a new word vector from the data.bin file. Does that .bin file contain …
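
To illustrate the idea behind the question, here is a rough sketch of how a subword model can compose a vector for an unseen word: split the word into character n-grams and average the vectors of the n-grams it knows. The dictionary `ngram_vectors` and both function names are hypothetical; real fastText hashes n-grams into buckets stored in the model file, and its Python bindings expose `model.get_word_vector(word)` for this.

```python
from typing import Dict, List

Vector = List[float]


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Character n-grams of the word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def compose_vector(word: str, ngram_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of the word's known n-grams; zeros if none are known."""
    known = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]
```

The n-gram length range of 3 to 6 is an assumed default for this sketch and should match whatever settings the model was trained with.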