-
I'm trying to package your module as an RPM package, so I'm using the typical PEP 517-based build, install, and test cycle used when building packages from a non-root account.
- `python3 -sBm build -w --no…
-
I'd like to experiment with patching franc, or building an alternative to it, so that it can detect the language of small documents much more accurately.
First of all, is this something that could be interestin…
-
- Intro: Insee -> people don't necessarily know what it is?
- 2.1: "interestingly, subsequent projects involving large datasets didn’t suffer much from this change, as their needs were actually…
-
**Describe the bug**
When using the matching strategy 'all', I expect a document to count as a hit when every search term of the query matches at least one searchable attribute.
However, it seems…
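
The expectation described above can be written down as a small predicate. This is only a sketch of the stated expectation, not the search engine's actual implementation: the function name `matches_all`, whitespace tokenization, and exact word matching are all assumptions made for illustration.

```python
def matches_all(document, query, searchable_attributes):
    """Return True if every query term matches at least one searchable attribute.

    Hypothetical helper: terms and attribute values are lowercased and split
    on whitespace, and a term matches only on an exact word match.
    """
    terms = query.lower().split()
    # Collect the words of each searchable attribute (missing attributes are empty).
    attribute_words = [
        str(document.get(attribute, "")).lower().split()
        for attribute in searchable_attributes
    ]
    # Each term may match a different attribute; all terms must match somewhere.
    return all(
        any(term in words for words in attribute_words)
        for term in terms
    )


doc = {"title": "red shoes", "brand": "acme"}
print(matches_all(doc, "red acme", ["title", "brand"]))
print(matches_all(doc, "red nike", ["title", "brand"]))
```

Note that the terms do not have to match the same attribute: "red" is found in `title` and "acme" in `brand`, so the first document counts as a hit under this reading.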
-
Currently, I am calculating the coherence of a BERTopic model using gensim. For this, I need the n-grams from each text of the corpus. Is this possible? The function used by gensim expects the corp…
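
One possible way to get n-grams per text, sketched with the standard library only. The helper names `word_ngrams` and `tokenize_with_ngrams`, whitespace tokenization, and joining n-grams with a space are assumptions for illustration; in practice the tokenization would need to mirror whatever vectorizer the topic model was fitted with, so that topic words and corpus tokens line up.

```python
def word_ngrams(tokens, n, joiner=" "):
    """Return all contiguous word n-grams of length n from a token list."""
    return [joiner.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tokenize_with_ngrams(text, max_n=2, joiner=" "):
    """Lowercase and split a text, then emit its 1-grams up to max_n-grams."""
    tokens = text.lower().split()
    result = []
    for n in range(1, max_n + 1):
        result.extend(word_ngrams(tokens, n, joiner))
    return result


corpus = ["New York city", "topic models"]
# One token list per document, each containing unigrams and bigrams.
texts = [tokenize_with_ngrams(document, max_n=2) for document in corpus]
print(texts)
```

The resulting `texts` is a list of token lists, one per document, which is the shape gensim's `CoherenceModel` accepts for its `texts` argument.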
-
**Describe the bug**
`Series.str.character_ngrams(as_list=True)` resets the index when it shouldn't.
**Steps/Code to reproduce bug**
Consider the following code:
```
import cudf
df = cudf.DataFrame…
```
-
Updated Issue:
https://github.com/paradedb/paradedb/pull/276 implements a CJK tokenizer, but it doesn't seem to be working. To replicate:
```sql
CREATE TABLE tokenizer_config AS SELECT * FROM p…
```
-
#### Problem description
I want to calculate the Word Mover's Distance. After normalizing my self-made fastText model (`model.init_sims(replace=True)`), the `wmdistance()` function isn't wor…
-
Quests that use `quest_generic_otherAnswers2` display answers such as "UH…", "ER…", "OH…". To new users, it is not always obvious that these are actually words and furthermore can sometimes be mist…
-
The new wildcard data type forces the user to construct queries containing the wildcard '*' character, even when no expansion is desired. This feels counter-intuitive, especially since the wildcard ty…