-
If I send in "17 júní", the tokenizer returns "17. júní". Even though I use tokenized() (and not split_itsentences()) and use the txt-property (which should contain the original source text for the toke…
-
```racket
(check-expect (ab-hgraph->hfile
(ab-hgraph (root (list (list (leaf (token "**kern" EXCLUSIVE-INTERPRETATION 0))
(parent (token "…
```
-
Test:
```racket
(check-expect (hfile->ab-hgraph
(path->hfile "../../data/order/spine-splits-left-joins-left.krn") ab-hgraph)
(ab-hgraph (root (list (list (leaf (toke…
```
-
During a Jira indexing:
```
Traceback (most recent call last):
File "/app/danswer/background/indexing/run_indexing.py", line 219, in _run_indexing
new_docs, total_batch_chunks = indexing_p…
```
-
As requested by a member of the community, it would be cool to implement a new feature for splitting documents using an LLM instead of our current token or delimiter-based methods. This will allow for …
-
Can SPLIT_TOKEN be renewed in `// EXTENSION BUILDER DEFAULTS END TOKEN ...`, because the PHP-CS-Fixer with the default configuration constantly changes it to `//# EXTENSION BUILDER DEFAULTS END TOKEN…
-
**Github username:** @DevPelz
**Twitter username:** Pelz_Dev
**Submission hash (on-chain):** 0x0a8dc280a702a0c8da1997b764f9fdca0a2604de9044b5f7459f30435dea8e8e
**Severity:** high
**Description:**
**…
-
In this [zimit run](https://farm.zimit.kiwix.org/pipeline/3c4a7789-47f0-4341-997a-d103dc292993/debug), it looks like the worker failed to run the scraper container.
It's unclear why but it looks like …
-
**Github username:** --
**Twitter username:** --
**Submission hash (on-chain):** 0xc71a1d98f00c557a83d211a217a5e66cc16d3acb4db412bb0a069c472273587e
**Severity:** high
**Description:**
## Description…
-
Right now, `StringParser`'s implementation is at the character level, so if you give it a special token as the target string, it can possibly generate the same string but with non-special tokens. If a…
aw632 updated
1 month ago