-
This is an error case worth keeping in mind: it causes trouble with sentence segmentation.
The document is not CC-BY; it is referenced here: https://dx.doi.org/10.1063/1.1874292
Here the `delinquent`…
-
It would be neat to add some sort of sentence segmentation to the query-time text analysis, so that a sentence boundary triggers a break in tagging. For example (a very silly one!), the input document text is:
" I want to buy s…
-
first as on https://ichi.moe
- https://readevalprint.tumblr.com/post/97467849358/who-needs-graph-theory-anyway
- https://github.com/tshatrov/ichiran
or https://jisho.org
-
Great job! I have a small question: I want to avoid catastrophic forgetting, or to keep the ability to handle bilingual text, for example by training on both Chinese and English simultaneously. Can the language be set to …
-
We will need to be able to extract all sentences that use the word *Galaxy* from an input document. This requires being able to split an input document on sentence boundaries.
NLTK will be su…
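
A minimal sketch of the extraction described above, assuming a naive regex splitter as a stand-in for NLTK so that it runs without downloading tokenizer data. The function names `split_sentences` and `sentences_with_word` are hypothetical, and the boundary rule (end punctuation, whitespace, then an uppercase letter) will mis-split abbreviations such as "Dr. Smith"; a trained tokenizer should handle those cases better.

```python
import re

# Naive boundary: '.', '!' or '?' followed by whitespace and an uppercase letter.
# This is an assumption for illustration, not NLTK's Punkt algorithm.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Split text into sentences using the naive boundary rule above."""
    return [s.strip() for s in _SENTENCE_BOUNDARY.split(text) if s.strip()]


def sentences_with_word(text, word):
    """Return the sentences that contain `word` as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [s for s in split_sentences(text) if pattern.search(s)]


if __name__ == "__main__":
    doc = "The Galaxy project is open source. It runs workflows. Many labs use galaxy daily!"
    for sentence in sentences_with_word(doc, "Galaxy"):
        print(sentence)
```

Matching on whole words avoids hits inside longer tokens such as "Galaxyproject"; whether the match should be case-sensitive depends on the use case and is easy to change by dropping `re.IGNORECASE`.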
-
[ ] I checked the [documentation](https://docs.ragas.io/) and related resources and couldn't find an answer to my question.
**Your Question**
`faithfulness_score` is always `nan`.
**Code Examples**…
-
Ex. 1: (filename: `What the Panama Papers Reveal About the Art Market - The New York Times.txt`)
`Q: All necessary financial disclosures were made at the time of sale." The International Consortium of…
-
NLTK version 3.8.2 changed the data format of the tokenizers from pickle to text files in order to patch a vulnerability (CVE-2024-39705).
Here's the PR in the nltk repo:
https://github.com/nltk/n…
-
Apostrophes `ʼ` are not parsed correctly. They sometimes appear in pairs to mark quotations; the second apostrophe usually gets assigned to the following sentence and if there is none (-> end of chap…
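
One possible workaround is a post-processing pass over the segmenter's output, sketched below. It is a heuristic, not part of any library: the function name `reattach_closing_mark` is hypothetical, and it assumes that `ʼ` is only used as a paired quotation mark, so a contraction such as `donʼt` would throw off the pairing count.

```python
def reattach_closing_mark(sentences, mark="ʼ"):
    """Move a stray closing quotation mark back to the sentence it closes.

    If the previous sentence holds an odd number of marks (an unclosed
    quotation) and the current one starts with the mark, the mark is
    appended to the previous sentence instead.
    """
    fixed = []
    for sentence in sentences:
        stripped = sentence.lstrip()
        prev_open = bool(fixed) and fixed[-1].count(mark) % 2 == 1
        if prev_open and stripped.startswith(mark):
            fixed[-1] += mark
            stripped = stripped[len(mark):].lstrip()
            # A sentence that consisted only of the mark disappears entirely.
            if stripped:
                fixed.append(stripped)
        else:
            fixed.append(sentence)
    return fixed


if __name__ == "__main__":
    print(reattach_closing_mark(["He said ʼStop.", "ʼ Then he left."]))
```

The same pass also covers the case where the stray mark ends up as a sentence of its own: it is merged into the preceding sentence and no empty sentence is left behind.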
-
This just popped up for me for the first time. Running the `recognize` function (with whisper.cpp, built with OpenBLAS, on CPU) on what is, as far as I know, not a pathological audio sample (it's an a…