-
Dear grobid team,
I hope you are good and healthy. I'll jump straight to the problem.
**INFO**
version_used: docker image grobid/grobid:0.7.0
**PROBLEM**
For several PDFs, the Python g…
-
Standardize `Intl.v8BreakIterator`.
Backpointers:
- https://github.com/nodejs/node/issues/3111
- https://bugs.chromium.org/p/v8/issues/detail?id=3785
Update 1 (Sept 26th, 2016):
- Proposal from @lit…
-
After discussing with the ICU4X teams and experts from ICU, Markus suggested we investigate further how to implement the rule-based break iterator using the approach taken in ICU4C. [Quote from his …
-
Hi,
I want to retrieve text by searching with an audio query using the [AudioCLIP](https://github.com/AndreyGuzhov/AudioCLIP) model.
First, I created an index of **text** (car-horn, coughing, alarm-clock, …
-
Language: Python
This script will create an application with a tkinter GUI.
In this application, the user will be able to segment paragraphs into multiple sentences.
This sentence segmen…
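To make the idea concrete, the segmentation step might look roughly like the sketch below. It is only an illustration under a naive assumption (a sentence ends at `.`, `!` or `?` followed by whitespace), and `split_sentences` is a hypothetical helper name, not part of the proposed script. The tkinter GUI is left out on purpose.

```python
import re


def split_sentences(paragraph: str) -> list[str]:
    """Split a paragraph into sentences using a naive punctuation rule.

    Assumption (not from the issue): a sentence ends at '.', '!' or '?'
    followed by one or more whitespace characters.
    """
    # The lookbehind keeps the punctuation attached to the preceding sentence.
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    # Drop the empty string produced when the input is empty.
    return [part for part in parts if part]


if __name__ == "__main__":
    for sentence in split_sentences("Hello world. How are you? Fine!"):
        print(sentence)
```

A rule this simple will mis-split abbreviations such as "Dr. Smith", so a real implementation would likely need a proper sentence tokenizer.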
-
We need the following Unicode properties:
- Grapheme_Cluster_Break
- Sentence_Break
- Word_Break
- Extended_Pictographic for [GB11](https://www.unicode.org/reports/tr29/#GB11)
-
---
Title: Sentence Segmenter
About: It's a GUI script
Name: Akash Ramanand Rajak
Label: Feature Request
Assignee: ''
---
Define You:
- [x] LGM-SOC'21 Participant
- [ ] DevIncept Part…
-
Detailed steps to produce a corpus for 1867 to 2019:
A tagged corpus for the period 1900 to 2019 already exists (created using Sparv v4.1). The raw text files for 1867 to 1899 are also already avai…
-
Hi! I am trying the Jina examples from [here](https://github.com/jina-ai/examples/tree/master/multires-lyrics-search) with my own data. I'm new to Jina, so sorry if this is a trivial question.
I have a…
-
## Feature description
Add zh models trained on OntoNotes v5 Chinese.
## Could the feature be a [custom component](https://spacy.io/usage/processing-pipelines#custom-components) or [spaCy plugin](…