-
Hi team,
I am trying to extract text at the sentence level, along with its coordinates.
I have almost completed it, except for the issue of reference markers within a sentence breaking them dow…
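Not from the original issue, but as a minimal sketch of one way to keep reference markers from disturbing a naive splitter: match each sentence together with any bracketed markers glued after its terminal punctuation. The function name and the `[12]`-style marker pattern are illustrative assumptions, not the reporter's actual format.

```python
import re

# Illustrative only: a "sentence" is a run of text up to terminal
# punctuation, plus any bracketed reference markers (e.g. "[3]")
# attached directly after it, so ".[3]" no longer hides the boundary.
SENTENCE = re.compile(r"[^.!?]+[.!?]+(?:\[\d+\])*")

def split_with_refs(text):
    # Return each matched sentence with surrounding whitespace stripped.
    return [m.group(0).strip() for m in SENTENCE.finditer(text)]

print(split_with_refs("Accuracy improved.[3][4] See Table 2. Done."))
```

A real pipeline would also need to track character offsets through this step so the coordinates stay aligned with the original text.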
-
This issue is loosely related to the conversation in #435.
I'd like to try to develop an explicit consensus on the scope of ECMA402 target API surface.
Given enough time, and using `ad extr…
-
Hello,
I have followed the guide you published on colab.research.google.com for training a custom DeepSegment model.
I want to try it on Spanish, so I used a custom corpus of sentences in Spanish and a cu…
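For reference, here is a small data-preparation sketch for a custom Spanish corpus, assuming the training format described in the guide (punctuation-stripped, concatenated sentences with a per-word sentence-start label). The function name and the exact label scheme are assumptions; adjust them to whatever the notebook actually expects.

```python
import re

def make_training_example(sentences):
    """Turn a few clean Spanish sentences into one (tokens, labels)
    pair: terminal punctuation is stripped, sentences are concatenated,
    and each token is labelled 1 if it starts a sentence, else 0.
    (Label scheme assumed; match it to your training notebook.)
    """
    tokens, labels = [], []
    for sent in sentences:
        toks = re.sub(r"[.!?¡¿]", "", sent).split()
        labels.extend([1] + [0] * (len(toks) - 1))
        tokens.extend(toks)
    return tokens, labels

corpus = ["¿Cómo estás?", "El gato duerme.", "Hoy llueve mucho."]
tokens, labels = make_training_example(corpus)
print(tokens)
print(labels)  # 1 marks each sentence start
```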
-
In issue #65, workarounds were added to make sentence splitting more accurate in light of some known issues in the pysbd sentence splitter.
In https://github.com/nipunsadvilkar/pySBD/pull/63, it se…
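The actual workarounds live in the project's splitter code, but as a generic illustration of this kind of post-processing fix-up, one common heuristic is to merge back fragments that a splitter broke off mid-sentence. Everything below is illustrative, not the code from issue #65:

```python
def merge_spurious_splits(sentences):
    """Post-process a sentence splitter's output: a fragment that
    starts with a lowercase letter was probably split off mid-sentence,
    so merge it back into the previous sentence.
    (Illustrative heuristic only.)
    """
    merged = []
    for sent in sentences:
        if merged and sent and sent[0].islower():
            merged[-1] = merged[-1] + " " + sent
        else:
            merged.append(sent)
    return merged

print(merge_spurious_splits(
    ["He cited Smith et al.", "and continued.", "Next sentence."]))
```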
-
Greetings.
I am trying to use the POS tagger. I ran the following code:
```
pos_tagged_interactive = pos_tagger_interactive.tag('ذهب الطالب إلى المدرسة')
print("sample POS Tagged (interactive)",p…
-
Hello, I'm trying to use the package in my `React` application. However, when I try to import the `cldr-segmentation` module, I get an error saying: `FATAL ERROR: Ineffective mark-compacts near heap limit…
-
The TinySegmenter tokenizer currently in use works, but it could be better. It would be worth evaluating alternatives such as ja-sentence-segmenter.
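For context, rule-based Japanese sentence segmentation of the kind ja-sentence-segmenter performs can be sketched in a few lines. This is a simplified illustration, not that library's actual algorithm:

```python
import re

# Split after Japanese sentence terminators 。！？, keeping any closing
# quote or bracket (」』）) attached to the preceding sentence so that
# 「…？」 stays together as one unit.
JA_SENT = re.compile(r"[^。！？]*[。！？]+[」』）]*")

def split_ja(text):
    return [m.group(0) for m in JA_SENT.finditer(text)]

print(split_ja("今日は晴れです。明日は雨でしょう！「行きますか？」はい。"))
```

A real segmenter also has to handle sentences with no terminator, nested quotes, and half-width punctuation, which is where a maintained library earns its keep.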
-
Thanks for your post about how to tokenize Japanese.
Currently my solution is to use the `icu` tokenizer with a word break iterator and a customized locale, as shown here:
![code](https://user-images.gith…
-
@MagedSaeed Hi, friends. I have some questions about using Farasa, and I would appreciate your help. Thanks.
sample =\
'''
يُشار إلى أن اللغة العربية يتحدثها أكثر من 422 مليون نسمة ويتوزع متحدثوها…
-
Found another example that screws up the start and end indices for the spans. I wasn't really able to reproduce it without the full text, although there is probably a smaller version of the text that …
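While narrowing this down, a small invariant check can help locate the first sentence whose offsets go wrong. The helper below is hypothetical and assumes spans come back as `(sentence, start, end)` triples against the original text:

```python
def check_spans(text, spans):
    """Verify that each (sentence, start, end) span actually indexes
    back into the original text; return the spans that don't, together
    with what the text slice really contains at those offsets.
    """
    bad = []
    for sent, start, end in spans:
        if text[start:end] != sent:
            bad.append((sent, start, end, text[start:end]))
    return bad

text = "First sentence. Second one."
spans = [("First sentence.", 0, 15), ("Second one.", 16, 27)]
print(check_spans(text, spans))  # [] means every span is consistent
```

Running this over the full document and bisecting around the first bad span is one way to shrink the failing text to a minimal reproduction.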