-
# Guest lecture @ UNC Charlotte: Labeling with LLMs
A few weeks ago, I gave a guest lecture at the University of North Carolina Charlotte on how we can use large language models for annotation in the con…
-
## Description
I'm trying to use Typesense with my content in Thai. What's special is that Thai (and a few other languages) doesn't use spaces to separate words. Typesense seems to care about that.…
-
## Defect Report
I use NotoSansKhmer and uharfbuzz together with fpdf2 to create a PDF document. To right-align the text I need the width of a string after being adjusted by the font shaping engine…
-
### Description
For a local dependency with a custom non-optional group, the listed dependencies do not get installed when installing the main project.
#### here's the local package `znlp_translate`
#####…
-
I am trying to split Burmese Unicode characters with stringr::str_split(), but it does not return the correct values.
`str_split("စမ်းသပ်မှု", "")[[1]]`
It returns:
> [1] "စ" "မ်" "း" "သ" "ပ်" "မှု"
…
-
## Problem
KOReader uses [libunibreak](https://github.com/adah1972/libunibreak) to find where to break lines.
However, it doesn't support breaking SEA languages: Thai, Burmese, Lao, Khmer outlined in…
-
Dataloader name: `parallel_asian_treebank/parallel_asian_treebank.py`
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?parallel_asian_treebank
| Dataset| parallel_asian_tr…
-
Dataloader name: `bactrian_x/bactrian_x.py`
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?bactrian_x
| Dataset| bactrian_x |
|-------------|---|
| Description | The B…
-
While paradigmatic data can be modeled somewhat using pure `Wordlist`s, this means that a lot of information may get lost (or at least be only informally added) - and in particular not be readily avai…
-
# Welcome to the Common Voice Community!
> Common Voice aims to make speech technology accessible to everyone by building an open sourced dataset of labelled voice data that is representative of l…