Closed Bas950 closed 3 days ago
You can try the new version now.
Working with 50k+ lines of user input is always lovely xD
It still crashes on the following characters:
| Input | Output |
|---|---|
| ☆﹐﹑veevee˚ㆍ | RangeError: Not a Hangul syllable: ㆍ, index: -31347 should not >= 11172 |
| ㅔㅣ | RangeError: Not a Hangul syllable: ㅔ, index: -31404 should not >= 11172 |
| ㅑㅣㅗ므ㅛㅕㅇㅁ | RangeError: Not a Hangul syllable: ㅑ, index: -31407 should not >= 11172 |
| ㅇㅜㅇ | RangeError: Not a Hangul syllable: ㅜ, index: -31396 should not >= 11172 |
| ㅏㅏ | RangeError: Not a Hangul syllable: ㅏ, index: -31409 should not >= 11172 |
| Whiskers ˶˃ᆺ˂˶ | RangeError: Not a Hangul syllable: ᆺ, index: -39494 should not >= 11172 |
For the first one, I don't know whether I just need to add an if check on my side, or whether it's something you can handle on your side.
All the other errors I got seem to be fixed in the latest version.
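The negative indices in the errors above equal each character's code point minus U+AC00 (for example, ㆍ is U+318D, and 0x318D - 0xAC00 = -31347), which suggests the failing characters are standalone jamo rather than precomposed syllables. Below is a minimal sketch of a caller-side guard, assuming that is the cause. The helper names are made up for illustration and are not part of @lazy-cjk, and `romanizeRun` is a hypothetical stand-in for the real romanizer.

```typescript
// Precomposed Hangul syllables occupy U+AC00..U+D7A3 (11172 code points).
const HANGUL_SYLLABLE_RUN = /[\uAC00-\uD7A3]+/g

// Offset of the first character from U+AC00.
// A result below 0 or at/above 11172 means "not a precomposed syllable".
function hangulSyllableIndex(char: string): number {
  return (char.codePointAt(0) ?? 0) - 0xAC00
}

// Pass only runs of precomposed syllables to the romanizer and leave
// everything else (standalone jamo, emoji, Latin text) untouched.
// `romanizeRun` is a hypothetical stand-in for the real romanize call.
function romanizeSyllablesOnly(
  text: string,
  romanizeRun: (run: string) => string,
): string {
  return text.replace(HANGUL_SYLLABLE_RUN, run => romanizeRun(run))
}
```

With a guard like this, an input such as ㅇㅜㅇ would never reach the romanizer, so it could not throw, at the cost of leaving the jamo unromanized.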
Heya, thanks for adding the options. I think I will be able to use stripUnSupported.
For your info, in my project I need to Romanize many languages, including CJK, which is what I use your packages for, but they shouldn't all be Romanized at once as in your @lazy-cjk/slugify. This is because other functions run in between, and there is a changelog for each Romanization.
So currently I use @lazy-cjk/korean-romanize's romanize function for Korean, and will test the stripUnSupported option in a bit.
For Japanese, I use @lazy-cjk/japanese's romanize function, but that one removes all non-Japanese characters, which is not what I need.
For Chinese, I currently use pinyin. If you have a better package for this, LET ME KNOW!
And as in your comment under the other issue, these Romanizations I need shouldn't touch any emojis or anything non-CJK. Those should stay untouched.
"but they shouldn't all be Romanized at once as in your @lazy-cjk/slugify."
That won't fit what I need, as I need them separately, not all CJK at once, because I need to do Korean, do some changelog stuff, then Japanese, do some changelog stuff, etc.
I mean, it's the right thing; I just need that function for each CJK language separately.
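To illustrate the "one language at a time, with changelog work in between" flow described here, this is a rough sketch under my own assumptions. The `Romanizer` shape mirrors the `{ function: ... }` objects used later in this thread, while `ChangelogEntry`, the `language` field, and `romanizeInSteps` are hypothetical names that do not come from any package.

```typescript
// Hypothetical shapes for illustration only.
interface Romanizer {
  language: string
  function: (text: string) => string
}

interface ChangelogEntry {
  language: string
  before: string
  after: string
}

// Apply each romanizer in order and record one changelog entry
// for every step that actually changed the text.
function romanizeInSteps(
  text: string,
  steps: Romanizer[],
): { text: string, changelog: ChangelogEntry[] } {
  const changelog: ChangelogEntry[] = []
  let current = text
  for (const step of steps) {
    const next = step.function(current)
    if (next !== current)
      changelog.push({ language: step.language, before: current, after: next })
    current = next
  }
  return { text: current, changelog }
}
```

Each step sees the output of the previous one, which is why a combined all-CJK romanizer does not fit: there would be no per-language before/after pair to log.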
FYI: I have fixed most of the issues I was having by doing the following:
Korean:
import { romanize } from '@lazy-cjk/korean-romanize'
const kor = {
  function: (string: string) => romanize(string, { stripUnSupported: true }),
}
Japanese:
import { hiraganaRegex, katakanaRegex, romanize } from '@lazy-cjk/japanese'
// Matches runs of consecutive katakana and/or hiragana characters.
const japaneseTextRegex = new RegExp(`(?:(?:${katakanaRegex.source})|(?:${hiraganaRegex.source}))+`, 'gu')
const jpn = {
  function: (text: string) => {
    // Romanize only the kana runs so everything else stays untouched.
    text = text.replace(japaneseTextRegex, (string) => {
      const romanized = romanize(string)
      if (romanized !== '')
        return romanized
      /* c8 ignore next */
      return string
    })
    text = removeChineseJapanesePunctuation(text)
    return text
  },
}
Chinese:
import { pinyin } from 'pinyin'
const cmn = {
  function: (string: string) => {
    const romanizedWord = pinyin(string, {
      style: pinyin.STYLE_NORMAL,
      segment: true,
    })
    let newWord = ''
    // Each entry is an array of candidate readings; keep the first one.
    for (const words of romanizedWord) newWord += `${words[0]}`
    newWord = removeChineseJapanesePunctuation(newWord)
    return newWord.trim()
  },
}
Chinese and Japanese both use this function:
import { romanizePuncutuationTable } from '@lazy-cjk/japanese'
// Replace each CJK punctuation mark with its romanized counterpart.
export function removeChineseJapanesePunctuation(text: string): string {
  for (const [punctuation, romanized] of Object.entries({
    ...romanizePuncutuationTable,
    ',': ',',
  })) {
    text = text.replaceAll(punctuation, romanized)
  }
  return text
}
https://github.com/bluelovers/ws-regexp/commits?author=bluelovers&since=2024-09-03&until=2024-09-03
Fixed my issues in Japanese, one little weird issue tho:
import { romanize } from '@lazy-cjk/japanese'
const result = romanize('(C)ookieッ', { ignoreUnSupported: true })
// Expected (C)ookie'
console.log(result) // (C)ōkie'
You don't get the issue when doing
import { hiraganaRegex, katakanaRegex, romanize } from '@lazy-cjk/japanese'
const japaneseTextRegex = new RegExp(`(?:(?:${katakanaRegex.source})|(?:${hiraganaRegex.source}))+`, 'gu')
const text = '(C)ookieッ';
const result = text.replace(japaneseTextRegex, string => romanize(string, { ignoreUnSupported: true }));
console.log(result) // (C)ookie'
Also, do you have any suggestions for Chinese? Do you have a package for that in @lazy-cjk, or should I continue using the pinyin npm package?
https://github.com/bluelovers/ws-regexp/blob/master/packages/%40lazy-cjk/slugify/lib/cjk/chinese.ts
Seems to be working well! Thanks for all the help!
Input:
ㄱㄱㅎ
Error:
Not a Hangul syllable: ㄱ
Expected:
k k h
or g g h
or well something like this.
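As a rough idea of what "something like this" could look like on the caller's side, here is a sketch that maps standalone consonant jamo (Hangul Compatibility Jamo, U+3131..U+314E) to single romanized letters before the text reaches the romanizer. The table is my own illustrative assumption, loosely following Revised Romanization initial values (so it yields g g h rather than k k h), it is deliberately incomplete, and nothing here is an API of @lazy-cjk.

```typescript
// Illustrative, incomplete table: standalone consonant jamo -> letters.
// Keys are written as escapes; the comment shows the actual character.
const STANDALONE_CONSONANTS: Record<string, string> = {
  '\u3131': 'g', // ㄱ
  '\u3134': 'n', // ㄴ
  '\u3137': 'd', // ㄷ
  '\u3139': 'r', // ㄹ
  '\u3141': 'm', // ㅁ
  '\u3142': 'b', // ㅂ
  '\u3145': 's', // ㅅ
  '\u3148': 'j', // ㅈ
  '\u314A': 'ch', // ㅊ
  '\u314B': 'k', // ㅋ
  '\u314C': 't', // ㅌ
  '\u314D': 'p', // ㅍ
  '\u314E': 'h', // ㅎ
}

// Romanize each run of standalone consonants, separating letters with
// spaces. Jamo missing from the table are left as they are.
function romanizeStandaloneConsonants(text: string): string {
  return text.replace(/[\u3131-\u314E]+/g, run =>
    run.split('').map(ch => STANDALONE_CONSONANTS[ch] ?? ch).join(' '))
}

console.log(romanizeStandaloneConsonants('\u3131\u3131\u314E')) // g g h
```

Swapping the value for ㄱ to 'k' would give the other spelling mentioned above; which one is preferable depends on the romanization scheme in use.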