Closed by jeremy-beasley.
Unfortunately, I have absolutely no idea. I speak Japanese (crudely) and so I understand the problem space a little bit, but I'm in no way equipped to give you an answer on it. :(
Furthermore (I'm not sure whether this applies to Korean, so correct me if I'm wrong), even word breaking in Asian languages usually requires a dictionary, so if I wanted to do something similar accurately, I'd need to start there.
The algorithms used here typically judge a word's complexity by its number of syllables (it's a shortcut, but it broadly works). To do the same in Korean, you might need a Hangul lookup map with the syllabic complexity of each word... Thoughts?
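As a rough illustration (a hedged sketch, not code from this library): in standard Unicode text, every precomposed Hangul syllable block (U+AC00 to U+D7A3) spells exactly one syllable, so the syllable *count* may not need a lookup map at all. The sketch assumes the input can be NFC-normalised, and it splits on whitespace, which yields eojeol (Korean spacing units that often carry attached particles such as 는 or 를), not dictionary words. Proper word breaking would still need a morphological analyzer. The function names `count_hangul_syllables` and `syllables_per_eojeol` are hypothetical.

```python
import unicodedata

# Precomposed Hangul syllable blocks occupy U+AC00..U+D7A3 (11,172 code points);
# each block spells exactly one syllable.
HANGUL_FIRST = 0xAC00
HANGUL_LAST = 0xD7A3


def count_hangul_syllables(word: str) -> int:
    """Count Hangul syllable blocks in a word; non-Hangul characters are ignored."""
    # NFC composes sequences of conjoining jamo into precomposed blocks first,
    # so decomposed (NFD) input is counted the same way as composed input.
    composed = unicodedata.normalize("NFC", word)
    return sum(1 for ch in composed if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST)


def syllables_per_eojeol(text: str) -> list[tuple[str, int]]:
    """Split on whitespace (eojeol, not true words) and count syllables in each unit."""
    return [(unit, count_hangul_syllables(unit)) for unit in text.split()]
```

For example, `syllables_per_eojeol("한국어 텍스트 통계")` gives `[("한국어", 3), ("텍스트", 3), ("통계", 2)]`. Whether syllable count is as good a proxy for word difficulty in Korean as it is in English is a separate, open question.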
In any case I don't think I'm really equipped to help you with your problem, but thanks for asking nonetheless. It's an interesting one!
Hey, Chris.
Thanks for the quick response. I had suspected both of the points you made: (1) word breaking and (2) syllables as a proxy for word complexity. I also reached out to other computational linguists to get their POVs, and I'll report back on what I hear. Maybe there’s a way for me to extend what you’ve already done.
Stand by.
(In reply to Christopher Giffard's comment of 2014/02/06 12:22, via notifications@github.com.)
Hi @cgiffard! Does text statistics support Spanish?
I don't speak Spanish, I'm afraid — and I'm not sure whether any of the algorithms in question actually work with non-English languages.
Do you have any thoughts as to where to start?
No problem, @cgiffard. I've just confirmed that the SMOG formula, originally developed and tested on English, has also been validated for texts written in Spanish and French. Thank you!
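For reference, here is a minimal sketch of the SMOG grade using the constants from McLaughlin's original English formula (1.0430 and 3.1291, with the polysyllable count scaled to a 30-sentence sample). A polysyllable is a word of three or more syllables. Whether these constants carry over unchanged to Spanish or French is not established in this thread; adapted versions may use different values. The function name `smog_grade` is hypothetical, and it takes pre-computed counts, so syllable counting and sentence splitting are left to the caller.

```python
import math


def smog_grade(polysyllable_count: int, sentence_count: int) -> float:
    """SMOG grade from McLaughlin's published English regression constants.

    polysyllable_count: number of words with three or more syllables.
    sentence_count: number of sentences in the sample (must be positive).
    The polysyllable count is normalised to a 30-sentence sample.
    """
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    return 1.0430 * math.sqrt(polysyllable_count * 30 / sentence_count) + 3.1291
```

For instance, `smog_grade(12, 10)` scales to 36 polysyllables per 30 sentences, so the square root is 6 and the result is roughly 9.39.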
Thanks @Amandysha.
I'm closing this issue as I think supporting every language is out of scope... for now.
@jeremybeasley @Amandysha please provide documentation for such tests, and request a Python implementation in https://github.com/shivam5992/textstat first; that way, people can understand these statistics better.
Slightly tangential, so apologies in advance.
Do any of you know any text statistics that work with other languages? I'm looking for modifications of these metrics that would work with Korean.
Thanks for any pointers you could give me.