hockyy closed this issue 10 months ago
let pyshell = new PythonShell(cantoneseScriptAppDataPath);
ipcMain.handle('tokenizeUsingPyCantonese', async (event, sentence) => {
  pyshell.send(sentence);
  return new Promise((resolve, reject) => {
    pyshell.once('message', function (message) {
      resolve(JSON.parse(message));
    });
  });
});
import json
import pycantonese
import re
import os

os.environ["PYTHONUTF8"] = "1"

def separate_jyutping(jyutping_string):
    # Regular expression to match Jyutping syllables
    pattern = re.compile(r'([a-z]+\d)')
    return pattern.findall(jyutping_string)

# Initialize PyCantonese
def generate_json(sentence):
    # Parse the sentence for POS and Jyutping
    # print(sentence)
    parsed_sentence = pycantonese.parse_text(sentence)
    # print(parsed_sentence)
    # Initialize the result list
    result = []
    # Loop through each word and its details
    for word in parsed_sentence[0].tokens():
        separation_dict = [{"main" : word.word, "jyutping": word.jyutping}]
        if(word.jyutping):
            separation = separate_jyutping(word.jyutping)
            if(len(separation) == len(word.word)):
                separation_dict = [{"main" : word.word[i], "jyutping": separation[i]} for i in range(len(separation))]
        word_dict = {
            "origin": word.word,
            "pos": word.pos,
            "jyutping": word.jyutping,
            "separation" : separation_dict
        }
        result.append(word_dict)
    # Convert the result to JSON format
    return json.dumps(result, ensure_ascii=False)

if __name__ == "__main__":
    while True:
        sentence = input()
        jsonRes = generate_json(sentence)
        print(jsonRes)
My expectations for Windows are low, but this is holy...
Can you post the error here?
happy new year
No error, but when I receive the message event, it shows symbols like this.
I'm in Macau right now; I will post the screenshot when I get back to my apartment.
The message was just full of that symbol, so it seems the shell somehow uses the local encoding.
Everything works perfectly on Mac and Debian-based Linux.
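A minimal sketch of how this kind of garbling can arise, assuming the Python side writes in a legacy Windows code page while the Node side decodes the bytes as UTF-8. The choice of `gbk` is only an illustration; the code page on the reporter's machine is not known, and `simulate_pipe` is a hypothetical helper.

```python
# Hypothetical helper: encode on the "Python side", decode on the "Node side".
def simulate_pipe(text, writer_encoding, reader_encoding="utf-8"):
    raw = text.encode(writer_encoding)                 # bytes written to the pipe
    return raw.decode(reader_encoding, errors="replace")  # bytes read from the pipe

text = "香港人"

# Assumption: the writer uses a legacy code page (gbk here) instead of UTF-8.
garbled = simulate_pipe(text, "gbk")
# When both ends agree on UTF-8, the text survives the round trip.
intact = simulate_pipe(text, "utf-8")

# ascii() keeps the demo printable even on a console with a narrow encoding.
print("mismatched encodings:", ascii(garbled))
print("matching encodings round-trip:", intact == text)
```

Bytes that are not valid UTF-8 come back as U+FFFD replacement characters, which would match a message that is "just full of that symbol".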
Read over https://docs.python.org/3/library/os.html#utf8-mode. At the bottom it says "The Python UTF-8 Mode can only be enabled at the Python startup". You're trying to enable it inside Python, but at that point Python has already started up. Instead, you can enable UTF-8 mode before Python starts by using python-shell.
For example, save these Python and TypeScript files to the same directory and try running index.ts. You can do so directly with ts-node.
# test.py
print('香港人')
// index.ts
import {PythonShell} from 'python-shell';

let options = {
  // Start Python in UTF-8 mode
  pythonOptions: ['-X', 'utf8'],
};

PythonShell.run('test.py', options).then(messages=>{
  // messages is an array consisting of messages collected during execution
  console.log('results: %j', messages);
});
>ts-node index.ts
results: ["香港人"]
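The startup-only behaviour described above can also be checked from Python itself. This is a rough sketch, not part of the original answer: `utf8_mode_of_child` is a hypothetical helper that launches a fresh interpreter and reports its `sys.flags.utf8_mode`.

```python
import os
import subprocess
import sys

def utf8_mode_of_child(extra_args, extra_env=None):
    """Start a fresh interpreter and return its sys.flags.utf8_mode."""
    env = dict(os.environ)
    env.update(extra_env or {})
    cmd = [sys.executable, *extra_args, "-c", "import sys; print(sys.flags.utf8_mode)"]
    done = subprocess.run(cmd, capture_output=True, text=True, check=True, env=env)
    return int(done.stdout.strip())

# Setting the variable inside a running interpreter is too late:
# sys.flags was fixed when this process started.
previous = os.environ.get("PYTHONUTF8")
flag_before = sys.flags.utf8_mode
os.environ["PYTHONUTF8"] = "1"
flag_after = sys.flags.utf8_mode

# Restore the environment so the demo has no side effects.
if previous is None:
    del os.environ["PYTHONUTF8"]
else:
    os.environ["PYTHONUTF8"] = previous

print("current process changed:", flag_before != flag_after)
# A new interpreter picks the mode up when it is requested at startup.
print("child with -X utf8:", utf8_mode_of_child(["-X", "utf8"]))
print("child with PYTHONUTF8=1:", utf8_mode_of_child([], {"PYTHONUTF8": "1"}))
```

The flag in the running process stays the same, while an interpreter launched with `-X utf8` reports UTF-8 mode as enabled, which is what `pythonOptions` arranges from the Node side.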
I also suggest reading over an article I wrote recently, https://medium.com/@almenon214/learn-unicode-in-y-minutes-60a8b2cef1d9. Let me know if it helps out :)
@Almenon it works! hahah amazing
Awesome! Let me know when you add Mandarin support to your app. I would be interested in testing it out.
lol haha @Almenon I'm too lazy for it, not gonna do it in the near future, but it will surely happen
There is this one shit OS called "Windows 11" which won't let my shell communicate in Chinese characters (Hanzi). At this point I'm too lazy to support my app any further on this particular OS 😭
Setting the encoding option doesn't help, and adding os.environ["PYTHONUTF8"] = "1" doesn't help either. I give up.
Environment: Windows 11, Node v18.15.0, Python 3.11.3
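One in-script alternative, not taken from this thread, is to fix the stream encoding explicitly instead of relying on an environment variable. The sketch below wraps a binary stream in a UTF-8 text layer; `utf8_text_stream` is a hypothetical helper and `io.BytesIO` stands in for `sys.stdout.buffer` so the example runs anywhere. In a real script, `sys.stdout.reconfigure(encoding="utf-8")` (Python 3.7+) should have a similar effect, and stdin would need the same treatment.

```python
import io
import json

def utf8_text_stream(binary_stream):
    """Wrap a binary stream so text written to it is always encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)

# Stand-in for sys.stdout.buffer (an assumption made to keep the sketch self-contained).
fake_stdout = io.BytesIO()
out = utf8_text_stream(fake_stdout)

# Same shape of output as the tokenizer script: one JSON document per line.
print(json.dumps([{"main": "香"}], ensure_ascii=False), file=out)
out.flush()

# The bytes on the "pipe" are UTF-8 regardless of the system code page.
payload = fake_stdout.getvalue()
print(payload.decode("utf-8").strip() == '[{"main": "香"}]')
```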