Vocab-Apps / python-pinyin-jyutping-sentence

Convert a Chinese sentence to Pinyin or Jyutping
GNU General Public License v3.0

API- Problem with special characters #6

Closed tibochina closed 2 years ago

tibochina commented 3 years ago

First of all, thank you very much for this amazing program. I am trying to get the pinyin through the REST API from an Arduino program on an ESP32 microcontroller, but the result is unreadable. I suppose this is because of the special characters.

For example, if I make a request for the character “你”, the result is {“pinyin”: “\u00e4 \u00bd \u00a0”}

I suppose this is a URL format issue. Is it possible to get the result in ASCII format (I don’t really need the tones)? Or do you have any other suggestions on how to fix this issue?

Also, as a side question, would it ever be possible to install this REST API on an ESP32 microcontroller?

Thank you again in advance for your help.

luc-vocab commented 3 years ago

Hi, can you share more about what you're trying to do? Why are you calling this API on a microcontroller? I suspect the issue is a lack of UTF-8 support.
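
As a hedged illustration of that diagnosis (not something posted in the thread): the three code points in the reported result match the three UTF-8 bytes of “你” read one byte at a time as Latin-1, which is the pattern you get when UTF-8 is not decoded somewhere along the way. The sketch below reproduces that pattern in Python. The idea that the request text should be percent-encoded as UTF-8 when it travels in a URL is an assumption about the client side, not something the thread confirms.

```python
from urllib.parse import quote

text = "你"

# UTF-8 encodes this character as three bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex(" "))  # e4 bd a0

# Reading those bytes as Latin-1 turns each byte into its own character,
# which matches the escapes reported in the issue.
misread = utf8_bytes.decode("latin-1")
escaped = " ".join(f"\\u{ord(ch):04x}" for ch in misread)
print(escaped)  # \u00e4 \u00bd \u00a0

# The damage is reversible as long as no byte was dropped.
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == text)  # True

# Assumption: a client that puts the text in a URL should percent-encode
# the UTF-8 bytes. quote() uses UTF-8 by default.
url_safe = quote(text)
print(url_safe)  # %E4%BD%A0
```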

tibochina commented 3 years ago

Hi Luc, I am making a “braille phone” with SMS communication for a Chinese deaf-blind friend. So I request the pinyin transliteration from the API with an ESP32 microcontroller. The tones in Chinese Braille are not so important, especially for someone deaf, so I would need to get rid of the tones before translating the pinyin into braille.

Would it ever be possible to send a request to your API with an option to receive only the pinyin with spaces but without tones? In that case there would not be any special characters and UTF-8 support would not be necessary. Something like: Send: 你的软件很棒! Receive: ni de ruanjian hen bang! (No tones, but spaces included)

Or do you have any other suggestions? Thank you very much in advance!

luc-vocab commented 3 years ago

I recommend you fork this API and make the changes needed to make it work on your microcontroller. Will the pinyin without tones be understandable? You might want to research that before investing a lot of time. Going from Chinese characters to pinyin usually loses information, and dropping the tones would lose even more, I suspect to the point where it might become unusable.

tibochina commented 3 years ago

Hi Luc, I will see if I can transfer the program to the microcontroller or maybe to my own Linux server. If I create my own server, do I need to take extra steps after installing your program, or will it all be automatic? You are right about the tones. The thing is that this person was deaf before becoming blind, so she has never spoken or heard Chinese. Her language is actually Chinese Sign Language, which has no concept of tones; it is a somewhat different language, and written Chinese doesn’t make much sense to its users. Now that she cannot see anymore, she has learned very basic Braille. The communication will be limited to very simple daily messages, so the tones won’t make a difference. At least she will be able to exchange simple messages with others, as for now her only way to communicate is by touching hands. Hopefully I can add a little window to her “cell”. Thank you again for your help.

luc-vocab commented 3 years ago

Do you have a document that details the rules of Chinese braille somewhere? How will your system work together, and how will the other side of the conversation enter characters? Please provide more details. I will make the changes necessary to help you with your project. I can start by making the change to suppress the tones. But so that I can help as best I can, give me more details about your project. How will the braille characters be formed?

tibochina commented 3 years ago

Awesome Luc, thank you very much!!

Let me explain the project more precisely. I have made a homebrew “refreshable Braille display” with small actuators that raise little pins on demand. (It is all based on the ESP32.) This was the biggest part of the project by far!

1- Someone sends a simple email or SMS in Chinese characters
2- When a message is received, the program requests the pinyin (no tones needed) “translation” from your API
3- The ESP32 C++ program “translates” the pinyin syllables into braille format
4- The C++ program actuates the braille pins (the user gets informed of a new message by vibrations)
5- The user can reply by typing a Braille message (based on a 6-button keyboard)
6- The C++ program “translates” the braille into pinyin syllables
7- The C++ program sends the SMS/email in pinyin (without tones)

For the Braille transliteration I will use this standard: https://www.omniglot.com/chinese/braille.htm
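
As a rough sketch of step 3 (not code from this project): if the braille lookup is keyed by pinyin initial and final, which is an assumption that should be checked against the linked table, the first job is to split each toneless syllable into those two parts. The helper name `split_syllable` is hypothetical, and no braille dot patterns are included here because they must come from the standard itself.

```python
# The 21 standard pinyin initials. Two-letter initials come first so that
# "zh", "ch" and "sh" are matched before "z", "c" and "s".
INITIALS = (
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
)


def split_syllable(syllable):
    """Split one toneless pinyin syllable into (initial, final).

    Syllables without a standard initial (for example "an", or the
    y-/w- spellings such as "yu") come back with an empty initial and
    need their own handling before a braille lookup.
    """
    s = syllable.lower()
    for initial in INITIALS:
        if s.startswith(initial):
            return initial, s[len(initial):]
    return "", s


# Input in the space-separated, toneless form discussed in this thread.
pairs = [split_syllable(s) for s in "ti gao kou yu".split()]
print(pairs)  # [('t', 'i'), ('g', 'ao'), ('k', 'ou'), ('', 'yu')]
```

The same table-driven approach ports directly to C++ on the ESP32, since it only needs a fixed list of prefixes and a string comparison.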

So, simply said, if it would be possible to get the pinyin without tones but with word separation from your API, that would be fantastic! Ideally, for long-term stability, I should try to convert your program into C++ and add it directly to the ESP32 program, but I lack the time and experience to consider doing this now. Hopefully I can try to do this later.

Thank you again for your kindness, I really do appreciate it!

luc-vocab commented 3 years ago

Did you try calling the API with tone numbers?

tibochina commented 3 years ago

I tried, but I wasn’t successful. Would you mind showing me an example of how to add this argument to an HTTP GET request, please?

luc-vocab commented 3 years ago

I set up a new endpoint for you:

import requests

url = 'https://apiv2.mandarincantonese.com/convert'
source = '提高口语'
# POST the text as a JSON body; remove_tones drops the tone marks
response = requests.post(url, json={
    'conversion_type': 'pinyin',
    'text': source,
    'remove_tones': True
})
print(response.json())
>>> {'romanization': 'tigao kouyu'}

You can also add spaces between every syllable:

import requests

url = 'https://apiv2.mandarincantonese.com/convert'
source = '提高口语'
# Same request, with 'spaces' asking for a space between every syllable
response = requests.post(url, json={
    'conversion_type': 'pinyin',
    'text': source,
    'remove_tones': True,
    'spaces': True
})
print(response.json())
>>> {'romanization': 'ti gao kou yu'}

tibochina commented 3 years ago

Hello Luc, thank you very much for creating this for my friend, it is super kind of you! And sorry for my slow response. I have been trying to make it work, but I always get:

405 {“message”: “the method is not allowed for the requested URL.”}

I had been successful with GET requests before, but they never needed parameters, so I lack experience and am most likely doing something wrong with the GET request. I tried both from C++ code on the ESP32 microcontroller and with a Firefox add-on, but I always get the same 405 error.

This is the request that I sent: https://apiv2.mandarincantonese.com/convert?conversion_type=pinyin&text=blabla&remove_tones=True

(I also tried replacing True with 1, and I tried both Roman letters and Chinese characters in the ‘text’ parameter, but it didn’t make any difference.)

Would you mind letting me know what I am doing wrong?

—————————————————————————————————

Another question: I am almost done with the braille machine, and hopefully my friend can keep using it for years to come. I imagine you will probably not keep your API server online indefinitely, right? If so, I suppose I will have to find a way to install your program on a server that I host myself. Do you have any instructions on how to do this?

Thank you so much again for your kindness!

luc-vocab commented 3 years ago

You have to do a POST request as shown in the Python examples, not a GET request. Show me a screenshot of your C++ ESP32 code and I'll try to steer you in the right direction. As for the service staying online, let's discuss that later, after you've gotten the example working.
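
To make the difference concrete, here is a minimal sketch that uses only the Python standard library and builds the request without sending it. The endpoint and field names are the ones from the examples in this thread; the helper names `build_convert_request` and `fetch_romanization` are hypothetical, and whether the server is still reachable is not something this sketch can promise. The parameters travel in a JSON body rather than in the query string, which is why a GET to the same URL is answered with 405.

```python
import json
import urllib.request

CONVERT_URL = "https://apiv2.mandarincantonese.com/convert"


def build_convert_request(text, remove_tones=True):
    """Build (but do not send) the POST request for the /convert endpoint."""
    payload = {
        "conversion_type": "pinyin",
        "text": text,
        "remove_tones": remove_tones,
    }
    # json.dumps escapes non-ASCII characters by default, so the body
    # contains only ASCII bytes even for Chinese input.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        CONVERT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_romanization(text):
    """Send the request (needs network access) and return the romanization."""
    with urllib.request.urlopen(build_convert_request(text), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))["romanization"]


request = build_convert_request("提高口语")
print(request.get_method())  # POST
print(request.data.decode("ascii"))
```

On the ESP32 side the same three pieces are needed: the POST method, a `Content-Type: application/json` header, and the JSON document as the request body.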

tibochina commented 3 years ago

Thanks Luc, sorry, I didn’t notice it was a POST request. I managed to make it work now. This is really good, thank you very much.

For your info, I have noticed 2 issues:

1- The “spaces” parameter doesn’t seem to have any effect. Anyway, it is not an issue for me: as you suggested, I can make the request with “tone_numbers”=True, which makes it easy to distinguish each character for the braille transliteration while still keeping the word separations for the meaning.

2- If the text has multiple consecutive digits, the pinyin transliteration is incomplete. Example: “我有34套” becomes “Wo3 you3 san14 tao4”. Once again, this doesn’t affect me, because my friend understands digits better than pinyin (the braille for “3” is better than “san” for her), so I will avoid sending numbers through the API anyway. Whenever the text to translate contains numbers, I will cut the text around the numbers and send it in multiple parts. For example, for “我碰到34个朋友” I will send 2 POST requests, for “我碰到” and “个朋友”.

Thank you very, very much! Your program is super helpful, and it is a real gift for my friend!
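
Both workarounds can be sketched in a few lines of Python (the device itself runs C++, so this only illustrates the logic). The helper names are hypothetical. The first function cuts the text around runs of digits so that only the non-numeric parts are sent to the API. The second assumes that the tone-numbered output puts a digit from 1 to 5 after each syllable, as in the “Wo3 you3” example above, with multi-syllable words written without internal spaces; that format is an assumption, not documented behavior.

```python
import re


def split_around_digits(text):
    """Split text into (segment, is_number) pairs at runs of digits."""
    parts = re.split(r"(\d+)", text)
    return [(part, part.isdigit()) for part in parts if part]


def syllables_from_numbered_pinyin(romanization):
    """Turn tone-numbered pinyin into a list of words, each a list of
    toneless syllables. Punctuation and stray digits are dropped."""
    words = []
    for word in romanization.split():
        # Each syllable is a run of letters, optionally followed by a tone digit.
        syllables = re.findall(r"([a-zü]+)[1-5]?", word.lower())
        if syllables:
            words.append(syllables)
    return words


print(split_around_digits("我碰到34个朋友"))
# [('我碰到', False), ('34', True), ('个朋友', False)]

print(syllables_from_numbered_pinyin("ti2gao1 kou3yu3"))
# [['ti', 'gao'], ['kou', 'yu']]
```

The digit segments can then be shown directly as braille numbers, while each remaining segment goes out as its own POST request.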

luc-vocab commented 3 years ago

OK, so at this stage, what else is required for you to complete your project?

tibochina commented 3 years ago

I need to adjust a few things about the Braille actuators on the hardware side, which might take a little while. On the software side, I now have all the key elements that I needed to make the rest of the program. Of course, I still need to program the interface with the buttons, merge all the different programs together, “translate” the pinyin into Braille, program the actuators to display the braille dots, etc. Now it is just a matter of time; I have everything I need to keep going :)

Regarding the pinyin transliteration API, do you have any suggestions for what I should do in the long term? Do you think it would ever be possible to run your program on a microcontroller like the ESP32? I am programming it in C++, but I heard it can also be programmed in MicroPython. It would be awesome if I could have an ESP32 dedicated to this task, like a cheap mini server.

Thanks again, Luc!

luc-vocab commented 3 years ago

Please email me at mandarincantonese@mailc.net and we'll discuss the long-term maintenance aspect.