alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0
8.13k stars 1.12k forks

JSON formatting broken in locale with comma separator #808

Open · PhilippeRo opened this issue 2 years ago

PhilippeRo commented 2 years ago

Hello, and first off, thank you for this piece of software. It runs very well and is easy to use. However, I seem to have run into a bug (sorry if I'm wrong). My locale is fr_FR, so floating-point numbers are formatted with a comma (i.e. 1,3 rather than 1.3 as in English). When I set vosk_recognizer_set_max_alternatives to more than one, vosk_recognizer_result and vosk_recognizer_final_result return a JSON string containing several alternatives, each with a confidence level expressed as a floating-point number. It turns out that this number is formatted according to the locale (with a comma), which makes the string unreadable by common JSON parsers, since we get things like:

"alternatives" : [{ "confidence" : 361,788818, "text" : "" }]

Because JSON uses commas to separate values, 361,788818 is not a valid number. If I save the locale, change it to en_US, call one of the above functions and then restore the original locale, it works (floating-point numbers are formatted with a dot). I suspect the same problem affects vosk_recognizer_set_words, though I haven't tried it. Thank you.

nshmyrev commented 2 years ago

Yeah, it might be a problem. Let me check.

lfcnassif commented 2 years ago

Thanks for this great library. Just to let you know, we were also affected by this issue (see the reference above).

nshmyrev commented 2 years ago

Oh yeah, in 20 years of C++ development they haven't come up with a method to convert a float to a string independently of the locale. There is std::format in C++20, but I don't want to require such a new compiler, and I suppose it is not fully supported in many places. Maybe we can just force the conversion from ',' to '.'.

PhilippeRo commented 2 years ago

Do you have any idea when the above fix will be pulled into the tree? I'd love to strip my current workaround from my GStreamer plugin so that it might, hopefully, get a bit quicker. Thanks!

nshmyrev commented 1 year ago

Also https://github.com/alphacep/vosk-api/pull/1055