Open patcon opened 6 years ago
Well, it's an interesting idea. But right now, I don't have time to work on this myself. If you know/have someone who can work on this, feel free to open a PR and I can give some pointers if necessary.
Ok, started investigating how a proof-of-concept might work.
Eager to hear your feedback! Thanks in advance for any attention.
(As I'm sure you recall, I'm not a golang programmer, so this might be messy, but I'm happy to take a stab at it 😃 )
This seems like an appropriate way to configure it:
# config.toml
[[gateway]]
name="g0v-tw.translation"
enable=true
translate=true
[[gateway.inout]]
account="slack.g0v-tw"
channel="general"
[gateway.inout.options]
locale="zh-TW"
[[gateway.inout]]
account="slack.g0v-tw"
channel="general-en"
[gateway.inout.options]
locale="en"
Could forego the translate toggle, and simply assume that a locale and a Google Translate key mean that incoming messages should be translated. (I prefer this approach tbh.)
I would then add code either to:
- doTranslation(), executed within handleReceive() of gateway/router.go, or
- modifyMessage() of gateway/gateway.go
https://github.com/42wim/matterbridge/blob/296428d53e4febb5a82082d3c61628fbd396fd13/gateway/gateway.go#L394-L416
https://github.com/42wim/matterbridge/blob/296428d53e4febb5a82082d3c61628fbd396fd13/gateway/router.go#L71-L111
In order to translate messages, we would use the following API endpoint:
POST https://translation.googleapis.com/language/translate/v2
Params:
- format: text, which preserves newlines. (html is the default, and should be used when we want to start preserving markdown formatting, as mentioned below in the "Transformation" section.)
- source: the locale from [gateway.inout.options] of the receiving bridge.
- target: the locale from [gateway.inout.options] of the outgoing bridge.
Since Google Translate goes a little overboard, we'll want to mark some features of messages as non-translatable. Specifically @usernames, #channels, urls, and `code snippets`. We'll do this by regex'ing for each, and wrapping them in <span translate="no"></span> tags. These tags will remain in the translated strings, but can be processed out again.
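A minimal sketch of the wrap-and-unwrap step described above, in Go. The regular expression and the helper names (`protect`, `unprotect`) are assumptions made for illustration and are not part of matterbridge. Whether the API honours these tags when `format` is `text` rather than `html` is not verified here.

```go
package main

import (
	"regexp"
	"strings"
)

// noTranslate matches the token kinds listed above: inline code, URLs,
// @usernames and #channels. The patterns are illustrative guesses and
// will need tuning against real chat messages.
var noTranslate = regexp.MustCompile("`[^`]+`|https?://\\S+|[@#][\\w.-]+")

const (
	openTag  = `<span translate="no">`
	closeTag = `</span>`
)

// protect wraps every matched token in a translate="no" span before the
// text is sent for translation.
func protect(text string) string {
	return noTranslate.ReplaceAllStringFunc(text, func(tok string) string {
		return openTag + tok + closeTag
	})
}

// unprotect strips the spans again from the translated text.
func unprotect(text string) string {
	return strings.NewReplacer(openTag, "", closeTag, "").Replace(text)
}
```

Calling `unprotect(protect(s))` returns `s` unchanged, so the pair can bracket the translation call.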
A proof-of-concept may also need to generally strip markdown in order for Google Translate to work well. We can subsequently (after PoC) bring this back by converting markdown into HTML (which Google Translate can handle), and then back to markdown.
Apparently, we must also add "Powered by Google Translate" to messages.
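For reference, the request to the endpoint above could be assembled roughly like this. It is a sketch, not code from the PoC: `newTranslateRequest` and `parseTranslation` are hypothetical names, the parameters are passed as query parameters (`q`, `target`, `source`, `format`, `key`) as in Google's v2 API, and the response is assumed to have the v2 shape `data.translations[].translatedText`. Leaving `source` empty lets the API auto-detect the language.

```go
package main

import (
	"encoding/json"
	"errors"
	"net/http"
	"net/url"
)

const translateEndpoint = "https://translation.googleapis.com/language/translate/v2"

// newTranslateRequest builds (but does not send) a translation request.
// source may be empty, in which case the parameter is omitted.
func newTranslateRequest(apiKey, text, source, target string) (*http.Request, error) {
	params := url.Values{}
	params.Set("key", apiKey)
	params.Set("q", text)
	params.Set("target", target)
	params.Set("format", "text") // "text" preserves newlines; "html" is the default
	if source != "" {
		params.Set("source", source)
	}
	return http.NewRequest(http.MethodPost, translateEndpoint+"?"+params.Encode(), nil)
}

// translateResponse mirrors only the part of the response used here.
type translateResponse struct {
	Data struct {
		Translations []struct {
			TranslatedText string `json:"translatedText"`
		} `json:"translations"`
	} `json:"data"`
}

// parseTranslation extracts the first translated string from a response body.
func parseTranslation(body []byte) (string, error) {
	var r translateResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if len(r.Data.Translations) == 0 {
		return "", errors.New("no translations in response")
	}
	return r.Data.Translations[0].TranslatedText, nil
}
```

The request would then be sent with an `http.Client`, and the response body passed to `parseTranslation`.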
Looks good! Calling a new function in modifyMessage would be the best.
You'll also need an API key setting, for the Google Translate endpoint.
Thanks for vetting and encouragement :) ~modifyMessage already exists, but I'll just choose any old name for now and happy to change later~
> Calling a new function in modifyMessage would be the best.
Ok, after digging around a bit, I'm confused by this suggestion, and was hoping you could help me understand :)
It appears that modifyMessage in handleReceive is what is called based on gateway-level config that applies to every message that comes out of the gateway, and it makes changes that apply to all destination channels. It seems to be handleMessage that is called for each of the potentially many generated messages going into other channels.
Is this correct? If so, then the latter is where the new functionality must be added, because each message must be translated differently based on the Locale settings of the channel it's being dropped into.
Any clarification of my understanding is appreciated! Thanks @42wim!
Yes, your suggestion is correct. (sorry for sending you in the wrong direction)
Yay! Bare working PoC!
This still needs some work, but I just wanted to say that this is WORKING SO WELL! I can already feel it slightly changing how our community is able to communicate, and it's pretty neat!
Thank you so so so SO much for this tool @42wim :)))
Looks pretty cool, good job!
Hey @42wim! Hopefully a quick question:
I'm running into a bit of trouble with the fact that I'm totally transforming the text, and I don't think this was the original intention. It seems that msg.Text is being passed between all bridges, and not as separate instances of the original message. This is unexpected, as I'd like each channel to receive a translation of the original post in the original language. But it keeps mutating.
It took me a while to notice this, as Google Translate only cares about the target language, and auto-detects the origin language, whatever it may be.
So assume I have a gateway with 4 rooms and 4 languages: English, Korean, Chinese, and Japanese. It seems that handleMessage() for the first bridge might transform the Text from English to Chinese, then that Chinese text is transformed into Korean, then Korean into Japanese. This tends to garble the original message, as it sends it through 3 layers of translation.
Does it seem that my understanding is correct? If so, can you think of a way around this? (My first thought would be to store the original message in a field on the Gateway struct, but that's probably wrong.)
As always, thanks for any assistance! :)
Nevermind. Figured it out! Hadn't seen the origmsg var 🤦‍♂️
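To illustrate the general idea behind the fix (always translate from the untouched original rather than from the previous destination's output), here is a small Go sketch. `Message`, `Translator`, and `fanOut` are hypothetical stand-ins invented for this example. They are not matterbridge's types, and this is not how `origmsg` is actually wired up in the gateway.

```go
package main

// Message is a hypothetical stand-in for the bridge message type.
// Only the Text field matters for this example.
type Message struct {
	Text string
}

// Translator is a hypothetical function type: it takes the text and a
// target locale and returns the translated text.
type Translator func(text, target string) (string, error)

// fanOut returns one translated copy of orig per destination locale.
// Every copy is derived from orig.Text, so translations never chain
// (English to Chinese to Korean to Japanese) and orig is never mutated.
func fanOut(orig Message, locales []string, translate Translator) (map[string]Message, error) {
	out := make(map[string]Message, len(locales))
	for _, locale := range locales {
		text, err := translate(orig.Text, locale)
		if err != nil {
			return nil, err
		}
		msg := orig // struct copy; the original stays untouched
		msg.Text = text
		out[locale] = msg
	}
	return out, nil
}
```

With four rooms, each destination then receives a single-hop translation of the original post instead of a translation of a translation.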
Describe the solution you'd like
I am interested in whether matterbridge could be used to help unfragment international communities for which language barriers make collaboration difficult. It would be wonderful if matterbridge could be used to create translatable gateways between tools, or even between different channels within the same tool.
So for example, the g0v-tw Slack could bridge its #general channel with general-en, and messages could be translated into each end's target language.
I can imagine a generalized feature request being for "transformation of messages within gateways in a directional fashion". But the specific request would be to discuss allowing translation via Google Translate's API.
I could imagine this involving a pluggable system that also powered the reformatting of markup in between platforms.
Describe alternatives you've considered
Forking the project.
Additional context
None.
Thanks for your consideration!