42wim / matterbridge

bridge between mattermost, IRC, gitter, xmpp, slack, discord, telegram, rocketchat, twitch, ssh-chat, zulip, whatsapp, keybase, matrix, microsoft teams, nextcloud, mumble, vk and more with REST API (mattermost not required!)
Apache License 2.0

Add Translation ability to bridge #647

Open patcon opened 5 years ago

patcon commented 5 years ago

Re-ticketed from: https://github.com/42wim/matterbridge/pull/512

I feel like I've learned a bit about how the app (and Go) works in the time since I first started that ticket.

Wanted to make a fresh ask, after using translation for quite some time.

My current imagining is that we're talking about a config like this:

[[gateway]]
  name="test-gateway"
  enable=true
  # main international slack
  [[gateway.inout]]
    account="slack.team-main"
    channel="test-channel"
  [[gateway.inout]]
    account="slack.team-main"
    channel="test-channel-zh"
    [gateway.inout.options]
      locale="zh"
  [[gateway.inout]]
    account="slack.team-main"
    channel="test-channel-en"
    [gateway.inout.options]
      locale="en"
  [[gateway.inout]]
    account="slack.team-main"
    channel="test-channel-ja"
    [gateway.inout.options]
      locale="ja"
  [[gateway.inout]]
    account="slack.team-main"
    channel="test-channel-ko"
    [gateway.inout.options]
      locale="ko"
  # separate country slacks
  [[gateway.inout]]
    account="slack.team-japan"
    channel="test-channel"
    [gateway.inout.options]
      locale="ja"
  [[gateway.inout]]
    account="slack.team-korea"
    channel="test-channel"
    [gateway.inout.options]
      locale="ko"
  [[gateway.inout]]
    account="slack.team-canada"
    channel="test-channel"
    [gateway.inout.options]
      locale="en"

Note that each language gets translated a few times. It would be great to iterate over the config, find all the candidate languages, make the minimal number of API calls, and then store the raw translations in Extra. I am currently imagining a few options, with or without serialization:

  1. Extras["translation_en"][0] = "lorem ipsum etc"
  2. Extras["translations"][0] = `{"en": "lorem ipsum etc"}`
     Extras["translations"][1] = `{"fr": "le lorem ipsum etc"}`
  3. Extras["translations"][0] = `{"en": "lorem ipsum etc", "fr": "le lorem ipsum etc"}`

I'm leaning toward 3, I suppose.
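To make option 3 concrete, here is a minimal sketch in Go of translating once per distinct locale and storing the result as a single JSON object. The `Message` type is a simplified stand-in rather than the bridge's real struct, and `Translator`, `uniqueLocales`, and `attachTranslations` are hypothetical names invented for illustration. The translation backend is faked so the example runs on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Message is a simplified stand-in for the bridge's message type
// (assumption: only the fields needed for this sketch are included).
type Message struct {
	Text  string
	Extra map[string][]interface{}
}

// Translator is a hypothetical hook for whichever translation API is used.
type Translator func(text, targetLocale string) (string, error)

// uniqueLocales returns the distinct, non-empty locales in sorted order.
func uniqueLocales(locales []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, l := range locales {
		if l == "" || seen[l] {
			continue
		}
		seen[l] = true
		out = append(out, l)
	}
	sort.Strings(out)
	return out
}

// attachTranslations translates msg.Text once per distinct locale and stores
// all results as one JSON object in Extra["translations"][0] (option 3).
func attachTranslations(msg *Message, locales []string, translate Translator) error {
	translations := make(map[string]string)
	for _, l := range uniqueLocales(locales) {
		t, err := translate(msg.Text, l)
		if err != nil {
			return fmt.Errorf("translate to %q: %w", l, err)
		}
		translations[l] = t
	}
	raw, err := json.Marshal(translations)
	if err != nil {
		return err
	}
	if msg.Extra == nil {
		msg.Extra = make(map[string][]interface{})
	}
	msg.Extra["translations"] = []interface{}{string(raw)}
	return nil
}

func main() {
	// Locales in the order they appear on the gateway.inout entries above;
	// "" stands for the main channel, which has no locale option.
	configured := []string{"", "zh", "en", "ja", "ko", "ja", "ko", "en"}

	calls := 0
	fake := func(text, locale string) (string, error) {
		calls++ // count how often the (fake) API is hit
		return "[" + locale + "] " + text, nil
	}

	msg := &Message{Text: "hello"}
	if err := attachTranslations(msg, configured, fake); err != nil {
		panic(err)
	}
	fmt.Println(calls) // 4: one call per distinct locale, not one per channel
	fmt.Println(msg.Extra["translations"][0])
}
```

With the example config, seven localized channels collapse to four API calls. A single serialized object also keeps every translation at one known index, which is the main practical difference from option 2.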

Any feelings on which one is better? Also, I think we need a place to store the original language. Would that be Extras["blah_original_language"][0]? If so, storing arbitrary values will mean a fair amount of getting and setting in Extras, so I'm wondering whether we could add some helper functions on Message instances to make this simpler.
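The helper idea could look roughly like the sketch below. `SetExtraString` and `GetExtraString` are hypothetical method names, the `Message` type is again a simplified stand-in, and the `original_language` key is only an example.

```go
package main

import "fmt"

// Message is a simplified stand-in for the bridge's message type (assumption).
type Message struct {
	Text  string
	Extra map[string][]interface{}
}

// SetExtraString stores a single string under key, replacing any previous
// values, and creates the map on first use.
func (m *Message) SetExtraString(key, value string) {
	if m.Extra == nil {
		m.Extra = make(map[string][]interface{})
	}
	m.Extra[key] = []interface{}{value}
}

// GetExtraString returns the first value stored under key if it is a string.
// The boolean is false when the key is missing or holds another type.
func (m *Message) GetExtraString(key string) (string, bool) {
	values := m.Extra[key]
	if len(values) == 0 {
		return "", false
	}
	s, ok := values[0].(string)
	return s, ok
}

func main() {
	msg := &Message{Text: "こんにちは"}
	msg.SetExtraString("original_language", "ja") // example key only

	if lang, ok := msg.GetExtraString("original_language"); ok {
		fmt.Println(lang) // ja
	}
	if _, ok := msg.GetExtraString("missing"); !ok {
		fmt.Println("no value stored")
	}
}
```

Keeping the type assertion inside the getter means callers never have to index into the slice or check the dynamic type themselves.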

Thanks for any feedback before I jump into this rewrite!

poVoq commented 5 years ago

Any update on this? Also what translation APIs are supported?

patcon commented 5 years ago

No update! It's been running on a 5000-member Slack team for a while, using the code linked in the issue above, but it's not in a state that's mergeable into mainline.

Happy to help if you'd like to rebase and take another run at it!

poVoq commented 3 years ago

Was this ever looked at again?

There are some libre machine translation APIs now, for example: https://libretranslate.com/

patcon commented 3 years ago

Unfortunately, no, not from me! But all the working code is still there in the other branch, ready for anyone else's attention if they have time :) I'm not working on this right now, though there is still interest in this code in many communities I'm part of (it's just that no one has stepped up to maintain or improve it yet).

qaisjp commented 3 years ago

In my humble opinion, I believe this sort of thing is out of scope for matterbridge.

Assuming that tengo cannot help here, I'd prefer to see a more powerful "plugin" API that would make this sort of thing implementable.

poVoq commented 3 years ago

While not totally within the original scope, I think Matterbridge would be one of the best projects to include something like that, because it already does 95% of the stuff needed.

Obviously I am not thinking about including the translation API itself within Matterbridge, but a clean way to reference an external API to do message modifications like that would be really nice.
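As a rough illustration of what "referencing an external API" might look like, here is a sketch of a small transformation hook backed by an HTTP service. `Transformer` and `HTTPTranslator` are hypothetical names and not part of Matterbridge. The JSON field names (`q`, `source`, `target`, `translatedText`) follow my understanding of LibreTranslate's translate endpoint and should be checked against its documentation. A local fake server stands in for the real service so the example runs offline.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Transformer is a hypothetical hook: anything that can rewrite message text.
type Transformer interface {
	Transform(text string) (string, error)
}

// HTTPTranslator delegates the rewrite to an external translation service.
type HTTPTranslator struct {
	Endpoint string // full URL of the service's translate endpoint
	Source   string // source locale, e.g. "en"
	Target   string // target locale, e.g. "ja"
	Client   *http.Client
}

// Request and response shapes are assumptions modelled on LibreTranslate.
type translateRequest struct {
	Q      string `json:"q"`
	Source string `json:"source"`
	Target string `json:"target"`
}

type translateResponse struct {
	TranslatedText string `json:"translatedText"`
}

// Transform posts the text to the service and returns the translated text.
func (t HTTPTranslator) Transform(text string) (string, error) {
	body, err := json.Marshal(translateRequest{Q: text, Source: t.Source, Target: t.Target})
	if err != nil {
		return "", err
	}
	resp, err := t.Client.Post(t.Endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("translation service returned %s", resp.Status)
	}
	var out translateResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.TranslatedText, nil
}

// newFakeService starts a local stand-in for the external service. It only
// tags the text with the target locale instead of translating it.
func newFakeService() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req translateRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(translateResponse{TranslatedText: "[" + req.Target + "] " + req.Q})
	}))
}

func main() {
	srv := newFakeService()
	defer srv.Close()

	var t Transformer = HTTPTranslator{Endpoint: srv.URL, Source: "en", Target: "ja", Client: srv.Client()}
	out, err := t.Transform("hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // [ja] hello
}
```

Hiding the service behind a one-method interface keeps the bridge itself unaware of which translation backend, if any, is in use.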

qaisjp commented 3 years ago

Yeah. For message transformations specifically (even between bridges!) I always think that it would be nice to adopt a structure like pandoc, as pandoc manages that excellently! (If only pandoc was written in Go, then we could just import that as a library, haha.)