Call-for-Code-for-Racial-Justice / Open-Sentencing

To help public defenders better serve their clients, Open Sentencing shows racial bias in data such as demographics, providing insights for each case.
Apache License 2.0

Internationalization (I18N) enablement #15

Open amfred opened 3 years ago

amfred commented 3 years ago

We would like to be able to translate what users see in the future. Enabling the repos for translation is step 1 of that.

RickPoleshuck commented 3 years ago

This seems like a good first project. Does GitHub let you add dependencies between issues? My wife might help with Spanish language translations, but that issue seems dependent on this one.

amfred commented 3 years ago

That's great that we might have a Spanish translator!

You're right - we at least need to get all of the English strings in one file first (issue #16). Then it's not hard to create another copy of the file with the Spanish in it.
The exact file format depends on the translation library we choose; still open to suggestions.

Heh. GitHub Enterprise + ZenHub (which is what I'm used to) lets us add dependencies, but I don't see how to do it here. Does anyone else know how to do that?

amfred commented 3 years ago

By the way, the intent of this issue #15 is to choose the open source translation library. Then #16 is to put the English strings into the format that the library needs. And #17 is for a translation of that file into Spanish (or any other language we can get).
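A minimal sketch of the plan described above: all English strings live in one keyed file, and a translation is a copy of that file with the values replaced. The catalog shape, keys, and strings are hypothetical, since the thread has not settled on a library or file format. The `missingKeys` helper is also illustrative; it shows one way to spot entries that still need a translator.

```typescript
// Hypothetical message catalogs: keys and strings are illustrative only.
type Catalog = Record<string, string>;

// Step for #16: every user-visible English string lives in one keyed file.
const en: Catalog = {
  "app.title": "Open Sentencing",
  "case.new": "Create a new case",
  "case.save": "Save",
};

// Step for #17: the Spanish file is a copy of the English one with translated values.
const es: Catalog = {
  "app.title": "Open Sentencing",
  "case.new": "Crear un caso nuevo",
};

// Report keys that exist in the source catalog but not yet in the translation.
function missingKeys(source: Catalog, target: Catalog): string[] {
  return Object.keys(source).filter((key) => !(key in target));
}

console.log(missingKeys(en, es)); // ["case.save"]
```

Keeping the keys identical across files is what makes adding another language a copy-and-translate task rather than a code change.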

amfred commented 3 years ago

We might well have to choose different translation libraries/plug-ins for different repos because they use different programming languages, but if we choose one for the UI to start with that uses a standard file format, maybe we can at least use the same file format for the other repos.

RickPoleshuck commented 3 years ago

I have been working with JHipster (https://www.jhipster.tech/). One of the things I like about the framework is that they have made good choices on which technologies to use. Does it make sense to create a Slack channel to discuss #15?

RickPoleshuck commented 3 years ago

I am very new. Excuse me if I ask stupid questions. Why would you have translation in anything other than the UI? Legal documents might need to be translated, but that isn't part of the code base.

blueivywave commented 3 years ago

It sounds like the intent of this task is to provide content for multiple languages. Correct?

If that is the case, what APIs are being considered for this, or are being evaluated?

amfred commented 3 years ago

The lower level components like the Bias Detection Engine might end up generating text too, so I could see us passing a language code to them and getting back translated text. It's fine to start with the UI first since we know that needs it for sure.
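The idea of passing a language code to a lower-level component and getting translated text back could look roughly like the sketch below. Everything here is an assumption for illustration: the `localize` function, the message keys, and the strings are hypothetical and are not taken from the Bias Detection Engine.

```typescript
// Hypothetical messages a backend component might generate, keyed by language code.
const messages: Record<string, Record<string, string>> = {
  en: {
    "bias.flag": "Possible disparity detected",
    "bias.none": "No disparity detected",
  },
  es: {
    "bias.flag": "Posible disparidad detectada",
  },
};

// Return the text for `key` in `lang`, falling back to English and then to
// the key itself so that a missing translation never produces an empty string.
function localize(key: string, lang: string): string {
  return messages[lang]?.[key] ?? messages["en"]?.[key] ?? key;
}

console.log(localize("bias.flag", "es")); // Spanish text
console.log(localize("bias.none", "es")); // falls back to English
```

The fallback chain lets a component ship with partial translations while the UI is localized first.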

amfred commented 3 years ago

> It sounds like the intent of this task is to provide content for multiple languages. Correct?
>
> If that is the case, what APIs are being considered for this, or are being evaluated?

You mean like Google Translate?

RickPoleshuck commented 3 years ago

Again, I am very new, but since you asked...

I would use @ngx-translate. As I said earlier, I like JHipster's choices.

Google Translate might be useful on the backend.

blueivywave commented 3 years ago

> We might well have to choose different translation libraries/plug-ins for different repos because they use different programming languages, but if we choose one for the UI to start with that uses a standard file format, maybe we can at least use the same file format for the other repos.

Have you looked into IBM's Language Translator? It instantly translates web content. Here is the link: https://www.ibm.com/watson/services/language-translator/

RickPoleshuck commented 3 years ago

This is from an Angular project I am toying with:

html:

    Game

i18n:

    "game": {
        "title": "Game",
        "createLabel": "Create a new Game",

I believe that this is the type of translation needed for the front-end UI. With this approach, the strings for each language are hard-coded in static files, which makes lookups very efficient.
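To illustrate the kind of static, key-based lookup described above, here is a small sketch that resolves dotted keys such as `game.title` against a nested catalog shaped like the `"game"` snippet. It is a plain function written for this example, not the API of @ngx-translate or of Angular's built-in i18n, and the Spanish strings are hypothetical.

```typescript
// A nested catalog in the same shape as the "game" snippet.
type Tree = { [key: string]: string | Tree };

const enTree: Tree = {
  game: { title: "Game", createLabel: "Create a new Game" },
};

// Hypothetical Spanish copy with the same keys.
const esTree: Tree = {
  game: { title: "Juego", createLabel: "Crear un juego nuevo" },
};

// Resolve a dotted key such as "game.title".
// Returns the key itself when the path is missing or does not end at a string.
function translate(tree: Tree, dottedKey: string): string {
  let node: string | Tree | undefined = tree;
  for (const part of dottedKey.split(".")) {
    if (node === undefined || typeof node === "string") return dottedKey;
    node = node[part];
  }
  return typeof node === "string" ? node : dottedKey;
}

console.log(translate(enTree, "game.title")); // "Game"
console.log(translate(esTree, "game.title")); // "Juego"
```

Because the catalogs are plain data loaded once, each lookup is only a few property reads, which is why this style suits the UI better than calling a translation service per string.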

Translations on the back end that need to be done dynamically need a different solution. The IBM language translator might be the optimum choice for that.

amfred commented 3 years ago

I recently learned that Angular has a built-in translation framework, so I think we should start there with the UI, because it's built in Angular. https://angular.io/guide/i18n

Each language has a separate file in XML or JSON format containing the translated strings. If we use JSON as our file format, maybe there's also a JSON translation framework that would work with our Java/Maven Aggregator, for example.
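If no suitable JSON framework turns up on the Java side, one alternative (an assumption, not something proposed in this thread) is to keep JSON as the single source and generate the flat `key=value` lines that Java `.properties` files use. The sketch below shows the flattening step only; the `flatten` and `toProperties` names are hypothetical, and the escaping that real `.properties` files require for special characters is omitted.

```typescript
// Nested JSON catalog, as used on the UI side.
type Nested = { [key: string]: string | Nested };

// Flatten nested keys into dotted paths, e.g. { game: { title } } -> "game.title".
function flatten(tree: Nested, prefix = ""): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(tree)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (typeof value === "string") {
      out[path] = value;
    } else {
      Object.assign(out, flatten(value, path));
    }
  }
  return out;
}

// Render the flattened catalog as key=value lines (no escaping applied).
function toProperties(tree: Nested): string {
  return Object.entries(flatten(tree))
    .map(([key, value]) => `${key}=${value}`)
    .join("\n");
}

const sample: Nested = {
  game: { title: "Game", createLabel: "Create a new Game" },
};

console.log(toProperties(sample));
// game.title=Game
// game.createLabel=Create a new Game
```

This keeps translators working in one file per language while each repo consumes whatever format its tooling expects.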

RickPoleshuck commented 3 years ago

This article, https://phrase.com/blog/posts/best-libraries-for-angular-i18n/, says that Angular's built-in i18n has caught up with NGX-Translate. I am happy with either.

RickPoleshuck commented 3 years ago

I would get started with the implementation, but I haven't fully set up my development environment. authentication.service.ts and client_id are still confusing to me. Does it make sense for me to ask for help?

carleyreardon3 commented 3 years ago

Not sure if you are still looking for ideas on this, or if this is something you would even consider. If you do use machine translation and want to make sure the model works well on the type of language you're using (legal language, etc.), you could try fine-tuning a machine translation model on a set of English sentences and a small set of manually translated ones. Hugging Face has a ton of machine translation models that are fairly easy to implement. For example, here is one that translates English to Arabic: https://huggingface.co/Helsinki-NLP/opus-mt-en-ar

github-actions[bot] commented 3 years ago

:wave: Hi! This issue has been marked stale due to inactivity. If no further activity occurs, it will automatically be closed.

demilolu commented 3 years ago

@carleyreardon3 We're still looking at this. We're also doing some translation for our Five Fifths Voter project. For that we're looking at Watson Language Translator, but we're also thinking about how to crowdsource human translators/reviewers as well. As for Hugging Face, I forget whether it is available via an API. I think the issue with BERT or similar transformer models is the computation required.

We might need a unified view of how we deal with translation across all our projects :). cc @upkarlidder

github-actions[bot] commented 2 years ago

:wave: Hi! This issue has been marked stale due to inactivity. If no further activity occurs, it will automatically be closed.