biocore / american-gut-web

The website for the American Gut Project participant portal
BSD 3-Clause "New" or "Revised" License

Pull out all text and media into localized dicts #58

Closed: wasade closed this issue 10 years ago

wasade commented 10 years ago

Any text that is displayed to the user will need to be pulled out into project- or locale-specific dicts. These dicts will be located under amgut.lib.locale_data, and each portal will have its own. For instance, the American Gut portal text and media will fall under amgut.lib.locale_data.american_gut. The motivation for portal-specific rather than language-specific localization is that many countries rely on en_UK, but we need to plan for different branding (e.g., Australian Gut) moving forward.

Each locale_data module needs to provide two dicts, text_locale and media_locale. The structure of text_locale is as follows:

text_locale = {
    'name_of_template.html': {
        'INFORMATIVE_VARIABLE_NAME': 'The text that is displayed'
    }
}

media_locale = {'INFORMATIVE_VARIABLE_NAME': '/path/to/media.jpg'}

text_locale is keyed per template, which minimizes what the handlers need to pass into render. media_locale is not per template because much of the media, such as the logo, is common across templates.

In the __init__ for amgut.lib.locale_data, there is a media_locale that holds anything common to all portals. This dict is updated with the project-specific media_locale.
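A minimal sketch of how the common and portal-specific media dicts could be combined. The keys, paths, and the build_media_locale helper are hypothetical placeholders. The description above updates the common dict directly; this variation copies it first so the shared defaults stay intact for other portals.

```python
# Hypothetical media shared by every portal (would live in the
# amgut.lib.locale_data __init__).
COMMON_MEDIA_LOCALE = {
    'LOGO': '/static/img/logo.png',
    'FAVICON': '/static/img/favicon.ico',
}

# Hypothetical portal-specific media, e.g. for a British Gut portal module.
BRITISH_GUT_MEDIA_LOCALE = {
    'LOGO': '/static/img/british_gut_logo.png',
    'BANNER': '/static/img/british_gut_banner.jpg',
}


def build_media_locale(common, portal_specific):
    """Return the media dict for one portal.

    Portal entries override common entries with the same key; the
    common dict itself is not modified.
    """
    media_locale = dict(common)  # shallow copy keeps the defaults intact
    media_locale.update(portal_specific)
    return media_locale


media_locale = build_media_locale(COMMON_MEDIA_LOCALE,
                                  BRITISH_GUT_MEDIA_LOCALE)
print(media_locale['LOGO'])     # /static/img/british_gut_logo.png
print(media_locale['FAVICON'])  # /static/img/favicon.ico
```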

To use the locale data, all handlers will need to add:

from amgut import text_locale, media_locale

The package-level __init__ will resolve the appropriate locale data. All handlers wishing to use the locale data will need to pass it to render. For instance:

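class MyHandler(BaseHandler):
    def get(self):
        # stuff of interest
        template = 'really_awesome.html'
        # Pass this template's text dict and the shared media dict,
        # alongside whatever other arguments the template needs.
        self.render(template, text_locale=text_locale[template],
                    media_locale=media_locale)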

Within the templates, all user-visible text will need to be replaced with a locale lookup:

<!-- former -->
<p><strong>very</strong> informative message</p>
<!-- new style -->
<p>{% raw text_locale['INFORMATIVE_MESSAGE'] %}</p>

All variables should be expressed with raw, which lets us preserve HTML formatting and tags within the locale dicts.
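To see why raw matters, compare what an escaping expression would emit with what raw emits. This sketch uses the standard library's html.escape as a stand-in for the template engine's autoescaping (tornado's own escaping function may differ in small details, such as how it encodes quotes); the INFORMATIVE_MESSAGE entry is hypothetical.

```python
import html

text_locale = {
    'INFORMATIVE_MESSAGE': '<strong>very</strong> informative message',
}

value = text_locale['INFORMATIVE_MESSAGE']

# Roughly what an autoescaped expression emits: the tags become literal
# text on the page instead of bold formatting.
escaped = html.escape(value)

# What {% raw ... %} emits: the markup passes through unchanged.
raw = value

print('<p>%s</p>' % escaped)
print('<p>%s</p>' % raw)
```

The trade-off is that raw output bypasses escaping entirely, so locale entries must be trusted content.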

And finally, the list of templates to knock off. A template is considered completed once the text has been pulled out and replaced with locale lookups, the relevant locale entries updated, and the handler has been updated.

ElDeveloper commented 10 years ago

This seems like a good overall strategy. I am sure you considered this, but I am curious: why can't we use any of the available packages to do localization?

adamrp commented 10 years ago

Any specific suggestions? We looked briefly at the locale package in Python, and although it could help with some things (such as which punctuation mark to use for decimals versus thousands separators), I think this is the more "complete" solution. I have never used any of those packages, though, so it's very possible I am underestimating their utility.


squirrelo commented 10 years ago

Just as a heads up: adding {% autoescape None %} to the top of a template means that none of the variables will be autoescaped, so raw is not needed.

squirrelo commented 10 years ago

uh, guys.... http://www.tornadoweb.org/en/branch2.2/locale.html

wasade commented 10 years ago

@squirrelo, have you ever read an automatic translation that you liked?

wasade commented 10 years ago

@squirrelo, are you sure that it is safe to have all variables treated as raw? Using raw per variable lets us be explicit, and it is only about 3 extra characters to type per variable.

squirrelo commented 10 years ago

Fair on the raw bit, but the translations actually come from a CSV file you create. Take a look at http://www.tornadoweb.org/en/branch2.2/locale.html#tornado.locale.load_translations

We still need to do the breakout of text, but in a completely different way in that case.

wasade commented 10 years ago

I don't think we want to write out all conjugated forms of all the verbs used on the AG site. My read on that method is that it is for translating, not for replacing already translated strings.


wasade commented 10 years ago

btw, I'm moving on to sitebase.html but probably won't finish it tonight

squirrelo commented 10 years ago

It's for both. And we don't have to worry about the translation of verbs, as "For strings with no verbs that would change on translation, simply use “unknown” or the empty string (or don’t include the column at all)." Since we are just doing direct translation it should be a straight replace operation in tornado. I'll play with it tomorrow morning and see if this will make life easier.
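To make the CSV format concrete: per the tornado.locale documentation linked above, each row holds the English string, its translation, and an optional plural indicator. The loader below is a simplified pure-Python stand-in written for illustration, not tornado's own code, and the German rows are hypothetical. It shows that lookups are keyed by the full English string.

```python
import csv
import io

# Hypothetical German rows: English string, translation, optional
# plural indicator ("singular", "plural", or empty/absent).
CSV_TEXT = (
    '"Log out","Abmelden"\n'
    '"Sign in","Anmelden",""\n'
    '"%(num)d sample","%(num)d Probe","singular"\n'
    '"%(num)d samples","%(num)d Proben","plural"\n'
)


def load_csv_translations(text):
    """Build {indicator: {english: translation}} from CSV text.

    A missing or empty third column is filed under 'unknown'.
    """
    table = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        english, translation = row[0], row[1]
        indicator = row[2] if len(row) > 2 and row[2] else 'unknown'
        table.setdefault(indicator, {})[english] = translation
    return table


def translate(table, message):
    """Look up the full English string; return it unchanged if missing."""
    for indicator in ('unknown', 'singular', 'plural'):
        if message in table.get(indicator, {}):
            return table[indicator][message]
    return message


table = load_csv_translations(CSV_TEXT)
print(translate(table, 'Log out'))          # Abmelden
print(translate(table, 'Log out of site'))  # Log out of site
```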

wasade commented 10 years ago

Awesome, sounds good.

squirrelo commented 10 years ago

Playing with a toy example, and tornado.locale seems to do everything we want and more. We can even set it up with variables inside the translation strings so we can insert the project name programmatically instead of directly within the translation strings.

ElDeveloper commented 10 years ago

That is awesome, we can give it a go at the full website, worst case scenario we go back to the previous plan, best case scenario we "only" need to translate strings.


squirrelo commented 10 years ago

Yup yup. I can demo the toy example after lunch, when I get back to the building.

wasade commented 10 years ago

Do you have an example? Should we stop the current efforts right now or is that effort still necessary?

wasade commented 10 years ago

What time? I need to leave by about 12:20 for the rest of the day btw...

squirrelo commented 10 years ago

I have an example. Unzip this and then run "python tornadolocale.py" to play with the toy. The index should stay at whatever language you were at last, while the /en/ and /de/ pages should be English and German, respectively. https://dl.dropboxusercontent.com/u/39899821/localetest.zip

squirrelo commented 10 years ago

@wasade I'd say pause current development for now and see if that toy example will work for what we need to do.

wasade commented 10 years ago

The tornado examples are all oriented around phrases, not paragraphs. From looking at the source, it looks like the translation will just be a lookup of the full string (e.g., a paragraph) in a dict to get the translated version. That seems fine, but I'm very nervous about maintenance: moving forward, we'd need to modify the presented string (such as those currently in the templates) as well as each locale dict, because the presented string is the key.

antgonza commented 10 years ago

What about mappings like "present text": "texto actual" and "present text": "future text"?

wasade commented 10 years ago

Yes, that is what tornado is doing, which means that if you have N supported languages, any time you update the English text (like changing "Add Human Source" to "Add human source"), you need to make that change in N+1 locations. This will be a nightmare for long paragraphs when making small grammatical changes.
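A minimal sketch of the maintenance problem with string-keyed tables, assuming a plain dict lookup that falls back to the original message when no key matches. The tables and translations are hypothetical, and this is not tornado's code.

```python
# Hypothetical string-keyed tables for N = 2 languages: the English text
# shown on the page is itself the lookup key.
translations = {
    'de_DE': {'Add Human Source': 'Menschliche Quelle hinzufügen'},
    'es_ES': {'Add Human Source': 'Agregar fuente humana'},
}


def translate(locale_code, message):
    """Return the translation, or the English message if no key matches."""
    return translations.get(locale_code, {}).get(message, message)


# The template is edited to fix capitalization (edit 1 of N + 1) ...
new_english = 'Add human source'

# ... but every table still holds the old key, so each language silently
# falls back to English until its table is edited as well.
for code in sorted(translations):
    print(code, translate(code, new_english))
```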

antgonza commented 10 years ago

I see, luckily we will only support N < 5, right? :)

wasade commented 10 years ago

For N=2 this is a nightmare.

What are the benefits to using tornado's locale?

wasade commented 10 years ago

@teravest just pointed out that we could have an en_USv1 lookup, which lets us preserve all the current strings and essentially requires that all strings are translated. This greatly simplifies everything, as the templates can then be filled with a key such as "SITEBASE_NAV_ADD_HUMAN", which is a key in the translation files. That works, but we still need to address branding separately.
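A minimal sketch of the key-based variant: a template would call `{% raw _('SITEBASE_NAV_ADD_HUMAN') %}`, and every locale, including en_USv1, maps that stable key to display text. The LOCALES tables and the make_gettext helper are hypothetical; in practice the lookup would presumably be exposed to templates as `_` rather than called directly.

```python
# Hypothetical key-based tables: templates only reference stable keys,
# and en_USv1 holds the current English wording as a value.
LOCALES = {
    'en_USv1': {'SITEBASE_NAV_ADD_HUMAN': 'Add Human Source'},
    'en_GB': {'SITEBASE_NAV_ADD_HUMAN': 'Add Human Source'},
}


def make_gettext(locale_code):
    """Return a _() function bound to one locale's table."""
    table = LOCALES[locale_code]

    def _(key):
        # A KeyError means the key was never added to this locale, which
        # surfaces untranslated strings instead of hiding them.
        return table[key]

    return _


# Rewording the English text changes one value; the key used by the
# templates and by every other locale stays the same.
LOCALES['en_USv1']['SITEBASE_NAV_ADD_HUMAN'] = 'Add human source'

us = make_gettext('en_USv1')
gb = make_gettext('en_GB')
print(us('SITEBASE_NAV_ADD_HUMAN'))  # Add human source
print(gb('SITEBASE_NAV_ADD_HUMAN'))  # Add Human Source
```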

ElDeveloper commented 10 years ago

That sounds like a good idea. As for branding, I would be surprised if tornado didn't provide something like that already as it is a very standard problem.


wasade commented 10 years ago

Agree, though we can always fall back on the replacements anyway. How does this sound for the translations:

  • all text needs to be pulled out and replaced with `{% raw _('SOME_VARIABLE') %}`
  • variable names used in templates must be unique and make sense (e.g., SITEBASE_NAV_ADD_HUMAN_SOURCE)
  • en_USv1 is created keyed by the variables, valued by the present text on the website
  • the Google spreadsheet is updated as we planned, except using the more explicit variable names

Sound good? We also of course need to get the translation support hooked up in webserver.py.

ElDeveloper commented 10 years ago

This sounds like a good course of action. :+1:

squirrelo commented 10 years ago

Sounds good to me. And actually the %(blah)s style of variables in the example works regardless of how we do the translations, so we can do single translations for multiple projects and just sub in the project name programmatically.
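A minimal sketch of the %(name)s approach: one shared string with a named placeholder serves every portal, and the project name is filled in with ordinary Python %-formatting after the lookup. The SITEBASE_WELCOME key, its wording, and the portal_text helper are hypothetical.

```python
# Hypothetical shared entry: a single translation string for all portals.
text_locale = {
    'SITEBASE_WELCOME': 'Welcome to the %(project)s participant portal!',
}


def portal_text(key, project):
    """Look up a shared string and fill in the portal's project name."""
    return text_locale[key] % {'project': project}


for project in ('American Gut', 'British Gut'):
    print(portal_text('SITEBASE_WELCOME', project))
```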

ElDeveloper commented 10 years ago

@squirrelo would you mind opening a pull request with this setup? If you need help, we can work on it on Monday.

I'm @ A222 if you would like to work today.

squirrelo commented 10 years ago

I'm working on the FAQ page right now so I can open a pull request on the unfinished version I have just for an example.

ElDeveloper commented 10 years ago

That would be awesome, thanks!


squirrelo commented 10 years ago

BTW, in doing this I found that since we are mimicking tornado.locale with our own dicts, we really don't need it. It was a good idea, but our site is too text-heavy, and its text changes too often, for it to work properly.

wasade commented 10 years ago

Just realized, following the work on #69, that we can include text_locale directly in the templates, which means that we do not need to update the handlers. Will have an example up in a second.

wasade commented 10 years ago

See #70, this approach lets us not even have to modify the handlers and all the work can be done in the template. Basically, you just add:

{% from amgut import text_locale %}
{% set tl = text_locale['forgot_password.html'] %}

Into {% block content %}. It is sensitive to where these lines are placed, and the best place seemed to be block content. I tried putting them into block content in no_auth_sitebase.html and it didn't work, and placing them at the top of the file (prior to block head) didn't work either for some reason. Will update the sitebase PR since that was encompassing...

wasade commented 10 years ago

nm, not appropriate for the sitebase PR. Will issue a subsequent PR to get help_request fixed and will address the other PRs in the process. @squirrelo, just a heads up since I know you're working on FAQ

squirrelo commented 10 years ago

Thanks, added into the FAQ page.

wasade commented 10 years ago

@ElDeveloper and @squirrelo, either of you working on addendum.html right now? If not, I'm on it

wasade commented 10 years ago

...rather, I am working on it now, just fyi

ElDeveloper commented 10 years ago

Sounds good.


squirrelo commented 10 years ago

Was just about to, but not now!

squirrelo commented 10 years ago

Is anyone working on animal_survey right now?

wasade commented 10 years ago

The surveys have some oddities as I understand, @adamrp, can you comment please?


squirrelo commented 10 years ago

righto, I'll work on change_pass_verify and participant_overview then.

squirrelo commented 10 years ago

Also, the international page is going to be interesting to do as it already has built in translations. Anyone want to take a crack at it or do we need to discuss what to do with that page?

wasade commented 10 years ago

we should discuss the international page at AG tomorrow

antgonza commented 10 years ago

I just started working on animal_survey.html and found that the animal type options are coded in English: the option Dog has a value of "Dog" rather than, e.g., a value of 1. I can leave it like this or change it, but I'm not sure about the consequences of either option. What do you think would be best?
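One option is to keep the submitted values as language-independent codes and localize only the visible labels, so stored responses do not depend on the page language. A minimal sketch under that assumption; the numeric codes, label keys, Spanish labels, and render_options helper are all hypothetical, and whether this fits the existing survey storage was left open in the thread.

```python
# Hypothetical stable option codes paired with locale keys for the labels.
ANIMAL_TYPE_OPTIONS = [(1, 'ANIMAL_TYPE_DOG'), (2, 'ANIMAL_TYPE_CAT')]

text_locale = {
    'animal_survey.html': {
        'ANIMAL_TYPE_DOG': 'Dog',
        'ANIMAL_TYPE_CAT': 'Cat',
    },
}

# Hypothetical labels for another language: only the text changes.
spanish_labels = {'ANIMAL_TYPE_DOG': 'Perro', 'ANIMAL_TYPE_CAT': 'Gato'}


def render_options(options, labels):
    """Build <option> tags whose value never changes with the language."""
    return '\n'.join('<option value="%d">%s</option>' % (value, labels[key])
                     for value, key in options)


print(render_options(ANIMAL_TYPE_OPTIONS, text_locale['animal_survey.html']))
print(render_options(ANIMAL_TYPE_OPTIONS, spanish_labels))
```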

wasade commented 10 years ago

@adamrp has the most up-to-date knowledge of the surveys.

wasade commented 10 years ago

Okay, looks like just international.html and the surveys are left. Let's discuss these at the AG meeting. I'm starting on #89, which I think is the last blocker for sending content to the British Gut.

wasade commented 10 years ago

I believe only animal_survey is left to do, pending doing this in non-English English first

wasade commented 10 years ago

animal_survey questions and responses are in #141. These are not going into the BG locale dict, as they will be sourced from the db.