gma / nesta

File Based CMS and Static Site Generator
http://nestacms.com
MIT License

translation system for nesta #47

Closed joaotavora closed 12 years ago

joaotavora commented 13 years ago

Hello!

I've been working recently on a translation system for nesta. I've extracted it from my own projects since it's a very common requirement for blogs and sites, especially in non-English-speaking countries like Portugal. A multi-language article/page looks like:

Date: 31 Dec 2010
Language: en

# A very short primer on how to use nesta

language: pt
Summary: Um sumário muito curto sobre como começar a usar o nesta.

# Comece Por Aqui

So here's a pull request (my first ever on GitHub, or git for that matter). I'm mostly asking for your feedback; do tell me if you prefer patches.

The main challenge here is to be completely backwards compatible with legacy nesta content (and themes). Obviously it's also meant to be flexible, so that you can have not only different content for different languages but also custom metadata for each language, etc.
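To make the file layout above concrete, here is a minimal sketch of how such a file might be split into per-language sections. It is not the code from the pull request: split_translations, parse_metadata_and_body and both regular expressions are hypothetical names, and the sketch assumes that a line starting with "language:" (in any letter case) opens a new section and that metadata above the first such line is shared by every language.

```ruby
# Hypothetical sketch; none of these names come from Nesta or the fork.
LANGUAGE_KEY  = /\Alanguage:\s*(\S+)\s*\z/i
METADATA_LINE = /\A([\w-]+):\s*(.*)\z/

# Reads leading "Key: value" lines as metadata; the rest is the body.
def parse_metadata_and_body(lines)
  metadata = {}
  rest = lines.dup
  while (line = rest.first) && (match = METADATA_LINE.match(line.chomp))
    metadata[match[1].downcase] = match[2].strip
    rest.shift
  end
  [metadata, rest.join.strip]
end

# Returns { locale => { metadata: {...}, body: "..." } }.
def split_translations(text)
  shared_lines = []
  sections = {}
  current = nil
  text.each_line do |line|
    if (match = LANGUAGE_KEY.match(line.chomp))
      current = match[1].downcase   # a language key starts a new section
      sections[current] = []
    elsif current
      sections[current] << line
    else
      shared_lines << line          # above the first language key: shared
    end
  end
  shared, = parse_metadata_and_body(shared_lines)
  sections.each_with_object({}) do |(locale, lines), pages|
    metadata, body = parse_metadata_and_body(lines)
    # Section metadata overrides shared metadata with the same key.
    pages[locale] = { metadata: shared.merge(metadata), body: body }
  end
end
```

Applied to the example article, this would give "en" and "pt" entries that both carry the shared date, while only "pt" has a summary.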

Some problems and my solutions:

  • How to discover when one language ends and another starts in the article file? I look for the "language:" key at the beginning of a line. If an article author finds he/she must use that in some other context, a "language_key:" key can be used. Also, in the example above, the "date:" metadata applies to both languages, since it's above the first language key in the whole file.
  • How to choose the language to be requested? I used params[:locale] or params[:lang] here since it's the easiest, but it could be modified to use some other mechanism like prefixing or suffixing the URL, or using subdomains.
  • How to keep being able to just call, for example, @page.heading (so as not to break people's local views of themes) and have it do the right thing for the right language? This was the hardest, and I resorted to an ugly trick, which is to keep a hash of Nesta::App instances indexed by Thread.current. Have a look at the app.rb diff to see what I mean. Any other suggestions?
  • How to react when a page is not available in the requested language? See the handling of LocaleNotAvailable in app.rb.
  • How to translate view code strings (as opposed to content), like that bit of text "Articles by category" in the sidebar? I used the sinatra-r18n gem for this and look for a "translations" dir inside the user's "content" folder. That isn't ideal, since "./translations" should come under "./views" in the default views or the theme. I couldn't find an easy way to do that.
  • How to rspec this? This isn't really a problem, I was just lazy.
  • I also added a prefer_pages_first_locale option to the config. Set this if you want a locale-less URL to lead to the page in the page's default language, as opposed to the site's default language.
  • Still missing: a way to let the user specify a preferred order of languages.

To try it out, see git@github.com:capitaomorte/nesta-demo-content.git, which clones the demo content with a limited number of (very poor) pt translations.

gma commented 13 years ago

That sounds very interesting - thanks for sending it over. I'm abroad at the moment (without a computer) so I've not been able to check it out, but I'll take a proper look when I get home.

Cheers, Graham

On 22 Apr 2011, at 12:40, capitaomorte wrote:

I've been working recently on a translation system for nesta. [snip]

gma commented 13 years ago

I'm back, and I've just had a look through it. I'll take each of your points in turn, to try and make sure that I address everything…

How to discover when a language ends and another starts in the article file? I look for the "language:" key at the beginning of a line.

This is a tricky one. Did you consider putting each language in a separate file? I'd be tempted to create a folder for each translated article, to contain the translations, with each of the translated files named after the locale. Perhaps something like this:

content/pages/name-of-article.mdown
content/pages/name-of-article/translations/pt.mdown
content/pages/name-of-article/translations/es.mdown

A key advantage that I see here is that there are multiple files for different languages, which (if this were to be used on a large scale) would surely help the translators. It also simplifies the code; I'm not keen on the complexity introduced by trying to recognise the start of a new translation mid-way through a file.

The main .mdown file could also be used to define shared metadata, while the translations could override whichever metadata keys the translator felt were appropriate.
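As a rough illustration of this layout, the sketch below derives the translation file path from the main page path and lets a translation's metadata override the shared metadata. Both helper names (translation_path and merged_metadata) are hypothetical, and the directory scheme is only the one proposed in this comment, not something Nesta is known to implement.

```ruby
# Hypothetical helpers illustrating the folder-per-article proposal.

# "content/pages/foo.mdown", "pt" -> "content/pages/foo/translations/pt.mdown"
def translation_path(page_path, locale)
  ext  = File.extname(page_path)
  name = File.basename(page_path, ext)
  File.join(File.dirname(page_path), name, "translations", "#{locale}#{ext}")
end

# Keys set by the translator win; everything else falls back to the
# metadata defined in the main .mdown file.
def merged_metadata(shared, overrides)
  shared.merge(overrides)
end
```

A translator who only sets a summary would still inherit the date from the main file.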

Would the URL ever change when viewing a translation? I feel that it should, as search engines will expect the content that they index at a given location to match that which the user will see. So the URL really ought to either include the locale, or have a different permalink (i.e. a translated permalink). Perhaps my proposed scheme would need modifying slightly in light of this.

how to choose the language to be requested?

I'm favouring the idea of (as a minimum) including the locale in the URL. Perhaps /name-of-article/es, or a site wide switch such as /es/name-of-article.

How to keep being able to just call, for example, @page.heading and have it do the right thing

As you say, copy in themes would need to be translatable, and the translations would need to be stored in the theme. I don't see this being an issue.

Rather than merge this code into Nesta's core, I think it belongs in a plugin (plugins aren't really ready for prime time yet, but we can discuss them as if they are as they'll be pretty simple really). Each plugin will belong in its own repository, and will be packaged up as a separate gem (e.g. nesta-plugin-translations in this case). A user will install a plugin by adding it to their Gemfile. Nesta will load the plugin's initialisation code at startup.

A translation plugin could provide themes with a way of translating their copy. A translated theme would need to be modified to work with the translation plugin, but that wouldn't be too difficult to arrange. The theme could also work well when the translation plugin isn't installed.

What's the point of introducing Thread.current here? If you need something to identify the current request, could you just use the Rack request object?

how to react when a page is not available in the requested language?

That sounds like it should be a 404 to me.

also added a prefer_pages_first_locale option to the config

What's the use case? I'd have thought the default for any page could be defined by the content, without a need to configure it.

still missing a way to let the user specify a preferred order of languages

Again, I'm wondering what the motivation is here. If you mean the order within a list of available languages, alphabetical sounds good to me. Anybody who cares enough to change it could do so by tweaking code in their application's app.rb file.

Thanks for sending this over, and I'm sorry it took me so long to get back to you. Let me know how it goes, or if you want help/tips on how to approach a plugin.

Cheers, Graham

joaotavora commented 13 years ago

Hi!

1) I did consider putting each translation in a separate file; in fact I have another (more heavily) modified version of nesta where whole articles are represented as folders and not as files. (This, btw, also lets me drop assets in a folder and have nesta recognize these as the article's assets.) But I dropped it since in the end it was too complicated for me, and my clients also thought it too bizarre. We just want to find a file that represents an article and change it all there; that's the most common use case. Anything beyond that and it's going into heavy-duty CMS territory.

Folders are meant to represent parenting and are used in the organization of urls. A "special" name-of-article folder would go against the grain there; it would be a mixed scheme between articles-as-files and articles-as-folders. I tried the second and it seems to break the simplicity... A mixed scheme would be even uglier (imho...).

Also it would be a little less flexible since users wouldn't be able to create some pages only in the non-default language.

Lastly, and this is obviously subjective, I think that for the amount of functionality introduced and the backward compatibility, some 60 new lines in models.rb is not much. Though it could be made more readable, I guess...

2) Including the locale in the url can be done. The only thing to be aware of is the following: when trying to load somearticle/es, we might actually find somearticle/es.mdown as a page in itself, and should display that in the default language. My technique would be: "If we can't find it, assume the last bit is the language and try to load that."
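That fallback rule could be sketched as below. The resolve helper and the list of known page paths are assumptions made for illustration; they are not part of Nesta or of the fork. An exact page match wins, and only when it fails is the last URL segment treated as a locale. The sketch does not check whether the page actually has a translation in that locale.

```ruby
# Hypothetical sketch: returns [page_path, locale], or nil if nothing matches.
# `pages` is a list of known page paths without the .mdown extension.
def resolve(path, pages, default_locale)
  segments = path.split("/").reject(&:empty?)
  full = segments.join("/")

  # 1. A page that literally lives at this path takes precedence.
  return [full, default_locale] if pages.include?(full)

  # 2. Otherwise assume the last segment is the language.
  *parent, last = segments
  candidate = parent.join("/")
  return [candidate, last] if !parent.empty? && pages.include?(candidate)

  nil
end
```

With a page list containing both "somearticle" and "somearticle/es", a request for /somearticle/es would resolve to the es.mdown page in the default language; without the second entry it would resolve to "somearticle" in Spanish.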

3) Well I might be seeing the whole thing upside down, but it's not easy to get the current Rack request object from the models.rb file. Maybe it is easy and I've just missed it? I've done it with a class variable, Thread.current naturally being a way to get the relevant app object in a thread safe way.
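For readers unfamiliar with the idiom, here is a reduced sketch of per-request state kept on Thread.current. It stores only a locale string rather than the hash of Nesta::App instances described in the pull request, so it illustrates the idea and not the actual patch. CurrentLocale, the Page stand-in and the English heading text are made up for the example.

```ruby
# Hypothetical sketch; not the code from the pull request.
module CurrentLocale
  KEY = :sketch_current_locale

  # Sets the locale for the duration of the block, then restores the
  # previous value even if the block raises.
  def self.with(locale)
    previous = Thread.current[KEY]
    Thread.current[KEY] = locale
    yield
  ensure
    Thread.current[KEY] = previous
  end

  # Thread#[] is fiber-local storage, so each thread (strictly, each
  # fiber) sees only the value it set itself.
  def self.get(default = "en")
    Thread.current[KEY] || default
  end
end

# Stand-in model: views keep calling page.heading with no arguments.
class Page
  def initialize(headings)
    @headings = headings
  end

  def heading
    @headings.fetch(CurrentLocale.get)
  end
end
```

A request handler would wrap rendering in CurrentLocale.with(locale) { ... }, which is what lets the model pick a language without being handed the request object.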

4) I'm leaning to 404 here too. It would be nice, though, to customize that 404 with an error message like "sorry! page is only available in pt, es."
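One way to carry the list of available locales up to an error handler is to put it on the exception. The exception name LocaleNotAvailable appears in the pull request description, but its definition is not shown in this thread, so the constructor, the attributes, the message wording and the find_translation helper below are all assumptions.

```ruby
# Hypothetical definition; only the class name comes from the thread.
class LocaleNotAvailable < StandardError
  attr_reader :requested, :available

  def initialize(requested, available)
    @requested = requested
    @available = available
    super("Sorry! This page is only available in #{available.join(', ')}.")
  end
end

# Looks up a translation, raising with the list of locales that do exist.
def find_translation(translations, locale)
  translations.fetch(locale) do
    raise LocaleNotAvailable.new(locale, translations.keys)
  end
end
```

An error handler could then rescue LocaleNotAvailable, respond with a 404 and render the exception's message.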

5) prefer_pages_first_locale is for when you look up a page and include no language specifier in the url. What to do if that page does not exist in the site's default language? Show a 404, or show the page in its first available locale?

I have no problem with making this a plugin, but I'm kind of intransigent about keeping the nesta way of having the whole article in the same file. If we can think of some kind of stable plugin API that allows for that, great!

Bye, João

gma commented 13 years ago

On 9 May 2011, at 11:43, capitaomorte wrote:

A "special" name-of-article folder [snip] would be a mixed scheme between articles-as-files and articles-as-folders. I tried the second and it seems to break the simplicity...

I see where you're coming from, certainly. I resisted foo/index.mdown for the same reason, but the need to implement the home page (and the clearer organisation on disk, with all articles in a "section" in the same folder) persuaded me to relax the design restriction.

Also it would be a little less flexible since users wouldn't be able to create some pages only in the non-default language.

I don't think any of these approaches need limit the options there. The URL should determine the language, and the code should find the right translation from there. Wouldn't the language that is used for an article when a language isn't specified just be specified by the content author? It'd be whatever language they enter into the file.

Lastly, and this is obviously subjective, I think that for the amount of functionality introduced and the backward compatibility, some 60 new lines in models.rb is not much. Though it could be made more readable, I guess...

I won't be merging translation functionality into Nesta's core. I do think it's a great idea, but it should be an optional extra for those that need it (and therefore a plugin).

This is okay, as plugins can do anything that you can do in Nesta itself.

2) Including the locale in the url can be done. The only thing to be aware of is the following. Trying to load somearticle/es we might actually find somearticle/es.mdown as a page in itself and should display that with the default language.

That wouldn't be too difficult.

Maybe it is easy and I've just missed it? I've done it with a class variable, Thread.current naturally being a way to get the relevant app object in a thread safe way.

I'm not sure; I haven't looked at how to get access to a request in there. Thread.current doesn't feel quite right, but I've not poked about.

4) I'm leaning to 404 here too. It would be nice, though, to customize that 404 with an error message like "sorry! page is only available in pt, es."

The plugin could redefine the error handler to do that.

5) prefer_pages_first_locale is for when you look up a page and include no language specifier in the url. What to do if that page does not exist in the site's default language? Show a 404 or show the page in its first available locale.

I'd show it in the default language, and make that the defined behaviour of the plugin.

I have no problem with making this a plugin, but I'm kind of intransigent with keeping the nesta way of the whole article in the same file. If we can think of some kind of stable plugin API that allows for that, great!

We don't need to invent a public API yet; just use Ruby's monkey patching to override the behaviour of Nesta's builtin classes. The important access points (around which an API should be defined) will only become apparent in practice.

Now I know what you're thinking ("this man's nuts!") but it's the only pragmatic solution I can think of. It'll only work if all plugins have decent test coverage, so that they can all be tested automatically against new versions of the Nesta gem. But they'll be written test first anyway, right? ;-)

And as a plugin, you can implement it however you see fit.
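As a sketch of what overriding a built-in class from a plugin could look like, the example below reopens a stand-in Page class with Module#prepend. The Page class here is a simplified placeholder rather than Nesta's real model, and TranslatedHeading and the "heading.pt" metadata key are invented for illustration.

```ruby
# Stand-in for a core class; NOT Nesta's actual implementation.
class Page
  def initialize(metadata)
    @metadata = metadata
  end

  def heading
    @metadata.fetch("heading")
  end
end

# Hypothetical plugin code: wraps the original method instead of
# copying its body, so core changes to Page#heading are still picked up.
module TranslatedHeading
  def heading(locale = nil)
    return super() if locale.nil?

    # Fall back to the untranslated heading when no translation exists.
    @metadata.fetch("heading.#{locale}") { super() }
  end
end

Page.prepend(TranslatedHeading)
```

Existing callers of page.heading keep working unchanged, and a handful of assertions like the ones a plugin's test suite would contain can confirm that after each Nesta release.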

lilith commented 13 years ago

I'm definitely in need of translation functionality for http://imageresizing.net/.

What's the status on this?

gma commented 13 years ago

I suspect it's blocked on decent plugin support (i.e. plugins distributed as gems). I've been wanting to sort the test suite out first (and switch to test-unit) but perhaps I should just bite the bullet and deal with plugins first.

If I thought that people would start writing plugins straight away I'd be a lot quicker to get it in. Currently it feels as though I'll have to write some plugins too before anybody would get any use out of it, but maybe you guys are waiting to build/release a plugin to do translations?

joaotavora commented 13 years ago

I admit I haven't tried to write it as a plugin. The system is usable; I used it for katrinkaasa.com.

To implement it as a plugin, it seems that a lot of unstable monkey patching would be needed. By unstable I mean changes to the nesta core would break plugin support. @gma you're totally right about the testing, and I haven't had a lot of time for geeking lately, but I think the nesta core could be expanded with this, since it's not a lot of code and it's written to be completely backward compatible with existing apps and themes.

@nathanaeljones, to summarize: it's fragile (no tests) but working OK. You can get git@github.com:capitaomorte/nesta.git, try it out and raise any issues there. I'll try to provide support and bug fixes. And write tests.

lilith commented 13 years ago

@gma, As I have 2 (soon 3) Nesta sites, I could definitely use the ability of plugins to share my nesta mods. It's not a high priority, but it would be nice to have an easy way to share my changes with others without branching the core code.

@capitaomorte, What's your opinion on having separate files using an extension (like .mdown, .es.mdown, etc)?

joaotavora commented 13 years ago

@nathanaeljones, see my previous reply to the same suggestion. I had a system like that in place but found it was too confusing, both for me and to explain to my clients; it broke the original nesta simplicity of file-per-page and needed a lot more code in lib/models.rb. In your opinion, what's the downside of keeping it all in the same .mdown? Maybe not being able to tell the available translations from the filesystem tree?

On Fri, Aug 26, 2011 at 4:17 PM, nathanaeljones wrote:

@capitaomorte, What's your opinion on having separate files using an extension (like .mdown, .es.mdown, etc)?

João Távora

lilith commented 13 years ago

I'm not really opposed to having everything in a single file, but it is very important that each version have a separate URL. I'm a fan of the subfolder approach here... See http://www.google.com/support/webmasters/bin/answer.py?answer=182192

I'd use browser language detection to prompt the user (via javascript) if they want a different language version, then set that version via a cookie. The server from then on would redirect any requests for the default language URL (the one without the subfolder) to the version of the file with the selected language (if present).
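The server-side half of that idea might look roughly like this. It is only a sketch under stated assumptions: localized_redirect is a hypothetical helper, the cookie value is assumed to be a bare locale code, and `available` is assumed to list the locales in which the requested page exists.

```ruby
# Hypothetical sketch: returns the URL to redirect to, or nil for no redirect.
def localized_redirect(path, cookie_locale, available, default_locale)
  first_segment = path.split("/").reject(&:empty?).first

  # Already a localized URL such as /es/article: leave it alone.
  return nil if available.include?(first_segment)
  # No preference stored, or the preference is the default language.
  return nil if cookie_locale.nil? || cookie_locale == default_locale
  # The page has no version in the preferred language.
  return nil unless available.include?(cookie_locale)

  "/#{cookie_locale}#{path}"
end
```

In a Sinatra app this logic could sit in a before filter, which keeps it outside the translation code itself.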

joaotavora commented 13 years ago

by subfolder approach I guess you mean:

www.yoursite.com/es/article/subarticle or www.yoursite.com/article/subarticle/es

instead of

www.yoursite.com/article/subarticle?lang=es

right? Well that's OK, it's independent of the single/multiple file thing. It's not implemented in my fork, but it shouldn't be hard to do. Even better if you could help :-D One just has to be careful if there is actually an article "es.mdown" there somewhere. In that case a decision has to be made whether to serve "es.mdown" or serve "subarticle.mdown" in Spanish.

I'd keep any special prompting scheme out of the extension, and even the cookie thing could possibly be done with a user-written before filter or something.

Anyway, maybe you should try out my fork and we could move this discussion there. I can't guarantee super fast support though; it would be good if you could provide some code.

gma commented 13 years ago

On 26 Aug 2011, at 03:58, capitaomorte wrote:

To implement it as a plugin, it seems that a lot of unstable monkey patching would be needed. By unstable I mean changes to the nesta core would break plugin support.

I tend towards the view that monkey patching is - during the early days of a project - the only sensible approach to extending an app like Nesta, written in a language like Ruby. An appropriate public API will only become apparent as people extend Nesta, and I wouldn't expect to be able to predict what to expose and what to hide beforehand.

Only once plugins have been written that highlight the important entry points, could I attempt to extract (and freeze) a public API. I'd also need to feel as though the monkey patching approach was failing in practice before trying to define such an API.

I'm not generally a fan of monkey patching, but in this case I think it's the way forward.

I'm expecting to be able to mitigate the fragility.

1) Every plugin should have a small test suite that's designed to make sure that it works in Nesta.

2) I would install all known plugins on my own computer and run their tests when modifying Nesta's internal methods, and prior to making a new release.

I'd be alerted to any breakages, could submit fixes to a plugin's maintainer, or (preferably) adjust the way I modify Nesta to ensure that plugins continue to work. Meanwhile I'd be getting good insight into what the plugins would be dependent on.

All it'd take is a bit of scripting to automate updating and running all plugin tests, and for people to announce their plugins on the mailing list.

I think the nesta core could be expanded with this, since it's not a lot of code and it's written to be completely backward compatible with existing apps and themes.

I'm not too concerned with volume of code (though a patch that adds 203 lines, changes 11 and deletes 33 is significant in a project of this size).

The thing I want to maintain is simple and obvious patterns. Understanding the code base should be a quick exercise. That means obvious metaphors, such as one file on disk representing one page on the site. For some of us Nesta is a quick way to put up a web site. For others it's a nice simple starting point on which to build something bespoke, like this:

http://blog.peepcode.com/tutorials/2010/about-this-blog

There's no reason why plugins should stick to these same metaphors; I'm all for plugins making tricky things easy, and throwing caution to the wind.

My preference for putting translations in plugins doesn't mean that I'm in any way considering translations to be unimportant; I think it's a wonderful feature.

It's more that I consider accessibility/legibility of Nesta's code to be as important as well written docs, and new features will need to be either very simple, or a no brainer for the majority of Nesta's users in order to avoid the "let's put that in a plugin" approach.

joaotavora commented 13 years ago

I understand your argument about legibility and modularity and hackers that just want it as a starting point. I'm not saying I want to get it in the core just so my precious beautiful code makes it to the core :-D

It's just that I realistically don't think an API will pop up in the near future (nesta has been around for a while and it hasn't, if we're honest). Nor is nesta such a big super thing right now, and it has few developers. I hope it becomes one, don't get me wrong. In my opinion and in my experience, improving modularity happens more naturally when you separate existing functionality that is lumped together.

But in every case, there's no free lunch. Adding to core, monkey patching and APIs all make the code harder to read and modify. I believe that, at the point where nesta is now, adding to the core is the most efficient and cleanest way to add translations.

Consider: why are themes in the core, why aren't they a plugin too? I think because they are more nicely written than my translations patch, but then translations were a bit harder to do cleanly.

But by hard I don't mean impossible. So, time allowing, I'll write some tests ;-). Then I'll try to create a new lib/translations.rb using some superclever simple design patterns. The changes to lib/models.rb would stay at a minimum and the normal flow of reading code would be less disrupted.

I could also look for possible API points to make translations work.

From what I remember of the code though, to implement it as monkeypatching I would have to completely redefine some methods, and repeat your code very often. I'm not going to do it, it's just a visceral thing, I'm sorry :-D

gma commented 12 years ago

Closing due to lack of activity…

joaotavora commented 12 years ago

Fair enough :-), I haven't used nesta for a while hence the inactivity... we can always reopen when I find the time. Thanks!

On Wed, Feb 29, 2012 at 12:45 PM, Graham Ashton wrote:

Closing due to lack of activity…

João Távora

lilith commented 12 years ago

I'm a fan of looking for a generic way to allow multiple content sections per file - not just for translations, either. I wrote a minimalist CMS in 2007, and after years of refactoring, I realized that the ideal data structure for a file-based CMS is two sets of key/value pairs. 1 set for metadata, 1 set for content. Merging the sets is even better, but hard to do syntactically. Some website designs require rich text summaries or excerpts, for example. Translations are another great use case.