Athou / commafeed

Google Reader inspired self-hosted personal RSS reader.
https://www.commafeed.com
Apache License 2.0

Support for RTL languages #205

Closed Huji closed 11 years ago

Huji commented 11 years ago

Please add support for content in right-to-left languages (such as Arabic, Persian, and Hebrew). By support, I mean that the content of those entries should be wrapped in a DIV with the style "direction:rtl".

Athou commented 11 years ago

I'll see what I can do

Athou commented 11 years ago

Is this applicable on an account basis or per feed?

Huji commented 11 years ago

It is per post. Some feeds post content in more than one language. You can take a look at this example: http://tinyurl.com/rtlfeed

Athou commented 11 years ago

How am I going to detect the post needs rtl?

Huji commented 11 years ago

Based on the language tag or attribute:

1) If a feed has <language>XX</language> tag in it, and XX is a right-to-left language (such as ar, fa, he, etc.) then the default for that feed should be right to left. Read more here: http://blogs.msdn.com/b/rssteam/archive/2007/05/17/reading-feeds-in-right-to-left-order.aspx

2) Then look at each entry. If its "title" and/or "summary" tag has an 'xml:lang="XX"' attribute and XX is a right-to-left language, the respective section of the entry (that is, that title or that summary) should be shown in RTL. From http://tinyurl.com/rtlfeed look at this excerpt:

    ...
    <entry>
      <id>...</id>
      ...
      <rights>...</rights>
      <title xml:lang="fa">...</title>
      <summary xml:lang="fa">...</summary>
      ...
    </entry>
    ...

Both the title and the summary of that entry should be shown in RTL.

3) Finally, if a feed has defaulted to RTL (because of the "language" tag), but an entry of it has a xml:lang attribute which lists an LTR language like English, that entry should be shown in LTR. In other words, rule 2 takes precedence over rule 1.
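The precedence in rules 1 to 3 could be sketched roughly as follows. This is only an illustration, not code from CommaFeed: the function name `direction_for`, its parameters, and the three-language set are assumptions made for the example.

```python
from typing import Optional

# Assumption: a tiny illustrative set; only ar, fa and he are named in the rules.
RTL_LANGUAGES = {"ar", "fa", "he"}


def direction_for(feed_language: Optional[str], element_lang: Optional[str]) -> str:
    """Return "rtl" or "ltr" for a single title or summary element."""
    # Rules 2 and 3: an xml:lang on the element overrides the feed-level default.
    # Rule 1: without it, the feed's <language> value decides.
    declared = element_lang or feed_language
    if not declared:
        return "ltr"  # nothing declared, keep the default direction
    # Compare only the primary subtag, so "ar-EG" is treated like "ar".
    primary = declared.strip().lower().split("-")[0]
    return "rtl" if primary in RTL_LANGUAGES else "ltr"
```

With this sketch, an English entry inside a Persian feed resolves to LTR (`direction_for("fa", "en")`), while an entry without its own `xml:lang` inherits the feed default.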

Huji commented 11 years ago

I also examined a few other RTL blogs and it seems that on some occasions, they are misconfigured as if they are in English (language code "en" is returned in the feed), but when I browse them in Google Reader, it shows them in RTL. It seems like on top of all the above, we need a 4th rule that predicts the language based on the content. However, let's start with the above three.

ebraminio commented 11 years ago

Definitely needed; the current state is not usable for Persian/Arabic/Hebrew and other RTL languages :( https://code.google.com/p/google-web-toolkit/source/browse/trunk/user/src/com/google/gwt/i18n/shared/BidiUtils.java can be used for detecting RTL posts based on their content. I think a statistical approach would be nice: when the content has more RTL characters ("\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC"), it is marked as RTL.
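A minimal sketch of that statistical idea, using the character ranges quoted above. It is not the BidiUtils implementation: the name `looks_rtl`, the comparison against Latin letters, and the simple majority threshold are all assumptions made for illustration.

```python
import re

# Character ranges quoted in the comment (Hebrew, Arabic and related blocks).
RTL_CHARS = re.compile("[\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC]")
# Assumption: "LTR characters" are approximated by basic Latin letters only.
LTR_CHARS = re.compile("[A-Za-z]")


def looks_rtl(text: str) -> bool:
    """Guess RTL when RTL characters outnumber Latin letters."""
    rtl_count = len(RTL_CHARS.findall(text))
    ltr_count = len(LTR_CHARS.findall(text))
    return rtl_count > ltr_count
```

Digits, punctuation and whitespace are ignored by both patterns, so they do not push the result either way.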

Athou commented 11 years ago

I like this method, I'll draft a first version using this.

ebraminio commented 11 years ago

There is also another, cleaner solution based on NLP: https://code.google.com/p/language-detection/ This library can detect the language of a piece of content, and the RTL style can then be applied if the detected language is RTL. Here is a list of RTL languages: ar|arc|arz|bcc|ckb|bqi|dv|fa|fa-af|glk|ha|he|kk-arab|kk-cn|ks|ku-arab|mzn|pnb|prd|ps|sd|ug|ur|ydd|yi
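Once a language code is available (from the feed or from a detector), checking it against that list could look like the sketch below. The function name `is_rtl_language` and the fallback to the primary subtag are assumptions for the example, not something the thread specifies.

```python
# The list quoted in the comment, split on "|".
RTL_LANGUAGES = frozenset(
    "ar|arc|arz|bcc|ckb|bqi|dv|fa|fa-af|glk|ha|he|kk-arab|kk-cn|ks|ku-arab|"
    "mzn|pnb|prd|ps|sd|ug|ur|ydd|yi".split("|")
)


def is_rtl_language(tag: str) -> bool:
    """Return True if a language tag such as "fa" or "fa-IR" is in the RTL list."""
    normalized = tag.strip().lower().replace("_", "-")
    if normalized in RTL_LANGUAGES:
        return True
    # Assumption: fall back to the primary subtag, so "fa-IR" matches "fa".
    return normalized.split("-")[0] in RTL_LANGUAGES
```

Checking the full tag first matters for entries such as kk-arab: plain "kk" is not in the list, so only the Arabic-script variant is treated as RTL.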

Athou commented 11 years ago

I referenced the wrong issue in the commit message : 9568ccfeacc411b8565bee8bc26b2296825fdb4a

ebraminio commented 11 years ago

WOW!

Athou commented 11 years ago

I'll deploy it in a few minutes, let me know what you think.

ebraminio commented 11 years ago

[Two screenshots: "2013-05-31 12_08_56-google reader 17" and "2013-05-31 12_08_26-1839 - commafeed"]

Very good! But some improvement is needed. http://www.google.com/reader/view/#stream/feed%2Fhttp%3A%2F%2Ffedorafans.com%2Ffeed%2F

ebraminio commented 11 years ago

https://code.google.com/p/closure-stylesheets/ can flip the styles in your CSS files, so I think the only thing needed is to generate an RTL version of your CSS, prefix its rules with .rtl, and include it on RTL pages. (This is also useful for fully localizing commafeed for RTL languages.)

ebraminio commented 11 years ago

There is also another solution: I can write the needed .rtl CSS rules manually. Is that a better solution?

Athou commented 11 years ago

Probably, we'll have better control of what happens.

Huji commented 11 years ago

9568ccf has fixed most of the problem, but not all of it.

Whenever the entry only has a title (that is the summary is blank), the directionality of the title should be determined based on the language detected for it using BidiUtils. Right now, it always flows LTR.

ebraminio commented 11 years ago

I opened this pull request to improve the styles: https://github.com/Athou/commafeed/pull/240

Huji commented 11 years ago

I have. #240 deals with the stylesheets. My question is about when the "rtl" class should be applied. Currently, it is only applied if the entry has a summary, and the language of that summary is determined to be RTL. What is missing is if the entry doesn't have a summary (e.g. the content is just an image, a video, or even blank), but has a title that is, say, in Arabic. At this time, that title is shown LTR, because the appropriate classes are not added to it.
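The missing case could be handled along these lines. This is a hedged sketch rather than the actual fix: `visible_text` and `entry_is_rtl` are hypothetical names, and the character-count check merely stands in for whatever detection the application really uses.

```python
import re
from html.parser import HTMLParser

RTL_CHARS = re.compile("[\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC]")
LTR_CHARS = re.compile("[A-Za-z]")  # assumption: Latin letters stand in for LTR


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores tags such as <img> or <video>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def visible_text(html: str) -> str:
    """Return the text content of an HTML fragment, stripped of markup."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def entry_is_rtl(title: str, summary_html: str) -> bool:
    # Use the summary when it contains text; otherwise fall back to the title,
    # so image-only or blank entries still get a direction from their title.
    text = visible_text(summary_html) or title
    return len(RTL_CHARS.findall(text)) > len(LTR_CHARS.findall(text))
```

An entry whose summary is just `<img src="a.png">` has no visible text, so an Arabic title alone would decide the direction.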

ebraminio commented 11 years ago

I agree with Huji. There is also another situation: if a post contains a lot of sample code (usually wrapped in <div dir="LTR" />, e.g. http://feeds2.feedburner.com/PersianBloggers/Programming ) and fewer Persian comments, it must still be shown in RTL, otherwise its Persian comments would be unreadable. This shows that the statistical way of determining direction is not good enough on its own; in other words, it should be used as a fallback detector. So a solution like the one Huji described at https://github.com/Athou/commafeed/issues/205#issuecomment-18613858 is needed, though 9568ccf will work in most cases and is good enough for now IMO.

ebraminio commented 11 years ago

I think Huji's concerns are resolved with a3cc4ee269, so please ignore my previous comment and close this bug if you want. Other improvements can be made later (when we see a real case for them). Thanks.