backdrop / backdrop-issues

Issue tracker for Backdrop core.
144 stars 40 forks source link

Update Backdrop with better greek transliteration #4947

Open cptX opened 3 years ago

cptX commented 3 years ago

Description of the bug Hi, I'm greek and I can tell you with certainty that the transliteration used in backdrop for greek letters in url is not what greeks expect. I absolutely agree with the changes that took place in Drupal 7,8 and 9 core regarding the transliteration. The following links show the changes that need to take place and I agree with them: https://www.drupal.org/project/transliteration/issues/1215872 https://www.drupal.org/project/drupal/issues/2926187

I also have installed Drupal 9 recently and the above transliterations are already in core! We should update Backdrop to include them too! Actually I don't understand why Backdrop didn't include these changes already.

From the first link the patch works also for backdrop for the file /core/includes/transliteration/x03.php

indigoxela commented 3 years ago

Actually I don't understand why Backdrop didn't include these changes already.

We waited for you to provide a pull request. :laughing:

No, seriously, are you familiar with pull requests on Github? Do you need any guidance or help? If so, feel free to ask any question.

klonos commented 3 years ago

Hi, I'm greek...

Κι εγώ πατρίδα!! 😄

Actually I don't understand why Backdrop didn't include these changes already.

Because the only other Greek person contributing to Backdrop has been focusing on other things 😅 ...now that there's 2 of us here, we can get things like this moving along!

As @indigoxela said, are you comfortable with creating pull requests on GitHub @cptX? I can help with that if you have time ...best way to contact me is via our Zulip chat: https://backdrop.zulipchat.com

cptX commented 3 years ago

When I said "Actually I don't understand why Backdrop didn't include these changes already." I meant because these changes are already existing in D7. Probably Backdrop forked before these changes went to the core. @klonos unfortunately I'm not very familiar with pull requests. I'll contact you through the chat

cptX commented 3 years ago

Some words in greek have the diphthong "ου" like in word "κουλουρι". The correct transliteration for us greeks should be "koulouri". With the above suggested patch the transliteration would be "koyloyri" which optically in not so nice. Is it possible to use a diphthong (meaning two letters) as the source of the transliteration or the transliteration should always have a source of a single letter?

klonos commented 3 years ago

I haven't worked in transliteration files in a while, but I think that it is possible to replace a diphthong with a single letter (as is the other way 'round). So "ου" can become either "ou" or "u" or "oo". I'll have a look at those d.org issues soon as I get a chance, and see if we can use the transliteration file that seems to be the most popular/accepted by Greek-speaking people.

cptX commented 3 years ago

Hi what is the status about this? I think my patch is checked automatically. What is the next steps to merge it to the main branch?

stpaultim commented 3 years ago

@cptX - Sometimes, issues like this get forgotten about until someone, like yourself revives them. If you create a PR and do not see any action on it after weeks or months, it is ok to do any of the following things:

1) Post another comment like you did above asking for an update. Hopefully, this will draw attention to your issue and help move it forward. 2) Post a note in Zulip, like you did. Asking someone to review and test your PR. 3) Ideally, if you know someone like @klonos that is specifically interested or able to review or test, to contact them directly and ask for help.

It is ok and even recommended that you advocate for your issue/PR, by nicely asking for help in any of the above channels. If you don't remind us, this might sit for a long time before someone else notices it.

klonos commented 3 years ago

So here's an update on this and what remains to be done:

  1. The PR will need to be rebased (@cptX please let me know if you need help with that)
  2. Someone will need to take the time to document the changes. @cptX privately sent me some messages in Zulip with testing he's done and what results he's gotten from an SEO perspective etc. This will need to be done by updating the issue summary here, so that it's easy for others to find the suggested changes and the reasoning behind them.
  3. Someone will need to take the time to review/test/confirm (that one is on me I guess)
  4. I am reluctant to RTBC a change in the transliteration system based on the testing/opinions of only 2 people (@cptX and myself), since this will affect many others. Perhaps @dyrer who's a Greek-speaking person could chime in to say what he thinks about the changes, although he mostly works with WordPress and not as active in the Backdrop community. In order for others to chime in though, we need to complete point 1 above (which I understand needs time/effort, but needs to be done).
indigoxela commented 3 years ago

I am reluctant to RTBC a change in the transliteration system based on the testing/opinions of only 2 people

Maybe it's not that difficult, because - if I get it right - the change is based on an already merged Drupal patch. If so, we can consider it as tested by a lot more people. If the PR goes beyond that or does something different ... then we'd need one or two additional testers.

cptX commented 3 years ago

Drupal 7 already had implemented the correct greek transliteration and is unfortunate that backdrop got the drupal 7 instance just before this enhancement. My modifications took account of the drupal 7 changes but I did a couple more small enhancements too. As @indigoxela said I don't consider these updates huge and the other smaller updates are about a couple of special chars which in my opinion were not perfectly transliterated. All of them regard the greek alphabet only. As we are the only 2 active greek people in the backdrop community I think we should move forward now and not wait for other greeks to join as this could take ages... Also transliteration is a bit subjective and not all greeks do it the same way. I did statistical analysis based on google results so I think we should take a decision between us. As me and Klonos have already agreed on these changes in our private messages I don't see any more reason to postpone it. It affects the core so it is the only reason for a greek person at the moment to patch the backdrop core if he wants to make a greek site

cptX commented 3 years ago
1. The PR will need to be rebased (@cptX please let me know if you need help with that)

@klonos I don't have a clue what this means.

Someone will need to take the time to document the changes. @cptX privately sent me some messages in Zulip with testing he's done and what results he's gotten from an SEO perspective etc. This will need to be done by updating the issue summary here, so that it's easy for others to find the suggested changes and the reasoning behind them.

@klonos I will probably do it. I will try to find some time, and I will notify you after documenting the changes. Are you willing to merge it in main if we both accept my suggestions and agree on the changes?

klonos commented 2 years ago

I don't have a clue what this means.

In short, it means that your branch needs to be updated to include recent changes that have been merged into Backdrop core between the time you filed the PR and now.

Because I see that your PR is made from a branch called patch-2, I am assuming that you have created the PR from the GitHub web interface, so you have no local copy on your computer. In that case, here's how it could be done on your local (that's how I do it):

Grab a copy of the main backdrop repo:

git clone https://github.com/backdrop/backdrop docroot
cd docroot

Add your fork as a remote:

git remote add cptX https://github.com/cptX/backdrop

Push updates to the 1.x branch of your fork (this makes sure that the 1.x branch of your fork is updated with recent changes from the original Backdrop repository):

git push -u cptX 1.x:1.x --force

Fetch changes from your branch (patch-2) from the PR you made (3650 is the PR number in your case) into your local:

git fetch origin pull/3650/head:patch-2

Checkout the branch on your local (git stash is used in order to retain any local changes - it can be omitted, but adding it so that others seeing this in the future don't overwrite their local changes):

git checkout patch-2
git stash

Merge the 1.x branch from Backdrop into your local PR branch (and re-apply any previously-stashed changes over it):

git merge origin/1.x -m "Merge 'origin/1.x' into cptX/patch-2"
git stash apply

Push the changes of your locally-updated branch to your repo:

git push -u cptX patch-2

Please try the above and let me know if there's any issues.

klonos commented 2 years ago

Are you willing to merge it in main if we both accept my suggestions and agree on the changes?

I don't have permissions to merge in the Backdrop core code directly, but I can mark this issue (the PR that's linked to it) as RTBC (Reviewed and Tested By the Community). A core committer will then do a final review and merge if they agree with the changes.

dyrer commented 2 years ago

Hello, I dont know if this code part might help,

/**
     * All the greek letters and their latin counterparts
     *
     * @since    1.0.0
     * @since    3.4.0 Added polytonic characters
     * @access   protected
     * @var      array $expressions All the greek letters and their latin counterparts
     */
    protected static $expressions = array(
        '/[ἀἁἈἉᾶἄἅἌἍἆἇἎἏἂἃἊἋᾳᾼᾴᾲᾀᾈᾁᾉᾷᾆᾎᾇᾏᾂᾊᾃᾋὰαάΑΆ]/u'  => 'a',
        '/[βΒ]/u'                                       => 'v',
        '/[γΓ]/u'                                       => 'g',
        '/[δΔ]/u'                                       => 'd',
        '/[ἐἑἘἙἔἕἜἝἒἓἚἛὲεέΕΈ]/u'                        => 'e',
        '/[ζΖ]/u'                                       => 'z',
        '/[ἠἡἨἩἤἥἬἭῆἦἧἮἯἢἣἪἫῃῌῄῂᾐᾑᾘᾙᾖᾗᾞᾟᾒᾒᾚᾛὴηήΗΉ]/u'   => 'i',
        '/[θΘ]/u'                                       => 'th',
        '/[ἰἱἸἹἴἵἼἽῖἶἷἾἿἲἳἺἻῒῗὶιίϊΐΙΊΪ]/u'              => 'i',
        '/[κΚ]/u'                                       => 'k',
        '/[λΛ]/u'                                       => 'l',
        '/[μΜ]/u'                                       => 'm',
        '/[νΝ]/u'                                       => 'n',
        '/[ξΞ]/u'                                       => 'x',
        '/[ὀὁὈὉὄὅὌὍὂὃὊὋὸοόΟΌ]/u'                        => 'o',
        '/[πΠ]/u'                                       => 'p',
        '/[ρΡ]/u'                                       => 'r',
        '/[σςΣ]/u'                                      => 's',
        '/[τΤ]/u'                                       => 't',
        '/[ὐὑὙὔὕὝῦὖὗὒὓὛὺῒῧυύϋΰΥΎΫ]/u'                   => 'y',
        '/[φΦ]/iu'                                      => 'f',
        '/[χΧ]/u'                                       => 'ch',
        '/[ψΨ]/u'                                       => 'ps',
        '/[ὠὡὨὩὤὥὬὭῶὦὧὮὯὢὣὪὫῳῼᾠᾡᾨᾩᾤᾥᾬᾭᾦᾧᾮᾯᾢᾣᾪᾫὼωώ]/iu'  => 'o',
    );

    /**
     * All the greek diphthongs and their latin counterparts
     *
     * @since    1.0.0
     * @since    3.4.0 Added polytonic characters
     * @access   protected
     * @var      array $diphthongs All the greek diphthongs and their latin counterparts
     */
    protected static $diphthongs = array(
        '/[αΑ][ἰἱἸἹἴἵἼἽῖἶἷἾἿἲἳἺἻὶιίΙΊ]/u'                        => 'ai',
        '/[οΟ][ἰἱἸἹἴἵἼἽῖἶἷἾἿἲἳἺἻὶιίΙΊ]/u'                        => 'oi',
        '/[Εε][ἰἱἸἹἴἵἼἽῖἶἷἾἿἲἳἺἻὶιίΙΊ]/u'                        => 'ei',
        '/[αΑ][ὐὑὙὔὕὝῦὖὗὒὓὛὺυύΥΎ]([θΘκΚξΞπΠσςΣτTφΡχΧψΨ]|\s|$)/u' => 'af$1',
        '/[αΑ][ὐὑὙὔὕὝῦὖὗὒὓὛὺυύΥΎ]/u'                             => 'av',
        '/[εΕ][ὐὑὙὔὕὝῦὖὗὒὓὛὺυύΥΎ]([θΘκΚξΞπΠσςΣτTφΡχΧψΨ]|\s|$)/u' => 'ef$1',
        '/[εΕ][ὐὑὙὔὕὝῦὖὗὒὓὛὺυύΥΎ]/u'                             => 'ev',
        '/[οΟ][ὐὑὙὔὕὝῦὖὗὒὓὛὺυύΥΎ]/u'                             => 'ou',
        '/(^|\s)[μΜ][πΠ]/u'                                      => '$1b',
        '/[μΜ][πΠ](\s|$)/u'                                      => 'b$1',
        '/[μΜ][πΠ]/u'                                            => 'b',
        '/[νΝ][τΤ]/u'                                            => 'nt',
        '/[τΤ][σΣ]/u'                                            => 'ts',
        '/[τΤ][ζΖ]/u'                                            => 'tz',
        '/[γΓ][γΓ]/u'                                            => 'ng',
        '/[γΓ][κΚ]/u'                                            => 'gk',
        '/[ηΗ][ὐὑὙὔὕὝῦὖὗὒὓὛὺυΥ]([θΘκΚξΞπΠσςΣτTφΡχΧψΨ]|\s|$)/u'   => 'if$1',
        '/[ηΗ][υΥ]/u'                                            => 'iu',
    );

is from WordPress plugin Autoconvert Greeklish permalinks

cptX commented 2 years ago

@dyrer unfortunately drupal cannot recognise two letters (diphthong) as input, so we are dumed to map only one character as input to one or two characters as output. Personally I add some custom code in a module to make the system able to recognise diphthongs... Unfortunately wordpress is better in this case with its plugin and the way they have implemented it give 100% flexibility in matching. I wish backdrop/drupal could have the same functionality...

IMPORTANT NOTICE: transliteration makes sense to make a url readable from greek people not from foreigners. As the node will have a greek title (thus the greek url) there is no point for foreigners to be able to read the words in the transilterated url. This is a very important notice because this defines which characters are selected for transliteration. For example the above wordpress module will transliterate the word "αυτός" το "aftos" and the word "αύριο" to "avrio". Personally as a greek person I would like something more close to the greek written word so I would like these words to be transliterated as "autos" and "aurio" respectively. They look much better in my opinion than what wordpress is doing. But in drupal as we can match only one character at a time now my suggestion would tranliterate them as "aytos" and "ayrio". This is not so bad and if you do google tests you will realize that actually google really matches the letter "y" to the greek letter "υ" so it' a good match... More bad in the eyes is the tranliteration of the dipthong "ου" which unfortunately drupal would make "oy" and not "ou". I think we can survive with this anyway as all drupal versions do the same

cptX commented 2 years ago

@klonos thanks for the time to explain the process. I will start with the documentation of the transliteration and when ready I will proceed to the merging. You are correct supposing I did the changes directly online, so isn't there any other easier way to merge it without local download? Actually I think I followed some instructions you had written for online modifications in github