[Homepage] discussion - unnecessary end-of-line characters in strings to translate

kaktusus commented 1 year ago

The issue relates to \n characters appearing in some strings exported to Crowdin.

I will introduce the problem using this string as an example: "Thank you for supporting FreeCAD! Whether you donated a little or a lot, all your efforts contribute to further and faster development of FreeCAD."

The string as imported into Crowdin: https://crowdin.com/translate/freecad/27908/en-pl#6625455

[image]

After extracting from the contributor.php source file, we get:

[image]

The way the string is written is intriguing, and the \n character is shown differently here.

homepage.pot.txt

The view in the source file:

[image]

https://github.com/FreeCAD/FreeCAD-Homepage/blob/54e47134da26d82d253184ebb4661c3911d4b583/contributor.php#L15

[image]

It is worth mentioning that there are many long multi-line strings, but not all of them contain extra line breaks. Different source files are written in different ways (e.g. donation.php), so the issue does not always occur.

kaktus' note https://manpages.debian.org/unstable/gettext/xgettext.1.en.html

kaktusus commented 1 year ago

Solutions that come to mind:

1. A compromise for developers and translators :wink: additional parsing of the strings file after extraction, to remove everything that gets in the way (a little bit of sed magic), something like:
   sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' homepage.pot > homepage_modified.pot
2. Rewriting the source files so that they look consistent, following the well-formatted code we already see in some files, for example downloads.php and donation.php.
3. Waiting for suggestions from staff ... and other users :wink:
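To make the effect of the sed expression from the first option concrete, here is a minimal sketch that applies it to a hand-written .pot fragment. The fragment and the helper names `sanitize_pot` and `sample_pot` are illustrative assumptions; they are not taken from the real homepage.pot.

```shell
#!/bin/sh
# Hypothetical helper: applies the two substitutions proposed in the thread.
sanitize_pot() {
    # Rule 1: drop a literal \n that sits right before a closing quote.
    # Rule 2: drop spaces/tabs that directly follow any double quote.
    sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g'
}

# Hand-written sample imitating an indented multi-line string after extraction.
sample_pot() {
    cat <<'EOF'
msgid ""
"Thank you for supporting FreeCAD!\n"
"        Whether you donated a little or a lot."
msgstr ""
EOF
}

sample_pot | sanitize_pot
```

In the output, the second line ends in `FreeCAD!"` and the third starts with `"Whether`. Since adjacent quoted lines of a msgid are concatenated, the two sentences end up with no space between them.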

yorikvanhavre commented 1 year ago

I would maybe write a script that combines your sed command with the xgettext command, and use it instead of the xgettext line... Maybe it could even be integrated into the updateCrowdin script?
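A rough sketch of what such a combined script might look like, assuming the xgettext flags and the sed expression quoted earlier in this thread. The function names `update_pot` and `sanitize_pot_file` are hypothetical, and how this would hook into updateCrowdin is not covered here.

```shell
#!/bin/sh
# Hypothetical wrapper: extract strings, then sanitize the resulting .pot.
# Assumed usage: update_pot homepage.pot *.php
update_pot() {
    pot_file=$1
    shift
    # Extraction flags are the ones quoted in this thread.
    xgettext --from-code=UTF-8 -o "$pot_file" "$@" || return 1
    sanitize_pot_file "$pot_file"
}

# Rewrites the given .pot file using the sed rules proposed above.
sanitize_pot_file() {
    pot_file=$1
    tmp_file=$(mktemp) || return 1
    # Write to a temporary file first so a failed sed run leaves the original intact.
    if sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' "$pot_file" > "$tmp_file"; then
        # Copy the content back so the original file keeps its permissions.
        cat "$tmp_file" > "$pot_file"
        rm -f "$tmp_file"
    else
        rm -f "$tmp_file"
        return 1
    fi
}
```

Keeping the sanitizing step in its own function means the sed rules can be changed later without touching the extraction step.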

But it needs to be tested first if that "sanitized" .po file can still be recognized and used by php.

kaktusus commented 1 year ago

I see some special cases that can probably be solved, and I am analyzing them further.

[image]

- Each line of a multi-line string should end with a space; otherwise the words run together.
- I need to build a rule that allows a single blank space to remain after the " character.

What happens next with the generated homepage.pot file? I understand that this is not the end of the processing ...

For testing I use:

xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' homepage.pot > homepage_modified3.pot

After that, it can be shortened to:

xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -i -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' homepage.pot

The changes will then be applied directly to homepage.pot itself.
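When editing in place, it can help to keep a copy of the unmodified file while the rules are still being tuned. The sketch below uses an explicit backup suffix for that; the file name demo.pot and the sample content are placeholders. Spelling the options as `-i.bak -e` also avoids any ambiguity about whether letters written directly after `-i` are read as a backup suffix.

```shell
#!/bin/sh
workdir=$(mktemp -d) || exit 1
pot_file="$workdir/demo.pot"   # placeholder file name

# Hand-written sample content, not taken from the real homepage.pot.
printf '%s\n' 'msgid ""' '"First line\n"' '"    second line"' 'msgstr ""' > "$pot_file"

# -i.bak edits the file in place and keeps the original as demo.pot.bak.
sed -i.bak -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' "$pot_file"

# Show what changed; diff exits non-zero when the files differ, which is expected here.
diff "$pot_file.bak" "$pot_file" || true
```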

yorikvanhavre commented 1 year ago

Hmm so every line needs to be ended by a space? That's annoying (many code editors will remove that trailing space automatically) but it's doable. Maybe that's the best solution here...

kaktusus commented 1 year ago

part of the code (the ugly one) is just prepared this way :stuck_out_tongue_winking_eye:

We can customize the sed processing any way we want so that everyone is satisfied; we just need to choose the right rules. At the moment we have:

- search for \n" and replace it with "
- search for " followed by any number of spaces or tabs and replace it with "

If we change or add search patterns, everything can be customized.

However, we must remember that every change to a source string generates a lot of work for translators, so everything must be well thought out (and tested), and changes should be introduced to the production environment only once.

> Hmm so every line needs to be ended by a space?

Whether we choose a space as the last character of a line or as the first character of the next one, the entire code will still need to be adjusted to the chosen rule.

kaktusus commented 1 year ago

I modified the search pattern, which solves the first problem I showed in the picture above:

xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/[[:space:]]*\\n"/ "/g;s/"[[:blank:]]*/"/g' homepage.pot > homepage_modified4.pot

[image]
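As a small check of the modified pattern, the sketch below runs it over a hand-written fragment (an assumption for illustration, not the real homepage.pot). The function name `sanitize_pot_v2` is hypothetical.

```shell
#!/bin/sh
# Rule 1 (revised): replace optional whitespace + literal \n + closing quote
#                   with a single space followed by the closing quote.
# Rule 2 (unchanged): strip blanks that directly follow any double quote.
sanitize_pot_v2() {
    sed -e 's/[[:space:]]*\\n"/ "/g;s/"[[:blank:]]*/"/g'
}

printf '%s\n' \
    'msgid ""' \
    '"Thank you for supporting FreeCAD!\n"' \
    '"        Whether you donated a little or a lot."' \
    'msgstr ""' | sanitize_pot_v2
```

The first text line now ends with `FreeCAD! "`, so a single space separates the two sentences once the lines are joined, while the indentation on the next line is still removed.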

The second special case requires more of my attention; I need to read the documentation to solve it.

Yorik, I consider this solution a temporary stopgap. If you would like to apply it in a production environment, I have no objection. However, for me the best solution is the one from point 2.

@chennes and @luzpaz and others what do you think about this topic?

kaktusus commented 1 year ago

Everything now works as I expected. Too bad only Yorik picked up the gauntlet ... as a rule, the more opinions there are, the better the result.

[image]

I have tested two different variants and both are perfectly suitable for my planned task:

15:59 ~/Pobrane/FreeCAD/test$ xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/[[:space:]]*\\n"/ "/g;s/"\s\{2,\}/"/g' homepage.pot > homepage_modified5.pot
17:02 ~/Pobrane/FreeCAD/test$ xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/[[:space:]]*\\n"/ "/g;s/"[[:blank:]]\{2,\}/"/g' homepage.pot > homepage_modified5.pot

:smiley:
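The difference between the earlier blank-stripping rule and the `\{2,\}` variant can be seen on a line where a single space after a quote carries meaning. The sample lines and function names below are hypothetical. The sketch uses the `[[:blank:]]` form, which is a POSIX character class; `\s` is a GNU sed extension.

```shell
#!/bin/sh
# Earlier rule: strips any run of blanks after a double quote, even a single space.
strip_any_blanks() {
    sed -e 's/"[[:blank:]]*/"/g'
}

# Later variant from the thread: strips only runs of two or more blanks.
strip_indent_only() {
    sed -e 's/"[[:blank:]]\{2,\}/"/g'
}

# Hypothetical msgid lines: one with indentation, one with an escaped quote
# followed by a single, meaningful space.
sample='"        Whether you donated"
"Click \"Donate\" to continue"'

echo "--- any blanks:"
printf '%s\n' "$sample" | strip_any_blanks
echo "--- two or more blanks:"
printf '%s\n' "$sample" | strip_indent_only
```

With the earlier rule the second line loses the space after the escaped quote, while the `\{2,\}` variant removes only the indentation and leaves that line untouched.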

kaktusus commented 1 year ago

Any ideas?

yorikvanhavre commented 1 year ago

This needs to be tested first, because if the .pot file contains a string that differs from the one in the HTML file, the gettext system might not be able to match it and apply the translation.
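The concern can be illustrated without gettext itself. Message lookup is keyed on the exact msgid, so the sketch below uses a plain string comparison as a stand-in for that lookup. The layout of the runtime string (a line break followed by indentation) is an assumption based on the example discussed in this thread, and `lookup_matches` is a hypothetical helper, not a gettext function.

```shell
#!/bin/sh
# String as the source file would pass it at runtime (assumed layout:
# a line break followed by indentation inside the literal).
runtime_msgid='Thank you for supporting FreeCAD!
        Whether you donated a little or a lot.'

# msgid as it would read after sanitizing the .pot (line break and indent removed).
sanitized_msgid='Thank you for supporting FreeCAD! Whether you donated a little or a lot.'

# Stand-in for the catalog lookup: an exact, byte-for-byte comparison.
lookup_matches() {
    [ "$1" = "$2" ]
}

if lookup_matches "$runtime_msgid" "$sanitized_msgid"; then
    echo "match: translation would be applied"
else
    echo "no match: untranslated text would be shown"
fi
```

If the real lookup behaves like this comparison, sanitizing only the .pot file is not enough; the strings in the source files would have to change as well.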

kaktusus commented 1 year ago

To make testing easier and faster, I suggest you look at the files with the Polish translations. They may contain many answers. The translations do not contain unnecessary \n characters or runs of blank spaces.

luzpaz commented 2 months ago

Hey @Reqrefusion, thanks for all your work on the FC homepage. Could you weigh in on this translation ticket? TIA

Reqrefusion commented 2 months ago

> Hey @Reqrefusion, thanks for all your work on the FC homepage. Could you weigh in on this translation ticket? TIA

A complete herculean task. It's not so much that it will cause problems as it is that the current translation will be broken. It will take a lot of messing around to find the right way. I have a few things in mind right now, but I don't know which ones won't cause problems.

kaktusus commented 2 months ago

Any change in the source string will generate work for the translators. The changed string will be recognized by Crowdin as a new translation unit.

There is no getting around it.

Reqrefusion commented 2 months ago

> Any change in the source string will generate work for the translators. The changed string will be recognized by Crowdin as a new translation unit.
>
> There is no getting around it.

I have been searching the internet since this morning and, unfortunately, you are right. There are some workarounds, such as manually editing the project files, but they are very complicated. However, since there will be an exact match, I think translators will just need to approve it. I have even seen that the project manager may not need to ask for translator approval for exact matches. It would be better to discuss this with him once I create a PR for it.

kaktusus commented 2 months ago

Crowdin supports translators in such cases (minor string correction) and proposes translations that almost match the original string based on translation history.

Such action makes the work of translators much easier and faster. :wink:

FreeCAD / FreeCAD-translations #275