FreeCAD / FreeCAD-translations

Repository tracking localization issues and progress

[Homepage] discussion - unnecessary end-of-line characters in strings to translate #275

Open kaktusus opened 9 months ago

kaktusus commented 9 months ago

This issue concerns the \n (newline) characters that appear in some strings exported to Crowdin.

I will illustrate the problem with this string: "Thank you for supporting FreeCAD! Whether you donated a little or a lot, all your efforts contribute to further and faster development of FreeCAD."

The string as imported into Crowdin: https://crowdin.com/translate/freecad/27908/en-pl#6625455

After extraction from the contributor.php source file, the entry looks quite different: the way it is written is very intriguing, and the \n sign is represented in a different way.

Attached file: homepage.pot.txt

The string as it appears in the source file:

https://github.com/FreeCAD/FreeCAD-Homepage/blob/54e47134da26d82d253184ebb4661c3911d4b583/contributor.php#L15


It is worth mentioning that there are many long, multi-line strings, but not all of them contain extra line breaks. Different source files are written in different ways (e.g. donation.php), so the issue does not always occur.


kaktusus' note: the xgettext(1) manual page, https://manpages.debian.org/unstable/gettext/xgettext.1.en.html

kaktusus commented 9 months ago

Solutions that come to mind:

  1. A compromise for developers and translators :wink:. After extraction, additionally process the strings file to remove everything that gets in the way (a little sed magic), for example: sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' homepage.pot > homepage_modified.pot

  2. Rewriting the source files so that they look consistent, like the well-formatted code in downloads.php and donation.php.

  3. Waiting for suggestions from staff ... and other users :wink:
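To see what option 1 does, here is a sketch that runs the proposed sed command on a hand-written entry in the shape xgettext produces for such a string (file names and text are illustrative; GNU sed assumed). The first expression removes the \n before a closing quote, and the second removes spaces and tabs after any quote:

```shell
#!/bin/sh
# Illustrative entry in the shape xgettext writes for a string that
# contains a line break followed by indentation.
cat > option1.pot <<'EOF'
msgid ""
"Thank you for supporting FreeCAD!\n"
"    Whether you donated a little or a lot."
msgstr ""
EOF

# Option 1: the first expression deletes a literal \n right before a
# closing quote; the second deletes spaces/tabs that follow any quote.
sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' option1.pot > option1_modified.pot
cat option1_modified.pot
```

The second and third lines become "Thank you for supporting FreeCAD!" and "Whether you donated a little or a lot.". Since gettext joins the segments of a msgid without any separator, the resulting text has no space between the two sentences unless the source line already ended with one.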

yorikvanhavre commented 9 months ago

I would maybe write a script that combines your sed command with the xgettext command, and use it instead of the current xgettext line... Maybe it could even be integrated into the updateCrowdin script?

But we first need to test whether that "sanitized" .po file can still be recognized and used by PHP.
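A rough sketch of such a wrapper, assuming a POSIX shell and the option 1 rules; the function names sanitize_pot and extract_pot are hypothetical, and this is not how updateCrowdin currently works. Writing to a temporary file and renaming it avoids the differences between the -i options of GNU and BSD sed:

```shell
#!/bin/sh
# Hypothetical wrapper for the extraction step: run xgettext, then clean up
# the generated template with the sed rules from option 1.

# Rewrite a .pot file in place via a temporary file, which behaves the
# same with GNU and BSD sed.
sanitize_pot() {
    pot="$1"
    tmp="$pot.tmp"
    if sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' "$pot" > "$tmp"; then
        mv "$tmp" "$pot"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Extract strings from the given PHP files into the given .pot file,
# then sanitize it. Usage: extract_pot OUTPUT.pot FILE.php...
extract_pot() {
    out="$1"
    shift
    xgettext --from-code=UTF-8 -o "$out" "$@" && sanitize_pot "$out"
}
```

It would replace the xgettext line with something like: extract_pot homepage.pot *.php. This only covers the mechanical part; whether PHP still finds the translations afterwards still has to be tested.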

kaktusus commented 9 months ago

I see some special cases that may be solvable, and I am analyzing them further.


What happens next with the generated homepage.pot file? I understand that this is not the end of the processing...

For testing I use:

xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' homepage.pot > homepage_modified3.pot

Once that works, it can be shortened to the following, which applies the changes directly to homepage.pot instead of writing a separate file:

xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -i -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' homepage.pot
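One detail worth checking when editing in place: with both GNU and BSD sed, a combined -ie is read as -i with the backup suffix e, so the file is edited in place and a copy of the original is kept next to it (homepage.pote for homepage.pot). A quick check on throwaway files (the names are hypothetical; the separate -i -e form is GNU sed syntax):

```shell
#!/bin/sh
# Throwaway file for the demonstration (hypothetical name).
printf '%s\n' '"Thank you!\n"' > inplace.pot

# "-ie" is parsed as "-i" with the backup suffix "e": the file is edited
# in place and the untouched original is kept as inplace.pote.
sed -ie 's/\\n"/"/g' inplace.pot
ls inplace.pot inplace.pote

# GNU sed: keep -i and -e separate to edit in place without a backup.
printf '%s\n' '"Thank you!\n"' > inplace2.pot
sed -i -e 's/\\n"/"/g' inplace2.pot
cat inplace2.pot
```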

yorikvanhavre commented 9 months ago

Hmm so every line needs to be ended by a space? That's annoying (many code editors will remove that trailing space automatically) but it's doable. Maybe that's the best solution here...

kaktusus commented 9 months ago

Part of the code (the ugly part) is already written this way :stuck_out_tongue_winking_eye:


We can customize the sed processing any way we like so that everyone is satisfied; we just need to choose the right rules. At the moment there are two: one finds \n" and replaces it with ", and the other finds " followed by any number of spaces or tabs and replaces it with ". By changing or adding search patterns, we can customize everything.


However, we must remember that every change to a source string creates a lot of work for translators, so everything must be well thought out (and tested), and the changes must be introduced to the production environment in one go.


Hmm so every line needs to be ended by a space?

Whether we choose a space as the last character of each line or as the first, all of the code will still have to be adjusted to the chosen rule.
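A small check of the trailing versus leading space question with the two current rules (illustrative sample, GNU sed assumed). A space left before the line break survives, while a space at the start of the next line cannot be told apart from indentation and is stripped along with it:

```shell
#!/bin/sh
# Two illustrative entries: in the first, the source line ends with a space
# before the line break; in the second, the space is at the start of the
# next line, where it looks exactly like indentation.
cat > spaces.pot <<'EOF'
msgid ""
"Thank you for supporting FreeCAD! \n"
"    Whether you donated a little or a lot."
msgstr ""

msgid ""
"Thank you for supporting FreeCAD!\n"
"     Whether you donated a little or a lot."
msgstr ""
EOF

# The two rules currently in use.
sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' spaces.pot > spaces_modified.pot
cat spaces_modified.pot
```

With these rules the first msgid reads "Thank you for supporting FreeCAD! Whether..." and the second "Thank you for supporting FreeCAD!Whether...", so the source formatting has to follow the rule that is chosen.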

kaktusus commented 9 months ago

I modified the search pattern, which solved the first problem I showed in my previous comment:

xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/[[:space:]]*\\n"/ "/g;s/"[[:blank:]]*/"/g' homepage.pot > homepage_modified4.pot
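Running the modified command on a similar sample (illustrative entries, GNU sed assumed) shows what the new first expression does: [[:space:]]*\\n" takes any whitespace before the literal \n together with it and replaces the whole match with exactly one space, so the segments join with a single space whether or not the source line ended with one:

```shell
#!/bin/sh
# Illustrative entries: one without and one with a space before the break.
cat > option4.pot <<'EOF'
msgid ""
"Thank you for supporting FreeCAD!\n"
"    Whether you donated a little or a lot."
msgstr ""

msgid ""
"Thank you for supporting FreeCAD! \n"
"    Whether you donated a little or a lot."
msgstr ""
EOF

# Modified first expression: any whitespace plus the literal \n before a
# closing quote becomes a single space; the second expression is unchanged.
sed -e 's/[[:space:]]*\\n"/ "/g;s/"[[:blank:]]*/"/g' option4.pot > option4_modified.pot
cat option4_modified.pot
```

Both entries now yield the same text, "Thank you for supporting FreeCAD! Whether you donated a little or a lot.", which also means that two source strings differing only in whitespace would end up as duplicate entries, and that the msgid no longer matches the literal string in the PHP source.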


The second special case needs more of my attention; I need to read the documentation to solve it.

Yorik, I consider this solution a temporary stopgap; if you would like to apply it in the production environment, I have no objection. However, for me the best solution is the one from point 2 (rewriting the source files).

@chennes, @luzpaz and others, what do you think about this topic?

kaktusus commented 9 months ago

Everything works as I expected. Too bad only Yorik picked up the gauntlet... As a rule, the more different opinions, the better the result.


I have tested two different variants and both are perfectly suitable for my planned task:

15:59 ~/Pobrane/FreeCAD/test$ xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/[[:space:]]*\\n"/ "/g;s/"\s\{2,\}/"/g' homepage.pot > homepage_modified5.pot
17:02 ~/Pobrane/FreeCAD/test$ xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/[[:space:]]*\\n"/ "/g;s/"[[:blank:]]\{2,\}/"/g' homepage.pot > homepage_modified5.pot
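The two commands differ only in the second expression. \s is a GNU sed extension, whereas [[:blank:]] is a POSIX character class matching only spaces and tabs, so the second form should also work with other sed implementations. Both use \{2,\}, which removes a run of at least two blanks after a quote but leaves a single space untouched, unlike the earlier [[:blank:]]* rule. A quick comparison on an illustrative file (GNU sed assumed, file names hypothetical):

```shell
#!/bin/sh
# Illustrative file: one indented continuation line and one msgid that
# deliberately starts with a single space.
cat > variants.pot <<'EOF'
msgid ""
"Thank you for supporting FreeCAD!\n"
"    Whether you donated a little or a lot."
msgstr ""

msgid " single leading space"
msgstr ""
EOF

# Variant 1: \s (GNU sed extension), two or more in a row.
sed -e 's/[[:space:]]*\\n"/ "/g;s/"\s\{2,\}/"/g' variants.pot > v1.pot
# Variant 2: the POSIX class [[:blank:]] (space or tab) instead of \s.
sed -e 's/[[:space:]]*\\n"/ "/g;s/"[[:blank:]]\{2,\}/"/g' variants.pot > v2.pot
# Earlier rule for comparison: strips any number of blanks after a quote.
sed -e 's/[[:space:]]*\\n"/ "/g;s/"[[:blank:]]*/"/g' variants.pot > v0.pot

cmp -s v1.pot v2.pot && echo "variants 1 and 2 agree on this sample"
grep '^msgid ".*space"$' v0.pot v1.pot v2.pot
```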

:smiley:

kaktusus commented 9 months ago

Any ideas?

yorikvanhavre commented 9 months ago

This needs to be tested first, because if the .pot file contains a string that differs from the one in the HTML file, the gettext system might not be able to match it and apply the translation.

kaktusus commented 9 months ago

To make testing easier and faster, I suggest looking at the files with the Polish translations; they may answer many of these questions. The translations do not contain the unnecessary \n characters or runs of blank spaces.