Open kaktusus opened 1 year ago
Solutions that come to mind:
1. Compromise for developers and translators :wink:
   additional parsing of the file with strings after extraction so as to remove everything that gets in the way (a little bit of sed magic)
   something like: sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' homepage.pot > homepage_modified.pot
2. Rewriting the source files and making them look consistent. The well-formatted code we see in some files, for example: downloads.php, donation.php
Waiting for suggestions from staff ... and other users :wink:
I would maybe write a script that combines your sed command with the xgettext command, which we would use instead of the xgettext line... Maybe it could even be integrated into the updateCrowdin script?
But first it needs to be tested whether that "sanitized" .po file can still be recognized and used by PHP.
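A minimal sketch of such a combined script, assuming the two sed rules proposed above. The function names sanitize_pot and extract_and_sanitize and the intermediate file name homepage.raw.pot are hypothetical, and nothing here is taken from the actual updateCrowdin script:

```sh
#!/bin/sh
# Hypothetical helper: apply the two sed rules from this thread to a .pot file.
#   $1 = input .pot file, $2 = output .pot file
sanitize_pot() {
    # Rule 1: drop a \n that sits right before a closing quote.
    # Rule 2: drop blanks (spaces or tabs) that directly follow a quote.
    sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' "$1" > "$2"
}

# Hypothetical wrapper: extract the strings from the PHP files in the
# current directory, then sanitize the result into homepage.pot.
extract_and_sanitize() {
    xgettext --from-code=UTF-8 -o homepage.raw.pot ./*.php &&
        sanitize_pot homepage.raw.pot homepage.pot
}
```

Calling extract_and_sanitize in the directory with the PHP files would then stand in for the plain xgettext line. Whether PHP's gettext still matches the sanitized strings is exactly what remains to be tested.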
I see some special cases that may be solvable. I am conducting further analysis in this regard.
" character.
What happens next with the generated homepage.pot file? I understand that this is not the end of processing ...
For testing I use: xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' homepage.pot > homepage_modified3.pot
After that, it can be shortened to: xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -i -e 's/\\n"/"/g;s/"[[:blank:]]*/"/g' homepage.pot
Changes will then be applied directly to the homepage.pot file.
Hmm so every line needs to be ended by a space? That's annoying (many code editors will remove that trailing space automatically) but it's doable. Maybe that's the best solution here...
part of the code (the ugly one) is just prepared this way :stuck_out_tongue_winking_eye:
We can customize the processing with sed any way we want so that we are all satisfied, you just need to choose the right rules.
At the moment we have:
- searching for \n" and replacing with "
- searching for " followed by any number of spaces or tabs and replacing with "
If we change or add search keys, you can customize everything.
However, we must remember that each change of the source string generates a lot of work for translators, so everything must be well thought out (and tested), and changes to the production environment must be introduced only once.
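To see what each of the two rules does on its own, here is a small sketch that runs them separately. The sample lines are an assumption about how xgettext might emit a string that wraps across indented source lines; they are not copied from the real homepage.pot:

```sh
# Hypothetical sample: two consecutive quoted lines of one msgid.
sample='"a little or a lot,\n"
"        all your efforts"'

# Rule 1 only: drop the \n that sits right before a closing quote.
rule1=$(printf '%s\n' "$sample" | sed -e 's/\\n"/"/g')

# Rule 2 only: drop blanks that directly follow a quote.
rule2=$(printf '%s\n' "$sample" | sed -e 's/"[[:blank:]]*/"/g')

printf 'rule 1:\n%s\n\nrule 2:\n%s\n' "$rule1" "$rule2"
```

Rule 1 only touches the end of the first line, and rule 2 only touches the indentation at the start of the second one, so the two keys can be tuned independently.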
Hmm so every line needs to be ended by a space?
Whether we choose spaces as the last character of the line or as the first character in the line the entire code will still require adjustment to the chosen rule.
I modified the selection key and thus solved the first problem I showed in the picture above
xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/[[:space:]]*\\n"/ "/g;s/"[[:blank:]]*/"/g' homepage.pot > homepage_modified4.pot
The second special case requires more of my attention, I need to read the documentation to solve it.
Yorik, I consider this solution a temporary prosthesis; if you would like to apply it in a production environment, I have no objection. However, for me the best solution is the one from point 2.
@chennes and @luzpaz and others what do you think about this topic?
Everything works as I expected. Too bad only Yorik picked up the gauntlet .... as a rule, the more different opinions, the better the result.
I have tested two different variants and both are perfectly suitable for my planned task:
15:59 ~/Pobrane/FreeCAD/test$ xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/[[:space:]]*\\n"/ "/g;s/"\s\{2,\}/"/g' homepage.pot > homepage_modified5.pot
17:02 ~/Pobrane/FreeCAD/test$ xgettext --from-code=UTF-8 -o homepage.pot *.php && sed -e 's/[[:space:]]*\\n"/ "/g;s/"[[:blank:]]\{2,\}/"/g' homepage.pot > homepage_modified5.pot
:smiley:
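One visible difference between the earlier rule with * and the later one with \{2,\} can be sketched as follows. The two sample lines are hypothetical, and the variable names strict and relaxed are only labels for this illustration:

```sh
# Earlier rule: removes any run of blanks after a quote, even a single one.
strict='s/"[[:blank:]]*/"/g'
# Later rule: removes only runs of two or more blanks after a quote.
relaxed='s/"[[:blank:]]\{2,\}/"/g'

# Hypothetical sample lines: one starts with a single, possibly intended
# space, the other with source-code indentation.
single='" and faster development"'
indented='"        all your efforts"'

out_strict=$(printf '%s\n' "$single" "$indented" | sed -e "$strict")
out_relaxed=$(printf '%s\n' "$single" "$indented" | sed -e "$relaxed")

printf 'strict:\n%s\n\nrelaxed:\n%s\n' "$out_strict" "$out_relaxed"
```

Both rules strip the indentation, but only the strict one also eats the single leading space, which would glue two words together once the lines are joined.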
Any ideas?
This needs to be tested first, because if the .pot file contains a string that differs from the one in the HTML file, the gettext system might not be able to match it and apply the translation.
To make testing easier and faster, I suggest you look at the files with Polish translations. They may contain many answers.
The translations do not contain unnecessary \n characters or strings of blank spaces.
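A quick way to check a .po or .pot file for these two leftovers could look like the sketch below. The function name count_leftovers is hypothetical, and the check is deliberately crude: a \n before a closing quote may be intentional in some strings, so a non-zero count is only a hint to look closer.

```sh
# Hypothetical helper: count the lines of a .po/.pot file that still end
# in \n" or start with a quote followed by two or more blanks.
#   $1 = file to check
count_leftovers() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at zero when that number is 0.
    grep -c -e '\\n"$' -e '^"[[:blank:]]\{2,\}' "$1" || true
}
```

For example, count_leftovers homepage.pot could be run before and after the sed step to compare the two counts.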
Hey @Reqrefusion, thanks for all your work on the FC homepage. Could you weigh-in on this translation ticket ? TIA
A complete herculean task. It's not so much that it will cause problems as it is that the current translation will be broken. It will take a lot of messing around to find the right way. I have a few things in mind right now, but I don't know which ones won't cause problems.
any change in the source string will generate work for the translators. The changed string will be recognized by Crowdin as a new translation unit.
There is no getting around it.
I have been searching the internet since the morning and unfortunately you are right. There are some workarounds, like manually editing the project files, but they are very complicated. However, since there will be an exact match, I think translators will just need to approve it. I have even seen that the project manager may not need to ask for translator approval for exact matches. It would be better to discuss this with him when I create a PR for it.
Crowdin supports translators in such cases (minor string correction) and proposes translations that almost match the original string based on translation history.
Such action makes the work of translators much easier and faster. :wink:
The issue relates to \n characters appearing in some strings exported to Crowdin. I will introduce the problem using the example of the string:
Thank you for supporting FreeCAD! Whether you donated a little or a lot, all your efforts contribute to further and faster development of FreeCAD.
The string was imported into Crowdin: https://crowdin.com/translate/freecad/27908/en-pl#6625455
After extracting from the contributor.php source file, we get:
The way it is written is very intriguing and the \n sign is presented in a different way.
homepage.pot.txt
The view in the source file:
https://github.com/FreeCAD/FreeCAD-Homepage/blob/54e47134da26d82d253184ebb4661c3911d4b583/contributor.php#L15
It is worth mentioning that there are many multi-line long strings. However, not all of them have extra line breaks inside. Different source files use a different way of writing (eg donation.php). So the issue does not always occur.
kaktus' note https://manpages.debian.org/unstable/gettext/xgettext.1.en.html