SpongePowered / SpongeDocs

Documentation for Sponge and its Implementations
Creative Commons Attribution Share Alike 4.0 International

Numbers in Titles make Translation fail #657

Open Inscrutable opened 6 years ago

Inscrutable commented 6 years ago

We have encountered a problem with the way Sphinx handles some of the translated titles in pages. Any title that starts with a number and a full stop (e.g. 1. Example) is treated as an RST list, and the translation isn't applied when the docs are built. The effect is highly visible on many of the published SpongeDocs translations that are near-complete, as the untranslated English text stands out starkly.

The simplest fix we can apply is to change the titles to remove or obscure the number. This is as easy as enclosing it in brackets, i.e. (1) Example, or giving the title an overline (e.g. --------) that matches the title underline. Either way, a number of pages will need to be amended to sort this problem out.

Pages affected (so far):

We could use help identifying all the pages that need amendment, so we can roll it all into one big PR if possible. Thanks to the ever-vigilant 3TUSK for first identifying the issue.

Suggestions are welcome on the preferred fix, or how alternative fixes might be achieved.
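
To help with identifying affected pages, here is a minimal Python sketch that looks for title lines starting with a number and a full stop, directly followed by an RST underline. The function names and the set of underline characters are assumptions made for illustration, not part of the SpongeDocs tooling.

```python
import re
from pathlib import Path

# A line starting with digits, a full stop and a space, directly followed by
# an underline of at least three identical characters. The set of underline
# characters is an assumption; extend it if the docs use others.
NUMBERED_TITLE = re.compile(
    r"^(?P<title>[0-9]+\. .+)\n(?P<underline>([=\-~^*+#])\3{2,})[ \t]*$",
    re.MULTILINE,
)


def find_numbered_titles(text):
    """Return (line_number, title) pairs for numbered section titles."""
    hits = []
    for match in NUMBERED_TITLE.finditer(text):
        # Line numbers are 1-based: count the newlines before the match.
        line_number = text.count("\n", 0, match.start()) + 1
        hits.append((line_number, match.group("title")))
    return hits


def scan_tree(root):
    """Map each .rst file under root that has hits to its numbered titles."""
    results = {}
    for path in sorted(Path(root).rglob("*.rst")):
        hits = find_numbered_titles(path.read_text(encoding="utf-8"))
        if hits:
            results[str(path)] = hits
    return results
```

Calling scan_tree("source") from a checkout of the docs would list candidate pages. Ordinary numbered list items are skipped because they are not followed by an underline.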

Grauldon commented 6 years ago

The code block below demonstrates that adding a close parenthesis prevents Sphinx from handling the text as an RST list. A number and a full stop/period followed by a close parenthesis is a common way of enumerating lists.

1. This
2.) is
6.) more
4. of
20. a
40.) test
2. to
2.) check
3.) my
10.) theory

results in

  1. This 2.) is 6.) more
  2. of
  3. a 40.) test
  4. to 2.) check 3.) my 10.) theory

Question: Is there a way to test the translations before changing all of the documents?

Grauldon commented 6 years ago

Here is another option I just found: escape the full stop/period and we should be able to use the existing format without Sphinx handling them as lists. I suggest this approach.

1\. This
2\. is
6\. more
4\. of
20\. a
40\. test
2\. to
2\. check
3\. my
10\. theory

results in

1. This 2. is 6. more 4. of 20. a 40. test 2. to 2. check 3. my 10. theory
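
If the escaping approach is chosen, applying it by hand to every title would be tedious. A small Python sketch of the transformation follows; the function name is hypothetical and it only handles the leading number of a single title line.

```python
import re

# Digits followed by a full stop and a space at the very start of the title.
LEADING_NUMBER = re.compile(r"^([0-9]+)\. ")


def escape_numbered_title(title):
    # Insert a backslash before the full stop of the leading number.
    # An already escaped title no longer matches, so it is left unchanged.
    return LEADING_NUMBER.sub(r"\1\\. ", title, count=1)


print(escape_numbered_title("1. This"))
print(escape_numbered_title("20. a"))
```

The escaped title is one character longer in the source, so the underline beneath it may need to be lengthened to match.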

ST-DDT commented 6 years ago

Isn't this just a bug in the tool that extracts the translation sources?

https://github.com/SpongePowered/SpongeDocs/blame/stable/source/plugin/manager.rst

1. Dependency Injection
-----------------------

seems to work and is translatable.

https://github.com/SpongePowered/SpongeDocs/blame/stable/source/contributing/howtogit.rst

1. Forking a Repo
~~~~~~~~~~~~~~~~~

fails (at least for ZH-CN).

Both generate an <h3> block, so theoretically it would be enough to replace all the ~ with -. This should be very simple using a regex. Or is there something that I'm missing?

ST-DDT commented 6 years ago

If my regex [0-9]+\. .*\n~{3,} isn't wrong, then source/contributing/howtogit.rst is the only page with this kind of issue. It also uses ~ incorrectly, since the header order is supposed to be = > - > ~. So I guess it would be enough to just fix that page and everything is fine.
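
For anyone who wants to try this, here is a rough Python sketch that uses the regex above to swap the ~ underline for - under numbered titles. The capture groups and the function name are additions for illustration, and whether the swap actually fixes the translation is still an assumption at this point in the thread.

```python
import re

# The regex from the comment above, with two capture groups added so the
# underline can be rewritten separately from the title line.
NUMBERED_TILDE_TITLE = re.compile(r"([0-9]+\. .*\n)(~{3,})")


def tilde_to_dash(text):
    # Keep the title line and replace each ~ with - at the same length.
    return NUMBERED_TILDE_TITLE.sub(
        lambda m: m.group(1) + "-" * len(m.group(2)), text
    )


title = "1. Forking a Repo"
page = title + "\n" + "~" * len(title) + "\n"
print(tilde_to_dash(page))
```

Titles without a leading number keep their ~ underline, so the other header levels on the page would still need a manual check.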

Grauldon commented 6 years ago

I used '^[1-9][\.)]\ ' and found the following pages with the problem:

However, I did notice several other titles that aren't numbered but aren't translating either. They all had the ~ underlining.
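
One caveat about the search pattern: [1-9] matches a single digit only, so lines such as "10. " or "20. " are skipped, as are lines using "2.) ". The Python sketch below compares an equivalent spelling of that pattern with a widened variant; the widened pattern and the helper name are assumptions, not something used in the original search.

```python
import re

# Equivalent spelling of the pattern quoted above: one digit from 1 to 9,
# then a full stop or close parenthesis, then a space.
SINGLE_DIGIT = re.compile(r"^[1-9][.)] ")

# Hypothetical widened variant: any number of digits, followed by
# ".", ".)" or ")", then a space.
ANY_NUMBER = re.compile(r"^[0-9]+(?:\.\)?|\)) ")


def matching_lines(pattern, lines):
    # Return the lines that the pattern matches at their start.
    return [line for line in lines if pattern.match(line)]


samples = ["1. This", "2.) is", "20. a", "40.) test", "10.) theory", "3) my"]
print(matching_lines(SINGLE_DIGIT, samples))
print(matching_lines(ANY_NUMBER, samples))
```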

ST-DDT commented 6 years ago
  • source/plugin/manager.rst
  • source/plugin/optional/basic.rst

These look good to me.


With these additional samples I would narrow the issue down to instances where there is a numbered list before the section header.

Grauldon commented 6 years ago

I have spent many hours trying to determine the flow of the SpongeDocs build process to see if I could debug build errors. In the process, I have found two missing links on the Docs homepage:

Using the links in the bottom left-hand corner of the pages, I checked all of the build logs. I have a table at https://pastebin.com/raw/B5NvP61e. The table shows which languages appear to be working and which ones have errors in their build logs.

The general rule seems to be that languages without build errors are working, while those with build errors are not. Exceptions are Indonesian, Norwegian, and Dutch. I used Plugin Manager to check, and I looked to see if "1. Dependency Injection" was in English or not.

"Isn't this just a bug in the tool that extracts the translation sources?"

Perhaps ST-DDT's comment about the tool needs to be explored before changes are made to the docs themselves.

I can't find the tool, don't understand the process, or am missing something. So, I can't check the tool. However, I don't mind making the changes once they are identified and if they are needed in the docs themselves.

Inscrutable commented 6 years ago

It's possible @Minecrell has some idea where this step is happening. It's still probably simpler for us to reformat the errant headers than to try tinkering with the Crowdin processing. Also note that languages that fall below 5% translated may drop off the list of built translations, iirc.

stephan-gh commented 6 years ago

Currently, the homepage is not re-built automatically, so new languages only appear when a new build is triggered manually. I just did that so they show up now.

I think I debugged this issue back then and it looked like a bug in Sphinx's translation code. Not the extraction or Crowdin, but rather a bug when the translations are applied during the build. It incorrectly detects the text as an enumeration and doesn't apply the translation properly.

@Grauldon Thanks for testing this :) I think there is a pattern in your table but it's not related to build errors: The difference for Indonesian and Chinese is that they changed/dropped the number in the translated docs. That way it's not detected as an enumeration and is translated properly.

Conclusion: The source text (English) is not a problem, it only causes problems within the translated text.
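
If that conclusion holds, the strings to look at are the translated ones. A minimal Python sketch, assuming the translations are available as (source, translated) pairs; how they are loaded from the translation files is left out, and the function name and sample strings are made up for illustration.

```python
import re

# A translated string that still begins with digits, a full stop and a space
# is the case that seems to be misdetected as an enumeration.
ENUMERATOR = re.compile(r"^[0-9]+\. ")


def risky_translations(pairs):
    # Keep only pairs whose translated text is present and starts with "N. ".
    return [(src, dst) for src, dst in pairs if dst and ENUMERATOR.match(dst)]


# Placeholder strings, not real translations.
pairs = [
    ("1. Dependency Injection", "1. TRANSLATED TITLE"),
    ("1. Dependency Injection", "(1) TRANSLATED TITLE"),
    ("1. Forking a Repo", ""),
]
print(risky_translations(pairs))
```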

Some example links: