KhronosGroup / OpenGL-Refpages

OpenGL and OpenGL ES reference page sources, and generated HTML used as backing store for khronos.org

Extension xhtml ignored by Google translate #153

Open KhronosWebservices opened 1 month ago

KhronosWebservices commented 1 month ago

A Russian speaker has discovered that the .xhtml pages cannot be translated by Google Translate. For example, this .html page translated fine:

https://registry-khronos-org.translate.goog/OpenGL-Refpages/gl4/?_x_tr_sl=en&_x_tr_tl=ru

This page does not: https://registry-khronos-org.translate.goog/OpenGL-Refpages/gl4/html/all.xhtml?_x_tr_sl=en&_x_tr_tl=ru

Instead, it gets redirected to: https://registry-khronos-org.translate.goog/OpenGL-Refpages/gl4/html/all.xhtml

After changing the extension to just .html, the page translates without any issues. I could imagine there are others who are having translation issues on our ref pages.

Is there anything that can be done to fix this issue?

oddhack commented 1 month ago

The toolchain is Docbook 4 -> XHTML Transitional. Switching to Docbook 5 -> HTML would be a huge amount of work. I do not know what happens these days if we simply rename .xhtml -> .html, as I have barely touched this in a decade. If you're confident that it would be benign on all the major browsers and platforms we could try that, although we would also have to establish redirects.

KhronosWebservices commented 1 month ago

I'll set up a test folder with .html inside and see how that goes.

KhronosWebservices commented 1 month ago

This looks to work as expected: https://registry.khronos.org/OpenGL-Refpages/gl4/html_test/

The only significant change was to modify the \<script> tag; otherwise it seems to work well.

BuslikDrev commented 1 month ago

https://registry.khronos.org/OpenGL-Refpages/gl4/html_test/ I checked all the pages; translation now works via https://translate.google.com/?op=websites. Thank you.

KhronosWebservices commented 1 month ago

Excellent, thank you for helping us resolve this issue.

KhronosWebservices commented 1 month ago

@oddhack Changing to .html fixed the issue. If we have any other areas that are only .xhtml, it might be worth setting up something similar.

In the meantime, what is the best way to make the new .html extension files the permanent go-to for the OpenGL 4 RefPages?

oddhack commented 1 month ago

All the various versions of GL refpages and the EGL refpages use the same toolchain. I will need to understand exactly what you did aside from file renaming and the script tag and replicate that in the Makefiles, then set up redirects for .xhtml -> .html and change the index generation scripts.

KhronosWebservices commented 1 month ago

No other changes were required apart from modifying the script tag, renaming .xhtml to .html and updating all the links in all the files to point to .html instead of .xhtml.
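For illustration, the rename-and-relink step described above could be sketched in Python roughly as follows. This is not the script that was actually run on the test folder; the function names `rewrite_links` and `convert_tree` are hypothetical, and the regex assumes double-quoted, relative `href` attributes, which is an assumption about the Docbook output rather than something verified here.

```python
import re
from pathlib import Path

# Matches href="name.xhtml", optionally followed by #fragment or ?query.
# Assumption: attributes are double-quoted. Links containing ':' (absolute
# URLs with a scheme) are deliberately left untouched.
_LINK_RE = re.compile(r'(href="[^"#?:]*)\.xhtml(["#?])')


def rewrite_links(text: str) -> str:
    """Point relative .xhtml hyperlinks at the equivalent .html file."""
    return _LINK_RE.sub(r"\1.html\2", text)


def convert_tree(src: Path, dst: Path) -> int:
    """Copy every .xhtml file under src to dst as .html, rewriting links.

    Returns the number of pages written. The source tree is not modified.
    """
    count = 0
    for page in sorted(src.rglob("*.xhtml")):
        out = dst / page.relative_to(src).with_suffix(".html")
        out.parent.mkdir(parents=True, exist_ok=True)
        html = rewrite_links(page.read_text(encoding="utf-8"))
        out.write_text(html, encoding="utf-8")
        count += 1
    return count
```

Writing to a separate output directory keeps the original .xhtml files in place, which matters if redirects from the old URLs are set up later.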

AFAIK once the Makefiles are updated from xhtml to html, and

?

oddhack commented 1 month ago

AFAIK once the Makefiles are updated from xhtml to html, and ?

"Automatically" in the sense of needing to run a script over the output .xhtml document. There is no way to generate HTML5 output from Docbook 4 source, which predates HTML5 and is obsolete, but maybe this will patch around it. Did you happen to run an HTML5 validator over the test directory?

This might be good motivation to finally convert the refpage source to asciidoc markup.

KhronosWebservices commented 1 month ago

Did you happen to run an HTML5 validator over the test directory?

No, a little nervous of the output... but I will today.

oddhack commented 1 month ago

I have the impression that non-valid HTML is likely to downrank search results for that page. Right now, the right thing generally happens when you put in a GL entry point: the top result is likely to be the XHTML refpage, which in turn is probably valid XHTML because it comes from the Docbook toolchain. If we convert it to HTML5 by postprocessing but it's not valid HTML5, that might change.

KhronosWebservices commented 1 month ago

There are some issues with the current xhtml pages and some general issues from converting:

  1. \<table style="cellpadding: 0; cellspacing: 0;"> should be \<table style="border-collapse: collapse;">. This is a CSS error, as cellpadding and cellspacing do not exist in CSS.
  2. Non-void elements written as self-closing tags need either an explicit closing tag or to be removed. \<footer/> and \<header/> are the culprits here; I've removed those on the test page. There was also a \<span class="trademark"/> tag, which I think was supposed to be \<span class="trademark">©\</span>?
  3. Void elements such as \<col/>, \<link/>, and \<meta/> should be written \<col>, \<link>, and \<meta>. \<script/> is not a void element, so it should become \<script>\</script>.
  4. Remove type="text/javascript", which is no longer needed in script tags.
  5. Update \<html> to be \<html lang="en">.
  6. Add the character-encoding meta tag \<meta charset="utf-8">.
  7. Remove the XML declaration from line 1.

Test page with the changes applied, which validates green: https://registry.khronos.org/OpenGL-Refpages/gl4/html_test/glBindFragDataLocationIndexed.html

Applying the same changes to a different, randomly picked page also resulted in a page that validates properly.
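As a rough sketch only, the seven changes listed above could be applied by a regex-based post-processing pass like the one below. The function name `xhtml_to_html5` and the `VOID` tuple are hypothetical, the patterns assume the attribute quoting and spacing shown in the list, and the \<span class="trademark"/> case is left alone because its intended content is uncertain. Nothing here has been run against the real refpages.

```python
import re

# Subset of HTML void elements; extend as needed (assumption: this covers
# the ones that actually occur in the generated pages).
VOID = ("col", "link", "meta", "br", "hr", "img", "input")


def xhtml_to_html5(text: str) -> str:
    """Apply the listed XHTML -> HTML5 fixes to one page (heuristic)."""
    # 7. Drop the XML declaration at the top of the file.
    text = re.sub(r"^\s*<\?xml[^>]*\?>\s*", "", text)
    # 1. Replace the invalid CSS properties on tables.
    text = text.replace('style="cellpadding: 0; cellspacing: 0;"',
                        'style="border-collapse: collapse;"')
    # 2. Remove the empty self-closed <header/> and <footer/> elements.
    text = re.sub(r"<(?:header|footer)\s*/>", "", text)
    # 3a. Void elements: drop the trailing slash.
    text = re.sub(r"<(%s)\b([^>]*?)\s*/>" % "|".join(VOID), r"<\1\2>", text)
    # 3b. <script .../> is not void: give it an explicit closing tag.
    text = re.sub(r"<script\b([^>]*?)\s*/>", r"<script\1></script>", text)
    # 4. Remove the redundant type attribute from script tags.
    text = re.sub(r'(<script\b[^>]*?)\s+type="text/javascript"', r"\1", text)
    # 5. Add lang="en" to <html> unless a lang attribute is already present.
    text = re.sub(r"<html\b(?![^>]*\slang=)([^>]*)>",
                  r'<html lang="en"\1>', text, count=1)
    # 6. Add the charset meta tag right after the opening <head>.
    if "<meta charset=" not in text:
        text = re.sub(r"<head\b[^>]*>",
                      lambda m: m.group(0) + '<meta charset="utf-8">',
                      text, count=1)
    return text
```

Because this works on text rather than a parsed tree, each page it produces would still need to go through an HTML5 validator.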

oddhack commented 1 month ago

TBH I would rather work on an asciidoc conversion. Postprocessing XHTML to HTML 5 looks really fragile / heuristic and I think the time will be better spent on a more modern toolchain though it will not be an instant fix. I'm sorry for the person reporting the translation problem as this will not immediately solve their problem.

BTW if you write comments with HTML tags in github (or gitlab), the tags are prone to disappearing in the web view - a lot of your comment above suffers this. Compare

(self-closing 'col' tag, which does not render)

with

\<col/> (same tag with a backslash before the left angle bracket).

Comments are treated as GitHub Flavored Markdown and the angle brackets are specially treated. Extremely annoying.
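A small helper along these lines could apply the backslash workaround described above before pasting text into a comment. The name `escape_tags_for_gfm` is hypothetical, and the sketch assumes the text is outside code spans, where a backslash would be shown literally.

```python
import re


def escape_tags_for_gfm(comment: str) -> str:
    """Put a backslash before each '<' that is not already escaped."""
    # The lookbehind skips brackets that already have a backslash, so
    # running the helper twice does not double-escape.
    return re.sub(r"(?<!\\)<", r"\\<", comment)
```

For example, `escape_tags_for_gfm("<col/>")` returns the six characters of the tag preceded by one backslash.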