AntennaHouse / pdf5-ml

Antenna House PDF5-ML DITA-OT Plug-in

Move string literals to separate file to improve translation #235

Closed by AndeeZee 1 year ago

AndeeZee commented 1 year ago

Hello!

Would it be possible for you to move all string literals to a separate default config file? This would improve the translation process a lot. As of now, we need to go through the huge default style file to find all the string literals for review and translation.

Thank you very much, Andy

ToshihikoMakita commented 1 year ago

Could you give me more specific running situations? For instance:

ToshihikoMakita commented 1 year ago

Also, are you using CCMS for content management?

AndeeZee commented 1 year ago

Hello!

Yes, we have our own plug-in, which overrides/extends your base plugin. Since we are based in Europe, all European languages are potential candidates for translation. In this specific case we are using a CCMS, but the use case is independent of the CCMS.

As answered above, we have an extension plugin, which will be used for European languages. Translating the strings is not a problem.

The "problem" for me as a developer is collecting all literal strings from the default style (base plugin), putting them together, and handing them over for review and translation. The review and translation themselves are not the problem.

To make this easier, it would be great to have a "separation of concerns" in the config files of your plugin: move all literal strings into a separate file, and maybe also move all variables defining images (for notes, etc.) to a separate file. This would also align it more closely with the DITA standard, where we also have separate string files for each language.

Thank you very much, Andy

ToshihikoMakita commented 1 year ago

I'm making a sample plug-in override. Please give me a little time.

ToshihikoMakita commented 1 year ago

The sample plug-in override is done. If you have a fork of https://github.com/AntennaHouse/pdf5-ml, please refresh it to get this example in the develop branch.

I've made several customization examples in jp.acme-corporation.pdf.ml.

Language specific resources

jp.acme-corporation.pdf.ml/config/en-US_style.xml

<style-definition 
    xmlns:axf="http://www.antennahouse.com/names/XSL/Extensions"
    xmlns="http://www.antennahouse.com/names/XSLT/Document/Layout">

    <variable name="Warninng_Icon">url(%plug-in-path%common-graphic/warning-en-us.svg)</variable>

    <variable name="Note_Warning">WARNING</variable>

</style-definition>

jp.acme-corporation.pdf.ml/config/de-DE_style.xml

<style-definition 
    xmlns:axf="http://www.antennahouse.com/names/XSL/Extensions"
    xmlns="http://www.antennahouse.com/names/XSLT/Document/Layout">

    <variable name="Warninng_Icon">url(%plug-in-path%common-graphic/warning-de-de.svg)</variable>

    <variable name="Note_Warning">WARNUNG</variable>

</style-definition>

jp.acme-corporation.pdf.ml/config/fr-FR_style.xml

<style-definition 
    xmlns:axf="http://www.antennahouse.com/names/XSL/Extensions"
    xmlns="http://www.antennahouse.com/names/XSLT/Document/Layout">

    <variable name="Warninng_Icon">url(%plug-in-path%common-graphic/warning-fr-fr.svg)</variable>

    <variable name="Note_Warning">AVERTISSEMENT</variable>

</style-definition>

XSLT stylesheet customizations

See https://github.com/AntennaHouse/pdf5-ml/blob/develop/jp.acme-corporation.pdf.ml/xsl/dita2fo_note.xsl .

The DITA authoring

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="resouceChangeTest" xml:lang="en-US">
    <title>Resouce Change Test</title>
    <shortdesc>Testing automatic change for note image URL and literal value.</shortdesc>
    <conbody>
        <section xml:lang="en-US">
            <title>xml:lang="en-US"</title>
            <note type="warning" outputclass="test">
                <p>Here is "<ph outputclass="warning-title"/>" notation.</p>
            </note>
        </section>
        <section xml:lang="de-DE">
            <title>xml:lang="de-DE"</title>
            <note type="warning" outputclass="test">
                <p>Hier ist die Notation „<ph outputclass="warning-title"/>“.</p> 
            </note>
        </section>
        <section xml:lang="fr-FR">
            <title>xml:lang="fr-FR"</title>
            <note type="warning" outputclass="test">
                <p>Voici la notation "<ph outputclass="warning-title"/>".</p> 
            </note>
        </section>
    </conbody>
</concept>

The sample output

(Image: sample output, 2022-11-22-7)

I believe that this example will help with your customization needs.

I attached the authoring and build result (via DITA-OT 3.7.4).

20221122-extend-language.zip

AndeeZee commented 1 year ago

Hello!

Thanks for the sample plugin, however, this is not my main concern. Please find attached some modifications to the default config of the base plugin: config.zip

It would be great if you could align the base plugin to this. As you can see, I extracted all strings and all images to separate files; at least I hope I caught them all. With these separate files, I now have all strings and all images in one place each, and it is much simpler to give default_strings.xml to a translation service for review and translation to .... It is also very easy to check whether any images need localization, because I only need to check one file instead of going through the huge default again.

For now, I need to extract all the strings manually from the huge default. Same applies for the images.

So it would be great if you could rework the base plugin's default_style.xml and move all strings and all images to separate files.

I hope this sample explains better, where my problems are.

Thank you, Andy

ToshihikoMakita commented 1 year ago

I still don't understand why you refer to the "base plugin" style structure. As long as you override PDF5-ML yourself, you are free to configure your styles within your own plugin.

For instance, if your default_style.xml needs its string variables separated into another file, you can easily do so with the code below.

<style-definition xmlns:axf="http://www.antennahouse.com/names/XSL/Extensions"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:ahp="http://www.antennahouse.com/names/XSLT/Document/PageControl"
    xmlns="http://www.antennahouse.com/names/XSLT/Document/Layout">

    <include href="default_equation_style.xml"/>
    <include href="default_xml_mention_style.xml"/>
    <include href="default_markup_style.xml"/>

    <variables>
        ...
    </variables>

    <attribute-sets>
        ...
    </attribute-sets>

    <!-- include your variables at the end of default_style.xml -->
    <include href="default_strings.xml"/>
    <include href="default_images.xml"/>

    <!-- DO NOT REMOVE! Style customization inclusion -->
    <!-- Probably not needed -->
    <!--include href="../customization/default_style_custom.xml"/-->
</style-definition>

This also works in your language-specific style definitions.

[de-DE_style.xml]

<style-definition 
    xmlns:axf="http://www.antennahouse.com/names/XSL/Extensions"
    xmlns="http://www.antennahouse.com/names/XSLT/Document/Layout">

    <include href="de-DE_strings.xml"/>
    <include href="de-DE_images.xml"/>

</style-definition>
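
As a rough illustration, an included de-DE_strings.xml might look like the sketch below. It assumes an included file uses the same style-definition root and variable elements as the language-specific style files shown earlier in this thread. Note_Warning and its German value come from that example, while the Note_Caution value is only an illustrative translation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: structure assumed to mirror de-DE_style.xml from this thread -->
<style-definition
    xmlns:axf="http://www.antennahouse.com/names/XSL/Extensions"
    xmlns="http://www.antennahouse.com/names/XSLT/Document/Layout">

    <!-- Literal strings only, so the whole file can go to a translator -->
    <variable name="Note_Warning">WARNUNG</variable>

    <!-- Illustrative value, not taken from the plug-in -->
    <variable name="Note_Caution">VORSICHT</variable>

</style-definition>
```

A de-DE_images.xml would presumably follow the same pattern, holding only the url(...) variables.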

There is no need to refer to the base plugin style structure at all. Could you tell me why you must change the PDF5-ML plugin style structure?

AndeeZee commented 1 year ago

Hello! As I said in my first post, we need to translate the existing strings of the base plugin into any number of target languages.

The current procedure is tiresome and not straightforward: I need to go through the huge default_style.xml to find out which variables contain a string for localization. Then I need to copy and paste those into a separate file to hand over for translation. A translation service will never work on the huge default_style.xml itself.

That's why I need to have a separate file in the base plugin. This I can easily give directly to a service, because it contains only strings for localization and nothing else.

Furthermore, if you add new strings to the base plugin in the future, they are easier to spot in a separate string file than by comparing versions of the huge default one.

This is not about how to use translated strings; it is about collecting all existing strings (and images) and preparing them for translation.

Kind regards,

Andy

ToshihikoMakita commented 1 year ago

Unfortunately, I have no intention of changing the current plugin style structure.

We at Antenna House have many use cases that override the PDF5-ML plugin. Most users use a CMS such as RWS Tridion Docs or IXIASOFT DITA CMS. In their implementations, all of the literal strings (other than URLs used for icons) are stored on the CMS side. They are passed to the plug-in through the following topicref in the main map.

<topicref href="GUID-E62BFDBE-DB84-4FD7-949A-CBED34B4A0D8.xml" format="xml" outputclass="language-resource"/>

GUID-E62BFDBE-DB84-4FD7-949A-CBED34B4A0D8.xml

<variables xmlns="http://www.antennahouse.com/names/XSLT/Document/Layout" xml:lang="ja">
  <variable name="Note_Note">Note: </variable>
  <variable name="Note_Important">Important: </variable>
  <variable name="Note_Caution">Caution: </variable>
  <variable name="Note_Danger">Danger: </variable>
  <variable name="Note_Warning">Warning: </variable>
  <!-- for title-page -->
  <variable name="Xref_Title_Prefix">"</variable>
  <variable name="Xref_Title_Suffix">"</variable>
  <variable name="Xref_Title_Page_Delimiter">
  </variable>
  <variable name="Xref_Page_Prefix">(page</variable>
  <variable name="Xref_Page_Suffix">)</variable>
  <!-- for title -->
  <variable name="Xref_Title_Only_Prefix">"</variable>
  <variable name="Xref_Title_Only_Suffix">"</variable>
  <!-- for page -->
  <variable name="Xref_Page_Only_Prefix">Page</variable>
  <variable name="Xref_Page_Only_Suffix"></variable>
</variables>

Users do not need to pick up all of the variables defined in default_style.xml. The user translates GUID-E62BFDBE-DB84-4FD7-949A-CBED34B4A0D8.xml in their CMS workflow. That's all. The plugin customization picks up topicref[@outputclass eq 'language-resource'] and merges the referenced XML file into the style generation flow (dita2fo_style_set.xsl) without needing to know which language its content is in.
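
For readers who want a picture of that pickup step, here is a minimal XSLT 2.0 sketch. It is not the code in dita2fo_style_set.xsl. The `ahs` prefix, the `ex` namespace, the variable names and the `ex:getResourceString` function are all hypothetical, and it assumes the stylesheet runs against a map whose topicref hrefs resolve relative to the map itself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch, not the plug-in's actual implementation -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ahs="http://www.antennahouse.com/names/XSLT/Document/Layout"
    xmlns:ex="urn:example:language-resource"
    exclude-result-prefixes="xs ahs ex">

    <!-- Every topicref in the map that is flagged as a language resource -->
    <xsl:variable name="languageResourceRefs" as="element()*"
        select="//*[contains(@class, ' map/topicref ')]
                   [@outputclass eq 'language-resource']"/>

    <!-- Load each referenced file, resolving @href against the topicref,
         and collect its variable elements regardless of language -->
    <xsl:variable name="languageResourceVariables" as="element(ahs:variable)*"
        select="for $ref in $languageResourceRefs
                return document($ref/@href, $ref)/ahs:variables/ahs:variable"/>

    <!-- Hypothetical lookup: first matching value, or empty if none -->
    <xsl:function name="ex:getResourceString" as="xs:string?">
        <xsl:param name="name" as="xs:string"/>
        <xsl:sequence
            select="$languageResourceVariables[@name eq $name][1]/string(.)"/>
    </xsl:function>

</xsl:stylesheet>
```

Under these assumptions, `ex:getResourceString('Note_Warning')` would return the value from the CMS-managed file when one is referenced, and a caller could fall back to the plug-in's own style variable otherwise.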

A user plugin implementation needs only a limited set of string resources for its purpose. Users do not translate all of the strings defined in default_style.xml because that would be a waste of time and money. This is the real-world use case.

The variables in default_style.xml are used for many purposes. Some should be translated, while others should not. If you do want default_strings.xml or default_images.xml for your users, they should be maintained on your side.