Closed hiciefte closed 2 weeks ago
Could you share your toolset for the translation pipeline?
Could you share your toolset for the translation pipeline?
Sure! I've to clean-up the code a bit before I can publish it on my GitHub Repo or if it makes sense to integrate it to the Bisq organization. Here is a summary on the process.
My properties files translator is a Python-based automation tool designed to streamline the localization process of .properties files. It leverages OpenAI's language models to translate key-value pairs efficiently while maintaining consistency and handling special formatting.
Key Features:
Asynchronous Translation Utilizes asyncio for concurrent API calls, significantly speeding up the translation process for large files.
Progress Monitoring Integrates tqdm.asyncio to provide real-time progress bars, offering clear visibility into ongoing translation tasks.
Glossary Support Implements a customizable glossary (glossary.json) to ensure specific terms are translated consistently across all files.
Placeholder Preservation
Detects and safeguards placeholders (e.g., {0},
Robust Error Handling Incorporates comprehensive exception management to handle API timeouts, connection issues, rate limits, and unexpected errors gracefully, including retry mechanisms.
Contextual Translations Builds context from existing translations to guide the translation model, enhancing accuracy and relevance, especially for well-translated locales.
Accurate Mapping Ensures each translated text is correctly associated with its corresponding key, preventing mismatches through indexed task management.
File Management Reads source and target .properties files, identifies untranslated or outdated entries, translates them, writes the updated translations to output files, and archives processed files for organization.
Workflow
Configuration Loading: Reads settings from a config.yaml file, including input/output directories, glossary path, and model specifications.
Glossary and Source Loading: Loads the glossary to enforce term consistency and reads the source .properties files to identify text needing translation.
Parsing and Extraction: Parses target .properties files, extracts untranslated or unchanged entries by comparing with source translations, and prepares them for translation.
Translation Process: Concurrently translates the extracted texts using OpenAI's API, applying the glossary and preserving placeholders. Displays progress through a dynamic progress bar for user feedback.
Integration and Output: Integrates the translated texts back into the parsed file structure, ensuring correct key-value mappings. Writes the updated translations to designated output directories.
Archiving: Moves processed files to an archive folder to maintain a clean workspace and track translation history.
I do have a openAPI account for that purpose with API credits. Costs for a whole locale translation are around 1$. Main efforts for new locales are preparation of the glossary, configuration in the Bisq2 app, and reviewing of each translated file if something went wrong.
I did try to achieve this in the beginning with a custom GPT, but that failed on multiple points. It was unreliable in quality and predictability. It f*** up something different every time I tried to translate a batch of property files. In the end handling the parts that have to be absolutely correct in Python and only leverage chatGPT for the translation task guiding the translation with glossary, context and existing translations was the way to go.
The Python script also needed a couple of iterations until it handled every case and also the instruction for chatGPT needed some iteration up to this point. But now it seems to work pretty good. I'll test and re-fine it for the existing locales for the next two releases, but then it should be stable enough so someone can pick it up an do all translation and synchronization tasks from that point on.
As a first test for new locales with my local translation pipeline I added Afrikaans as it was the locale on Transifex @MwithM already started to translate (~5%). I'm holding off for now adding new locales until we have a specific requirement, but I just wanted to give it a shot. No need for compensation if you guys think this locale doesn't make sense right now.