DeepLcom / deepl-dotnet

Official .NET library for the DeepL language translation API.
MIT License

Translate string literals with programming logic #41

Open Juanca24691 opened 6 months ago

Juanca24691 commented 6 months ago

I am currently developing a tool that extracts all double-quoted string literals from a file. The problem is that these string literals contain programming logic, such as escape sequences, and DeepL does not preserve that logic. Here is an example of a string literal containing Russian characters that I want to translate into Spanish:

"{ffffff}Дом {ffff00}№%d\n\ {ffffff}Стартовая цена аукциона: {ffff00}%d BTC\n\ {ffffff}Минимальная ставка: {ffff00}%d BTC\n\ Завершение аукциона через: {ffff00}%s\n\n\ {cccccc}%s - %d BTC\n\n\ {ffffff}Укажите сумму вашей ставки:"

How should I approach this situation?

daniel-jones-deepl commented 6 months ago

Hi @Juanca24691, thanks for the question.

For tags like these, XML tag handling is your best option: the DeepL API treats XML tags carefully and does not translate their content. We have an example translating Mustache templates in our Python library using this technique. That example might be useful to you.

Summarised: you could parse the strings and replace the {} tags with XML tags, putting their content in an attribute. Additionally you may need to replace the %... and \n sequences too. For example:

{ffffff}Дом {ffff00}№%d\n
{ffffff}Стартовая цена аукциона: {ffff00}%d BTC\n

would become:

<x c="{ffffff}" />Дом <x c="{ffff00}" />№<x c="%d\n" />
<x c="{ffffff}" />Стартовая цена аукциона: <x c="{ffff00}%d" /> BTC<x c="\n" />

Translate that string to Spanish with tag_handling="xml", to get something like:

<x c="{ffffff}" />Casa <x c="{ffff00}" />#<x c="%d\n" />
<x c="{ffffff}" />Precio de salida de la subasta: <x c="{ffff00}%d" /> BTC<x c="\n" />

Then you can replace the tags with their original content:

{ffffff}Casa {ffff00}#%d\n
{ffffff}Precio de salida de la subasta: {ffff00}%d BTC\n
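The replace-and-restore round trip above can be sketched roughly as follows. This is a minimal illustration in Python, not code from the DeepL libraries: the function names protect and restore are hypothetical, and the placeholder pattern is an assumption inferred from the example string (six-digit hex color tags, %d and %s specifiers, and the literal \n escape). Unlike the hand-written example, this sketch wraps each placeholder in its own tag and XML-escapes both the attribute values and the surrounding text.

```python
import html
import re

# Assumed placeholder syntax, inferred from the example string:
# {rrggbb} color tags, %d / %s specifiers, and the two-character
# escape sequence backslash + n.
PLACEHOLDER = re.compile(r"\{[0-9a-fA-F]{6}\}|%[ds]|\\n")
X_TAG = re.compile(r'<x c="([^"]*)"\s*/>')


def protect(text):
    """Wrap each placeholder in a self-closing <x c="..." /> tag."""
    parts = []
    pos = 0
    for match in PLACEHOLDER.finditer(text):
        # Escape ordinary text so stray &, < or > stay valid XML.
        parts.append(html.escape(text[pos:match.start()], quote=False))
        parts.append('<x c="%s" />' % html.escape(match.group(0), quote=True))
        pos = match.end()
    parts.append(html.escape(text[pos:], quote=False))
    return "".join(parts)


def restore(text):
    """Replace each <x c="..." /> tag with its original placeholder."""
    parts = []
    pos = 0
    for match in X_TAG.finditer(text):
        parts.append(html.unescape(text[pos:match.start()]))
        parts.append(html.unescape(match.group(1)))
        pos = match.end()
    parts.append(html.unescape(text[pos:]))
    return "".join(parts)


source = r"{ffffff}Дом {ffff00}№%d\n"
protected = protect(source)
# Send `protected` to the translation step with XML tag handling enabled,
# then pass the translated text through restore().
assert restore(protected) == source
```

The translation call itself is left out so the sketch stays self-contained. The escaping step is a design choice rather than something the thread prescribes: it keeps the text well-formed XML when a string literal happens to contain characters such as & or <.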

I hope that helps, let me know if you have more questions.

Juanca24691 commented 6 months ago

I have implemented it, but DeepL adds extra characters that it shouldn't. For example, XML tag closures sometimes get an extra closure like this: />, and in some cases, when there is a symbol like |, DeepL adds one more: ||