bbepis / XUnity.AutoTranslator

MIT License

Splitter Regex #255

Open ndrwln opened 2 years ago

ndrwln commented 2 years ago

If I understand correctly, the splitter regex should split some text and provide translations of each part separately. If one part is not found, then it should create 2 different translation entries. That is what I believe should be the intended behavior.

I am trying to implement line-by-line translations for easier readability, plus the ability to handle repeated text in some cases. I created this splitter regex to tokenize sentences by splitting at the period mark: sr:"(.+)。(.+)"=$1. $2

Unfortunately, it doesn't work, even though the regex matched when I tested it separately. This is what gets written to the translation file:

你出生在宁州的一个练武世家,小时候你就跟着武馆里的师父学了不少拳脚功夫。\n灵根是修仙的根本,对于大部分人来说,灵根自然是越精纯越好。你的灵根资质是...?\n=You were born in a family of martial arts practitioners in Ningzhou, and as a child you learned a lot of kung fu from your master in the martial arts school. Spiritual roots are fundamental to immortal cultivation, and for most people, the purer the roots, the better. Your spirit root qualification is...?\n

I expected 2 entries instead, i.e.:

你出生在宁州的一个练武世家,小时候你就跟着武馆里的师父学了不少拳脚功夫。=
\n灵根是修仙的根本,对于大部分人来说,灵根自然是越精纯越好。你的灵根资质是...?\n=

After translating a few hundred lines, I have noticed many repetitions. Split this way, the translation file would be more human-readable and those repetitions would disappear, especially once I add more splitters to handle skill formatting.

With enough recursion depth, it should be able to translate a paragraph of various sentences put together in different order.
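
To make the idea concrete, here is a minimal sketch of depth-limited recursive splitting. It uses Python's `re` module as a stand-in and is not the plugin's implementation; the pattern `FIRST_SENTENCE`, the function `split_sentences`, and the `max_depth` parameter are all hypothetical names chosen for illustration.

```python
import re

# Hypothetical pattern: the first sentence up to and including "。",
# followed by a non-empty remainder that may contain newlines.
FIRST_SENTENCE = re.compile(r"^([^。]+。)([\S\s]+)$")


def split_sentences(text, max_depth):
    """Peel off one sentence per recursion level until the depth runs out."""
    if max_depth <= 0:
        return [text]
    match = FIRST_SENTENCE.match(text)
    if match is None:
        # Nothing left to split: a single sentence or no "。" at all.
        return [text]
    head, rest = match.groups()
    return [head] + split_sentences(rest, max_depth - 1)


shallow = split_sentences("甲。乙。丙。", max_depth=1)
deep = split_sentences("甲。乙。丙。", max_depth=3)
reordered = split_sentences("丙。甲。乙。", max_depth=3)
```

With a depth of 1 only the first sentence is separated from the rest, while a larger depth yields one part per sentence. Two paragraphs built from the same sentences in a different order then produce the same set of parts, which is what would let the translation entries be reused.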

ndrwln commented 2 years ago

I found out that the translation file flattens any newlines, but the original text apparently is not flattened when it is checked against the regex. I added some code to flatten sentences. Now I'm testing to see whether the splitter regex works.

No wonder the code I added to write the split parts to disk didn't work.

gravydevsupreme commented 2 years ago

Whitespace/newline modifications are not fed into any regexes, so that is not the cause of your problem.

Note that (.+) does not match newlines, meaning the example you showed would never work.

Also, there are three sentences (three 。) in the above line, which means there should be at least 3 lines, but that would require manually increasing the recursion limit in the config file. For that to work properly, you probably also need to add ^ and $ to your regex to indicate start and end. In that case I would recommend using ([\S\s]+) to capture the second part.
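
The newline point can be checked in isolation. The following is a rough sketch using Python's `re` module, whose default handling of `.` (no match on `\n`) is the same as in .NET regexes; it does not reproduce how the plugin itself applies splitter regexes. The sample text and the helper `split_once` are made up for illustration.

```python
import re

# Made-up sample shaped like the text in this issue:
# one sentence, a newline, two more sentences, a trailing newline.
text = "第一句。\n第二句。第三句?\n"

original = re.compile(r"(.+)。(.+)")              # pattern from the issue
anchored_dot = re.compile(r"^(.+)。(.+)$")        # anchors added, still using "."
anchored_any = re.compile(r"^(.+)。([\S\s]+)$")   # second group may span newlines


def split_once(pattern, s):
    """Return the captured groups of the first match, or None if there is none."""
    match = pattern.search(s)
    return match.groups() if match else None


# "." stops at the newline, so the unanchored pattern can only match
# inside the second line and never sees the first sentence.
print(split_once(original, text))

# With anchors, the whole text must match, which "." cannot do across "\n".
print(split_once(anchored_dot, text))

# [\S\s]+ accepts any character, so the remainder (newlines included) is captured.
print(split_once(anchored_any, text))
```

Only the last pattern separates the first sentence from everything after it, which is the shape a recursive split needs.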

I am generally not sure this is a good use-case for splitting regexes, though.

gravydevsupreme commented 2 years ago

If you know the GameObject path to the text component you could specify that in GameLogTextPaths in the config file to make it split all texts on newlines automatically.

thos-grol commented 2 years ago

> If you know the GameObject path to the text component you could specify that in GameLogTextPaths in the config file to make it split all texts on newlines automatically.

Took a break for a while. Coming back to this problem now; I'll look into it.

thos-grol commented 2 years ago

> If you know the GameObject path to the text component you could specify that in GameLogTextPaths in the config file to make it split all texts on newlines automatically.

Thanks, it works. I tried it.