LibreRouterOrg / docs

Documentation, proposals, and /etc

convert PO to JSON - macro or script with regex match including newline in dot character #29

Closed patogit closed 5 years ago

patogit commented 5 years ago

I'm converting PO files to JSON in order to update our translation files to our new format and workflow, and I'm looking for help automating the process.

To convert easily, we need a search-and-replace sequence that includes multi-line patterns with multi-line wildcards. The Bluefish editor has an option for "dot character in regex pattern matches newlines", so I made two demo files with Bluefish. However, Bluefish can't record macros, which would make this work faster.
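For readers without Bluefish, the same "dot matches newline" behaviour is available in most regex engines. The following is a minimal sketch in Python, which is not the tool used in this thread; the `sample` string is a made-up PO fragment for illustration only.

```python
import re

# Hypothetical PO fragment (invented for this example, not taken from the real files).
# "\\n" is the two literal characters backslash + n; "\n" is a real newline.
sample = 'msgid ""\n"Hello\\n"\n"world"\nmsgstr ""\n"Hola"\n'

# Without DOTALL, "." stops at a real newline, so a pattern that has to span
# several lines finds nothing and the text comes back unchanged.
no_dotall = re.sub(r"msgid.*?msgstr", "\n", sample)

# With DOTALL, "." also matches newlines. The non-greedy "*?" keeps each match
# as short as possible, so it stops at the first "msgstr" after each "msgid".
with_dotall = re.sub(r"msgid.*?msgstr", "\n", sample, flags=re.DOTALL)

print(repr(no_dotall))
print(repr(with_dotall))
```

In editors that lack such an option, the pattern `(.|\n)*?` is a common way to get the same effect.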

original format: https://github.com/translation-bridge/lime-docs/tree/master/Booklet-01-Networks

target format: https://github.com/patogit/lime-docs/tree/testing-JSON/Booklet-01-Networks/json

The seven search-and-replace steps for PO files generated by translatewiki.net are as follows. Where I mean a newline character, I write newline; where I mean the literal characters \n in the text, I write \n:

1) (Regex) msgid.*?msgstr -> newline
2) (Regex) "newline#: 01.en.txt:.*?newline " -> "newline"
3) (Regex) # Translation of LibreMesh.*?Language: -> {"
4) (Regex) \\n"newline"X-Generator.*? plural=(n > 1);\\n" -> ": {
5) newline " -> ,newline"booklet-01-paragraph-000
6) )\n -> ": "
7) \n -> space \n space

I've tried po2json, json2po, and https://localise.biz/free/converter/po-to-json . The latter at least produces a file, but still requires multiple search and replace steps, so I may as well just figure this out without a converter.

Any hints about how to automate this are very welcome.

patogit commented 5 years ago

https://askubuntu.com/questions/20414/find-and-replace-text-within-a-file-using-commands and https://stackoverflow.com/a/45981809 offer some guidance

patogit commented 5 years ago

Success with jEdit!! The resulting macro is at https://github.com/patogit/lime-docs/blob/testing-JSON/macros/po-2-json-transwikinet.bsh

It performs all the search-and-replace steps that I could figure out how to automate. In the resulting file, I still have to add the final brackets and add numbers to untranslated items.

1. \# Translation of LibreMesh(.|\n)*?Language:  -> {"
2. \\n"\n"X-Generator.*?(.|\n)*?\n#: -> ": {\n#:
3. msgid(.|\n)*?msgstr -> \n
4. #: 0[1-9].en.txt:.*?\n\n -> \n
5. (\n){3} -> \n
6. \n " -> ,\n"booklet-01-paragraph-000
7. \)\\n -> ": "
8. \\n ->  $0 
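The eight steps above can also be sketched outside jEdit. What follows is a rough Python transcription, not the BeanShell macro linked above. Its assumptions: the function name `po_to_json_text` and the `entry` sample are hypothetical, `(.|\n)*?` is written as `.*?` under `re.DOTALL`, the dots in the file name are escaped, step 1 is assumed to end with a space after `Language:`, and Python's `\g<0>` stands in for `$0`.

```python
import re

# (pattern, replacement) pairs in the order of the eight steps listed above.
# All patterns run with re.DOTALL, so "." may cross line breaks.
RULES = [
    (r"# Translation of LibreMesh.*?Language: ", '{"'),
    (r'\\n"\n"X-Generator.*?\n#:', '": {\n#:'),
    (r"msgid.*?msgstr", "\n"),
    # Step 4 must stay on one line, so "[^\n]" replaces "." here.
    (r"#: 0[1-9]\.en\.txt:[^\n]*?\n\n", "\n"),
    (r"\n{3}", "\n"),
    (r'\n "', ',\n"booklet-01-paragraph-000'),
    (r"\)\\n", '": "'),
    # Surround every remaining literal backslash-n with spaces.
    (r"\\n", r" \g<0> "),
]


def po_to_json_text(text, rules=RULES):
    """Apply each search-and-replace rule in order and return the result."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text, flags=re.DOTALL)
    return text


# Hypothetical single PO entry, invented to exercise steps 3 to 8.
entry = '#: 01.en.txt:3\nmsgid "x"\nmsgstr "1)\\nHola"\n'
print(po_to_json_text(entry, RULES[2:]))
```

As with the macro, the output would still need the closing brackets and the numbers for untranslated items added afterwards, and the patterns have only been reasoned through on small invented fragments, not on the real booklet files.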

Hooray for powerful text editors!