fkirc / attranslate

A command line tool for translating JSON, YAML, CSV, ARB, XML (via a CLI)
https://www.npmjs.com/package/attranslate
Other
336 stars 27 forks source link

Arrays are converted to objects #204

Closed dwknippers closed 2 years ago

dwknippers commented 2 years ago

Describe the bug Currently when translating JSON files from source to target, it converts any arrays inside the source JSON files to object representations of such like so from:

"address": {
    "lines": [
        "Line 1",
        "Line 2",
        "Line 3"
    ]
},

to

"address": {
  "lines": {
    "0": "Line 1",
    "1": "Line 2",
    "2": "Line 3"
  }
},

To Reproduce

  1. Create a new source JSON file containing an array
  2. Convert the JSON file to another language

Expected behavior The expected behavior would be that the datatype stays the same.

Files This could be done using any simple JSON file containing the above example content.

Additional context N/A

fkirc commented 2 years ago

Thank you for reporting this feature-request with a detailed description. Due to time-constraints and complexity, I am unable to implement the request right now. Nevertheless, I will leave the feature-request up-for-grabs to be implemented by the community. The relevant code includes https://github.com/fkirc/attranslate/blob/master/src/util/flatten.ts, but it might also span over multiple files.

fkirc commented 2 years ago

One way to implement it would be a postprocessing of JSON-files. So instead of trying to rewrite this unflatten-function, we might add an additional postprocessing-step that replaces "arrayobjects" with arrays.

dwknippers commented 2 years ago

Hey thanks for replying to my issue 💪,

Initially I've just parked this issue here so it's noticeable for the future, as I'm meaning to take a look at it myself later.

Personally not a great fan of post-processing, because it would introduce another layer of complexity for not much gain. So I think it's best to tackle this problem head-on in the parser.

Are there any roadblocks 🚸 you can think of while changing the relevant parser logic? Otherwise I'll crack on with researching the specifics later on when I have time.

Thanks for creating this project, its been the first tool that really worked well. Much love. 💙

fkirc commented 2 years ago

Thank you for participating in attranslate development 💪 In theory, either postprocessing or unflattening could be changed. But one thing to consider is that attranslate converts all JSONs into a flat-object-intermediate-representation, which means that the flow goes like that:

Read JSON -> Convert into flat object -> Do translations -> Unflatten Object -> Postprocess if necessary -> Write JSON

During this process, nestings get replaced with dot-strings and similar stuff, which means that structure-information might be lost. If we are not careful when parsing those dot-strings, then we might run into problems with specific characters that have special internal meanings.

fkirc commented 2 years ago

To prevent such losses of structural information during translation, some file-formats preserve structural details of the source file in memory. This happens for example for YAML or XML files. So in theory we could expand the nested-JSON-format to store more source-information, without touching any of the file-generic code.

dwknippers commented 2 years ago

Note: I haven't forgotten about this issue, I've just been very busy.

Initial research showed implementation might be more involved than anticipated, should have some time in about a week or so.

fkirc commented 2 years ago

I am already thinking about adding a new file-format “json-cloned“ in addition to the already existing JSON-formats. This might enable a much simpler implementation, namely a single recursive function over the source-JSON to replace translatable strings.

This is different to the current implementation which tries to recreate JSON-structure from scratch based off a flat translation-map.

dwknippers commented 2 years ago

I was thinking the same, a recursive implementation would make much more sense in this case, as context information is lost in translation in flat translation map structure.

The other solution would be to hold state about the context of keys, but that seems kinda messy and tricky to implement in the current structure.

Something more ideal would be to remove the extra steps all together, and move the translation step inside the format-specific implementation. So instead of translating all formats to a single map, translating it, then using that map to reconstruct the format, the format-specific implementation could trigger translation "in the loop".

And for batching, it could be implemented simply using a promise-like structure, either in multiple or key based.

What was the original intention behind the flat translation map structure?

fkirc commented 2 years ago

I agree with your conclusions, we should probably replace the entire nested-json implementation with a recursive implementation, similar to the recursive YAML-implementation which is already working well.

The original intention behind the flat translation map was to (1.) provide a unified interface for all file-formats and (2.) enable cross-format-translations. I think that this interface is still working because the individual file-formats are free to cache the entire source-file. In other words, the source-JSON-file can remain in memory for a recursive string-replacement-procedure.

fkirc commented 2 years ago

So there is one special-case where the flat array is still needed, and this case is „cross-format-translations“ where the source is not a JSON but something completely different.

But anyways, cross-format yields some implementation-dependant result, and if an issue comes up with cross-format, then we could still handle it with a separate PR.

dwknippers commented 2 years ago

Yeah I thought of that later on, need some more thought.

If cross-format translation is used, intermediate format is required. The structure of this format is the subject.

fkirc commented 2 years ago

My personal usage of crossformat translations was only for Android-to-iOS-translations, and I never tested all combinations. So there might be some format-combinations that do not work correctly. Nevertheless, the general contract was that „nestings“ are marked with a dot “.“ in a flatarray, whatever „nesting“ means in a particular format.

fkirc commented 2 years ago

I published 1.7.0 to rewrite the nested-JSON-processing as discussed, using a recursive string-replacement-algorithm. Please re-open this issue if a problem still persists with 1.7.0.

dwknippers commented 2 years ago

Hey thanks so much, seems to be a much more robust implementation! 🌟 Sorry I couldn't spend much time on this, but the PR is solid work!