WordPress / openverse

Openverse is a search engine for openly-licensed media. This monorepo includes all application code.
https://openverse.org
MIT License
240 stars 194 forks source link

Flatten i18n message keys in message JSON #4979

Open sarayourfriend opened 1 week ago

sarayourfriend commented 1 week ago

Problem

Our current vue/i18n messages files utilise nested JSON objects to discriminate keys. A sample from our en.json5 looks like this:

{
  "404": {
    title: "The content you’re looking for seems to have disappeared.",
    main: "Go to {link} or search for something similar from the field below.",
  },
  hero: {
    subtitle: "Explore more than 800 million creative works",
    description: "An extensive library of free stock photos, images, and audio, available for free use.",
    search: {
      placeholder: "Search for content",
    },
    disclaimer: {
      content: "All {openverse} content is under a {license} or is in the public domain.",
      /**
      Interpolated into hero.disclaimer.content:
       _{license}_ part of "All Openverse content is under a {license} or is in the public domain."
       */
      license: "Creative Commons license",
    },
  },
}

Code that references these keys does so in a dot-delimited path format. For example, hero.disclaimer.content references the key at that path in the nested object.

While nested objects may provide a slight advantage to authorship of the messages file, it presents a severe disadvantage when trying to find the message from a key in the code referencing it. Using the example above, if you wanted to see the message associated with the key hero.disclaimer.content, and tried to search the codebase for that literal string, you would not find it. Instead, you would have to know to navigate to the en.json5, and then find the hero key, find its disclaimer key, and then finally the content key. Sometimes you can shortcut this by searching for just the final part of the key, in this case content. However, we have 31 keys that end with the segment .content, so searching content: in the file would still require looking through individual instances to find it. The example above also uses relatively small and shallow nesting and does not illustrate the additional difficulty of navigating deeply nested keys in large collections of messages, like sensitive.designations.userReported.title.description.a.

Description

Instead, I propose we "flatten" the messages objects where the keys are the full path to the message. In other words, remove all nesting.

For example, the messages excerpt above would turn into this, instead:

{
  "404.title": "The content you’re looking for seems to have disappeared.",
  "404.main": "Go to {link} or search for something similar from the field below.",

  "hero.subtitle": "Explore more than 800 million creative works",
  "hero.description": "An extensive library of free stock photos, images, and audio, available for free use.",
  "hero.search.placeholder": "Search for content",
  "hero.disclaimer.content": "All {openverse} content is under a {license} or is in the public domain.",
  /**
  Interpolated into hero.disclaimer.content:
    _{license}_ part of "All Openverse content is under a {license} or is in the public domain."
    */
  "hero.disclaimer.license": "Creative Commons license",
}

This format is backwards compatible with our existing messages objects. If you replace the 404 and hero objects with the flattened version above and run your local frontend, there is zero issue.

The benefit of this approach is that it is easily searchable in both directions. From the messages file, it is easier to find uses of the keys. From runtime code, it is easier to find the content of the translation string. Additionally, our json-to-pot script could be simplified, as right now it has to collapse keys when converting to POT, because POT is a flat-format.

Our POT-to-JSON conversion re-explodes the keys into the nested object. We should retain this behaviour in the final output messages files. A meaningful downside to the flat format is an increase in the total character size of the keys, due to the repeated strings.

The nested example minifies to 509 characters (expand for minified version). ```json {"404":{"title":"The content you’re looking for seems to have disappeared.","main":"Go to {link} or search for something similar from the field below."},"hero":{"subtitle":"Explore more than 800 million creative works","description":"An extensive library of free stock photos, images, and audio, available for free use.","search":{"placeholder":"Search for content",},"disclaimer":{"content":"All {openverse} content is under a {license} or is in the public domain.","license":"Creative Commons license"}}} ```
The flattened example minifies to 527 characters (expand for minified version). ```json "404.title":"The content you’re looking for seems to have disappeared.","404.main":"Go to {link} or search for something similar from the field below.","hero.subtitle":"Explore more than 800 million creative works","hero.description":"An extensive library of free stock photos, images, and audio, available for free use.","hero.search.placeholder":"Search for content","hero.disclaimer.content":"All {openverse} content is under a {license} or is in the public domain.","hero.disclaimer.license":"Creative Commons license"} ```

Because this change is proposed to improve authorship (not transport or anything else relevant to the final generated files), and because it is backwards compatible with the nested format, we should retain the nested format for the produced JSON files.

To implement this change, we will need to flatten the en.json5's keys. This can be done by hand, or using something like this online JSON flattening tool, except that tool and others like it strip comments and make other unwanted transformations, so would still require manual changes to address those... Because json-to-pot already has to flatten the keys for use in the POT files, it shouldn't be too hard to adapt our existing json-to-pot script into a json-to-flattened-json script that preserves comments, single-quoted strings (which we use to avoid needing to escape double quotes in some strings), etc. I would recommend that approach, it seems like the least tedious option to me!

sarayourfriend commented 1 week ago

@WordPress/openverse-frontend what do y'all think of this proposed change?

zackkrida commented 1 week ago

I admittedly don't have a strong opinion about this change. I use a VS Code extension called i18n Ally that allows for many features, but the key ones are:

image

image

image


I did check and this plugin works fine with the flattened keys :+1:

sarayourfriend commented 1 week ago

That looks like a nice extension! I suppose for core maintainers using VSCode, that extension solves the issue of searching back and forth. Whether something similar exists for JetBrains IDEs (which I understand to be the other one some core maintainers use) I don't know. It doesn't solve the issue for new contributors, even if they use VSCode, who are less likely to have extensions like that one installed, or if you'd rather not need to install yet another extension just for finding things (and which may only be relevant to Openverse).

obulat commented 1 week ago

This is a great idea to help frontend contributors. It will also simplify the code to convert json to pot and vice versa.

I tried searching for problems that could be caused by having a period in a JSON key, and the only thing I could find is if we have both the "a.b" key ({"a.b": value1 }), and a nested object like {a: { b: value2 } }^1. So, it's important to only have the non-nested properties.