Shopify / polaris

Shopify’s design system to help us work together to build a great experience for all of our merchants.
https://polaris.shopify.com

[RFC][i18n] Pluralization for non-English locales #4093

Closed lhoffbeck closed 2 years ago

lhoffbeck commented 3 years ago

Summary

When components require that a consumer pass a pluralized version of a string, Polaris currently restricts the consumer to an English-centric pluralization format. This isn't very inclusive and can reduce merchant trust in our UI for non-English locales.

Existing issues:

Goal

Hopefully this RFC helps build context, but I'm planning to keep this pretty surface-level for now--just wanted to get a gut check on approach and see if this is something that has legs.

So... what's the issue?

We can look at IndexTable as a specific example. The component expects 2 resourceName values:

// index-provider/context.ts

export interface IndexContextType {
 ...
  resourceName: {
    singular: string;
    plural: string;
  };
  ...
}

The consumer then provides this value when using the component:

<IndexTable
  ...
  resourceName={{
    singular: 'Cactus',
    plural: 'Cactusses',
  }}
  ...
/>

Within IndexTable, these values are then used in a few contexts:

(1) Singular: refers to 1 of the resource item (*note: this is buggy because not all languages have a concept of singular)

// IndexTable/.../Checkbox.tsx:42

<PolarisCheckbox
  id={itemId}
  label={i18n.translate('Polaris.IndexTable.selectItem', {
    resourceName: resourceName.singular,
  })}
  labelHidden
  checked={selected}
/>

(2) Specific plural: associated with an item count (*note: this is buggy because not all languages have a single plural value)

// IndexTable.tsx:676

function getPaginatedSelectAllAction() {
  ...

  const actionText =
    selectedItemsCount === SELECT_ALL_ITEMS
      ? i18n.translate('Polaris.IndexTable.undo')
      : i18n.translate('Polaris.IndexTable.selectAllItems', {
          itemsLength: itemCount,
          resourceNamePlural: resourceName.plural.toLocaleLowerCase(),
        });

  return {
    content: actionText,
    onAction: handleSelectAllItemsInStore,
  };
}

(3) Generic plural: not associated with any item count

// IndexTable.tsx:364

...
<div className={styles.LoadingPanel}>
  <div className={styles.LoadingPanelRow}>
    <Spinner size="small" />
    <span className={styles.LoadingPanelText}>
      {i18n.translate(
        'Polaris.IndexTable.resourceLoadingAccessibilityLabel',
        {
          resourceNamePlural: resourceName.plural.toLocaleLowerCase(),
        },
      )}
    </span>
  </div>
</div>
...

The Issue

(stolen from @movermeyer's great polaris-react issue #4031)

Other languages use different pluralization forms depending on very complicated rules. Hard-coding singular and plural as the options for ResourceList hard-codes the English pluralization rules, and means that it is incorrect for any language other than English.

While we only handle 2 pluralization cases, the Unicode CLDR rules give us 6 different cases we'd need to account for in order to handle this correctly for all locales. This Pluralization for JavaScript article is a fantastic resource for further reading, but the relevant information for our case is:

CLDR defines up to six different plural forms. Each form is assigned a name: zero, one, two, few, many, or other. Not all locales need every form; remember, English only has two: one and other. The name of each form is based on its corresponding plural rule. Here is a CLDR example for the Polish language—a slightly altered version of our earlier counter rules:

  • If the counter has the integer value of 1, use the plural form one.
  • If the counter has a value that ends in 2–4, excluding 12–14, use the plural form few.
  • If the counter is not 1 and has a value that ends in either 0 or 1, or the counter ends in 5–9, or the counter ends in 12–14, use the plural form many.
  • If the counter has any other value than the above, use the plural form other.
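The quoted Polish rules can be checked against the platform's own CLDR data. This is a minimal sketch that assumes the runtime ships full ICU locale data (modern browsers and official Node.js builds do); `pluralCategory` is a hypothetical helper name, not something from Polaris.

```typescript
// Hypothetical helper: asks the built-in Intl.PluralRules API which CLDR
// plural category a count falls into for a given locale.
function pluralCategory(locale: string, count: number): string {
  return new Intl.PluralRules(locale).select(count);
}

// Polish uses four categories (one, few, many, other).
const polishCategories = [1, 2, 5, 12, 22, 1.5].map((count) =>
  pluralCategory('pl', count),
);

// English only uses two (one, other).
const englishCategories = [0, 1, 2].map((count) => pluralCategory('en', count));

console.log(polishCategories.join(', ')); // one, few, many, many, few, other
console.log(englishCategories.join(', ')); // other, one, other
```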

Effectively, our copy may be completely wrong in other locales. As an example under our current implementation, in Arabic we may have:

- 1 book: كتاب  (👍 singular, we handle this)
- 100 books: ١٠٠ (👍 plural, we handle this)
- 0 books: ٠ كتاب (🛑 we don't handle this)
- 3 books: ٣ كتب  (🛑 we show ١٠٠)
- 11 books: ١١ كتابًا  (🛑 we show ١٠٠)
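To make the gap concrete, here is a small sketch comparing a two-form choice with the CLDR categories for Arabic. `currentForm` is a deliberate simplification of how a singular/plural pair ends up being used (the components actually pick a form per call site), and it is an assumption for illustration only.

```typescript
interface ResourceName {
  singular: string;
  plural: string;
}

// Simplified stand-in for the two-form model: exactly 1 is "singular",
// everything else is "plural".
function currentForm(count: number): keyof ResourceName {
  return count === 1 ? 'singular' : 'plural';
}

const arabicRules = new Intl.PluralRules('ar');

// For each count, pair the two-form choice with the CLDR category Arabic needs.
const comparison = [0, 1, 2, 3, 11, 100].map((count) => ({
  count,
  current: currentForm(count),
  cldr: arabicRules.select(count),
}));

for (const row of comparison) {
  console.log(`${row.count}: current=${row.current}, cldr=${row.cldr}`);
}
```

Five of the six counts collapse into the same `plural` string, while CLDR assigns each of them a different category (`zero`, `two`, `few`, `many`, `other`).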

Issue Scope

Other polaris components that may have this same issue are:

Approaches

None of these go too deep, but wanted to suggest a few alternate approaches.

Note that Approach 1 & 2 require (1) that we add a new JS package that handles CLDR logic, and (2) that the user add the current locale to the Polaris scope, probably via <AppProvider> (my understanding is that we don't currently keep this state in polaris anywhere). However, I think this might be okay for a few reasons:

Approach 3 is slightly more painful for consumers to implement, but doesn't require us to make any major changes in the current behavior of Polaris.

Approach 1: MessageFormat + client passes a single format string

This approach uses the MessageFormat utility from the OpenJS Foundation to parse a format string based on locale and a provided count. MessageFormat uses the ICU message format standard to provide pluralization rules in a single format string.

Prerequisites

This is how this could work in our IndexTable example:

// index-provider/context.ts

export interface IndexContextType {
 ...
  // This format string should use the [ICU message format standard](http://userguide.icu-project.org/formatparse/messages)
  resourceNameFormatString: string;
  ...
}

The consumer provides this value when using the component:

<IndexTable
  ...
  resourceNameFormatString='{COUNT, plural, one {Cactus} few {Cactii} other {Cactusses}}'
  ...
/>

Usage then looks like:

(1) Singular: refers to 1 of the resource item

// IndexTable/.../Checkbox.tsx:42

<PolarisCheckbox
  ...
  label={pluralize(resourceNameFormatString, 1)}
  ...
/>

(2) Specific plural: associated with an item count THIS IS POTENTIALLY WRONG FOR NON-ENGLISH LOCALES

// IndexTable.tsx:676

function getPaginatedSelectAllAction() {
  ...

  const actionText =
    selectedItemsCount === SELECT_ALL_ITEMS
      ? i18n.translate('Polaris.IndexTable.undo')
      : i18n.translate('Polaris.IndexTable.selectAllItems', {
          resourceNamePlural: pluralize(resourceNameFormatString, itemCount),
        });

  ...
}

(3) Generic plural: not associated with any item count

// IndexTable.tsx:364

...
<div className={styles.LoadingPanel}>
  ...
    {i18n.translate(
      'Polaris.IndexTable.resourceLoadingAccessibilityLabel',
      {
        resourceNamePlural: pluralize(resourceNameFormatString, PAGE_SIZE),
      },
    )}
  ...
</div>
...
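For a sense of what the `pluralize` helper used above would have to do, here is a rough, self-contained stand-in. It is not the MessageFormat library: it only understands a single, non-nested `{VAR, plural, ...}` argument and uses `Intl.PluralRules` to pick the category. The explicit `locale` parameter is an assumption; in the RFC the locale would come from the Polaris scope instead.

```typescript
// Hypothetical stand-in for the RFC's `pluralize` helper. A real
// implementation would delegate to a full ICU MessageFormat parser.
function pluralize(formatString: string, count: number, locale: string): string {
  // Match the outer `{NAME, plural, <forms>}` wrapper and capture the forms.
  const outer = /^\{\s*\w+\s*,\s*plural\s*,([\s\S]*)\}$/.exec(formatString.trim());
  if (outer === null) {
    throw new Error('Expected a single {VAR, plural, ...} argument');
  }

  // Collect `category {text}` and `=N {text}` pairs (no nested braces).
  const forms = new Map<string, string>();
  const formPattern = /(=\d+|\w+)\s*\{([^{}]*)\}/g;
  let match: RegExpExecArray | null;
  while ((match = formPattern.exec(outer[1])) !== null) {
    forms.set(match[1], match[2]);
  }

  // Exact matches such as `=0` win over CLDR categories.
  const exact = forms.get(`=${count}`);
  if (exact !== undefined) {
    return exact;
  }

  const category = new Intl.PluralRules(locale).select(count);
  const chosen = forms.get(category) ?? forms.get('other');
  if (chosen === undefined) {
    throw new Error('The format string must provide an "other" form');
  }
  return chosen;
}

const cactusFormat = '{COUNT, plural, one {Cactus} few {Cactii} other {Cactusses}}';
console.log(pluralize(cactusFormat, 1, 'en')); // Cactus
console.log(pluralize(cactusFormat, 3, 'pl')); // Cactii
```

Note that the `few` form is only ever selected in locales whose rules define it (Polish here); in English every count other than 1 falls through to `other`.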

Pros:

Cons:

Approach 2: Use Globalize to determine correct plural format

This approach uses the pluralGenerator utility from the GlobalizeJS package to programmatically determine which plural form (zero, one, two, few, many, or other) should be used based on a count and the current locale.

Prerequisites

This is how this could work in our IndexTable example:

// index-provider/context.ts

export interface IndexContextType {
 ...
    resourceName: {
      one: string;
      other: string;
      zero?: string;
      two?: string;
      few?: string;
      many?: string;
    };
  ...
}

The consumer provides this value when using the component:

<IndexTable
  ...
  resourceName={{
    one: 'Cactus',
    few: 'Cactii',
    other: 'Cactusses',
  }}
  ...
/>

Usage then looks like:

(1) Singular: refers to 1 of the resource item

// IndexTable/.../Checkbox.tsx:42

<PolarisCheckbox
  ...
    label={i18n.translate('Polaris.IndexTable.selectItem', {
      resourceName: resourceName[getPluralizationType(1)], // (alternately, getPluralization(resourceMap, count))
    })} 
  ...
/>

(2) Specific plural: associated with an item count

// IndexTable.tsx:676

function getPaginatedSelectAllAction() {
  ...

  const actionText =
    selectedItemsCount === SELECT_ALL_ITEMS
      ? i18n.translate('Polaris.IndexTable.undo')
      : i18n.translate('Polaris.IndexTable.selectAllItems', {
          resourceNamePlural: resourceName[getPluralizationType(itemCount)] || resourceName.other,
        });
  ...
}

(3) Generic plural: not associated with any item count

// IndexTable.tsx:364

...
<div className={styles.LoadingPanel}>
  ...
    {i18n.translate(
      'Polaris.IndexTable.resourceLoadingAccessibilityLabel',
      {
        resourceNamePlural: resourceName[getPluralizationType(PAGE_SIZE)] || resourceName.other,
      },
    )}
  ...
</div>
...
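A possible shape for `getPluralizationType` and the alternate `getPluralization(resourceMap, count)` helper mentioned above is sketched below. It swaps Globalize's `pluralGenerator` for the built-in `Intl.PluralRules` so the example stays self-contained, and it takes the locale as an explicit argument; both of those choices are assumptions for illustration.

```typescript
type PluralCategory = 'zero' | 'one' | 'two' | 'few' | 'many' | 'other';

interface ResourceName {
  one: string;
  other: string;
  zero?: string;
  two?: string;
  few?: string;
  many?: string;
}

// Stand-in for a Globalize-backed lookup: returns the CLDR category for a count.
function getPluralizationType(count: number, locale: string): PluralCategory {
  return new Intl.PluralRules(locale).select(count) as PluralCategory;
}

// Picks the matching form, falling back to `other` when the consumer
// did not supply a string for the selected category.
function getPluralization(
  resourceName: ResourceName,
  count: number,
  locale: string,
): string {
  return resourceName[getPluralizationType(count, locale)] ?? resourceName.other;
}

const cactus: ResourceName = {one: 'Cactus', few: 'Cactii', other: 'Cactusses'};
console.log(getPluralization(cactus, 3, 'pl')); // Cactii
console.log(getPluralization(cactus, 5, 'pl')); // Cactusses ("many" not supplied)
```

Wrapping the fallback in one helper avoids repeating `|| resourceName.other` at each call site.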

Pros:

Cons:

Approach 3: Pluralize via injected callback

In this approach, the client passes a callback function that accepts a count variable and returns the correct translation.

Prerequisites (none)

This is how this could work in our IndexTable example:

// index-provider/context.ts

export interface IndexContextType {
  ...
  pluralizeResourceName(count: number): string;
  ...
}

The consumer provides this value when using the component:

<IndexTable
  ...
  pluralizeResourceName={(count: number) => {
    // Option A: a robust i18n system is present (like the `react-i18n` package that @shopify/web uses).
    // In that case the whole body is just:
    //   return i18n.translate('cactus', {count});

    // Option B: the consumer is rolling translations on the fly.
    if (count === 1) {
      return "Cactus";
    } else if (count === 2) {
      return "Cactii";
    }
    return "Cactusses";
  }}
  ...
/>

Usage then looks like:

(1) Singular: refers to 1 of the resource item

// IndexTable/.../Checkbox.tsx:42

<PolarisCheckbox
  ...
    label={i18n.translate('Polaris.IndexTable.selectItem', {
      resourceName: pluralizeResourceName(1)
    })} 
  ...
/>

(2) Specific plural: associated with an item count

// IndexTable.tsx:676

function getPaginatedSelectAllAction() {
  ...

  const actionText =
    selectedItemsCount === SELECT_ALL_ITEMS
      ? i18n.translate('Polaris.IndexTable.undo')
      : i18n.translate('Polaris.IndexTable.selectAllItems', {
          resourceNamePlural: pluralizeResourceName(itemCount),
        });

  ...
}

(3) Generic plural: not associated with any item count

// IndexTable.tsx:364

...
<div className={styles.LoadingPanel}>
  ...
    {i18n.translate(
      'Polaris.IndexTable.resourceLoadingAccessibilityLabel',
      {
        resourceNamePlural: pluralizeResourceName(PAGE_SIZE),
      },
    )}
  ...
</div>
...
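For consumers without an i18n framework, the callback could be generated from a table of plural forms rather than hand-written `if` chains. The sketch below is one way to do that; `makePluralizeResourceName` and the `PluralForms` type are hypothetical names, and the strings are the placeholder values used elsewhere in this RFC.

```typescript
type PluralCategory = 'zero' | 'one' | 'two' | 'few' | 'many' | 'other';
type PluralForms = {other: string} & Partial<Record<PluralCategory, string>>;

// Hypothetical factory: builds a `pluralizeResourceName` callback for one locale.
function makePluralizeResourceName(
  locale: string,
  forms: PluralForms,
): (count: number) => string {
  // Create the rules object once; the callback may run on every render.
  const rules = new Intl.PluralRules(locale);
  return (count) => forms[rules.select(count) as PluralCategory] ?? forms.other;
}

const pluralizeResourceName = makePluralizeResourceName('pl', {
  one: 'Cactus',
  few: 'Cactii',
  other: 'Cactusses',
});

console.log(pluralizeResourceName(1)); // Cactus
console.log(pluralizeResourceName(3)); // Cactii
console.log(pluralizeResourceName(5)); // Cactusses
```

The resulting function matches the `pluralizeResourceName(count: number): string` signature from the interface above, so it could be passed to the component directly.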

Pros:

Cons:

Conclusion

My hope is that these findings take work off someone else's plate :) My personal feeling is that, to fix the current issues and to keep up as Polaris grows and supports more locales/languages, we're going to need to make the system locale-aware. Given that, of the 3 approaches, option 1 or 2 may be a better long-term fit.

lhoffbeck commented 3 years ago

@movermeyer tagging you for visibility, I came across your two issues while looking into this :)

movermeyer commented 3 years ago

@lhoffbeck

Keep in mind that I'm not a FED, nor do I know much about Polaris, but I do know pluralization fairly well.

Singular vs Specific plural vs Generic plural

"Singular" is only an English (and a few other languages) concept.

There should not be any difference in the calling code for any of these cases. You always have to pass in the count, regardless of whether the count is 1 or not.

Framing the discussion around these three cases feels strange and suggests that something is wrong. I can't be 100% sure though, since I'm not sure I understand why these cases are being discussed in the first place.

THIS IS POTENTIALLY WRONG FOR NON-ENGLISH LOCALES is sprinkled throughout, but they are all potentially wrong for non-English locales. So I'm not sure why it is only being applied to some cases and not others.

I'm not sure I understand the difference between "Specific plural" and "Generic plural". Is it just whether or not you are using the resource name as an interpolation?

Interpolation

i18n.translate('Polaris.IndexTable.selectAllItems', {
  resourceNamePlural: pluralize(resourceNameFormatString, itemCount),
});

The entire string should be translated into the language's required plural forms rather than using an interpolation of the resource name. The rest of the sentence's grammar can depend on:

rafal-nedzarek-loc commented 3 years ago

@lhoffbeck I agree with Michael on this one:

The entire string should be translated into the language's required plural forms rather than using an interpolation of the resource name.

Interpolating resourceName causes problems for other languages, especially causing missing articles or grammatical gender mismatch.

This is compounded by the liberal use of articles in the English UI. E.g. Add menu item may be fine for the English-speaking audience (although AFAIK it's not grammatically correct), but Ajouter élément de menu (missing the un article) is not passable in French.
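As a rough illustration of translating the entire string per plural form instead of interpolating a resource name, consider the sketch below. The message text, the `{count}` placeholder syntax, and the `translatePlural` helper are all hypothetical and are not the actual Polaris translation keys or API.

```typescript
type PluralCategory = 'zero' | 'one' | 'two' | 'few' | 'many' | 'other';
type PluralMessage = {other: string} & Partial<Record<PluralCategory, string>>;

// Hypothetical English entry: each plural form is a complete sentence, so
// translators control articles, gender agreement, and word order per form.
const selectAllCactuses: PluralMessage = {
  one: 'Select {count} cactus in your store',
  other: 'Select all {count} cactuses in your store',
};

// Picks the full sentence for the count's CLDR category, then fills in the count.
function translatePlural(
  locale: string,
  message: PluralMessage,
  count: number,
): string {
  const category = new Intl.PluralRules(locale).select(count) as PluralCategory;
  const template = message[category] ?? message.other;
  return template.replace('{count}', String(count));
}

console.log(translatePlural('en', selectAllCactuses, 1));
console.log(translatePlural('en', selectAllCactuses, 50));
```

A locale with more categories would simply supply more complete sentences in its own entry; no noun is spliced into a sentence written for a different grammar.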

That particular string has an interesting history:

The above is just to add more context to this discussion. Really happy to see someone from outside our team actually giving a dime about this problem! Much appreciated :) We're always happy to discuss i18n stuff in help-i18n-and-translation, especially if this will help us gain more traction on this issue.

lhoffbeck commented 3 years ago

@movermeyer / @rafal-nedzarek-loc thank you both for your replies! Sorry for the delayed response, completely missed the notification on this 😅

Singular vs Specific plural vs Generic plural ... You always have to pass in the count, regardless of whether the count is 1 or not. ... Framing the discussion around these three cases feels strange and suggests that something is wrong. I can't be 100% sure though, since I'm not sure I understand why these cases are being discussed in the first place.

100% agree, in all cases the solution for interpolation is that we'd need to use a count value to translate properly. The reason I framed it as singular vs specific plural vs generic plural is because passing resourceName is a pretty common pattern in polaris-react and those are the 3 ways polaris-react components consume them--they either assume a strict singular value is correct (singular is not a universal concept so this is buggy), use a strict, single plural value when an item count exists (also buggy since it doesn't pass a count so it misses two, few, many cases), or use a generic plural value not associated with a count (this is effectively the other plural case).

Definitely not saying singular/specific/generic is a GOOD way to refer to pluralization cases, just wanted to be able to describe the current use-cases succinctly :sweat_smile:

THIS IS POTENTIALLY WRONG FOR NON-ENGLISH LOCALES is sprinkled throughout

This was a bad copy-paste, updated the description for clarity. You're right, with interpolation there's still the chance that sentences are wrong.

The entire string should be translated into the language's required plural forms rather than using an interpolation of the resource name.

As an ideal, totally agree. This would also help with capitalization issues: Polaris sometimes assumes things should be toLocaleLowerCase, which can be wrong for languages like German that capitalize nouns.

I focused on improving the interpolation pattern because it's what's baked into polaris-react currently, and gets us to a closer (but not perfect) solution for a number of languages. It looks like any of the suggested approaches could help with Rafal's example of add menu item since having more pluralization granularity allows us to be more specific with articles, although we'd still run into problems here if the sentence structure changes dramatically based on pluralization.

Depending on how it's set up, changing to an approach where the entire string is translated increases the footprint of a component's props or usage. I'm not sure what the appetite is for that within the polaris team or for consumers given how English-centric so much of Shopify is... as we continue to grow internationally the impact of not doing something like this will become greater though.

Super happy to jam with either/both of you guys on this and see if we can kick this thing further down the road!