Localisation (l10n)

Provide guidance on localising a Thing Description.

Consider adding a lang member and using HTTP Accept-Language headers to request Thing Descriptions in different languages.

See https://github.com/w3c/wot/issues/373

FWIW, this seems like another step away from the W3C Thing Description work. Section 2.1 makes it clear that the Thing Description is intended for machine-to-machine communication rather than for building user interfaces:

The Working Group will develop solutions to describe Things through metadata and declarations of their capabilities (e.g., possible interactions). This work includes the definition of different machine-understandable vocabulary sets as well as serialization formats of such a Thing Description. The Thing Description will be aimed at enabling scalable and automated tooling

Making each device responsible for providing localization data further nudges the scope of the Thing Description towards a UI specification language, while increasing the burden on every resource-constrained Thing deployment. It seems unnecessary, for example, to require a light bulb that conforms to OnOffSwitch to carry strings in Quechua so that it can be used natively in that language. Any app, web page, or device that is already localized for that language and is aware of OnOffSwitch should be able to present a localized user interface based on the "machine-understandable" information provided.

There are a small number of human-readable strings in the Thing Description, including the device's name and description as well as titles and descriptions for properties, actions and events.

Wherever human-readable strings exist, they need to be localisable.

The current W3C specification suggests a MultiLanguage container which lists all of these localised strings inside the Thing Description itself, which could make for a very long Thing Description.
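To make the size concern concrete, here is a rough sketch, assuming a TD-like fragment (no `@context`, security, or forms) that uses W3C-style `titles`/`descriptions` MultiLanguage maps (language tag to string) for a few of the lamp strings quoted in this thread. The translations are placeholders (the English text reused under every tag), so only the growth trend means anything, not the byte counts.

```python
import json

# Illustrative English strings from the lamp example in this thread.
# Every other language reuses them as placeholders, so only growth matters.
TITLE = "A web connected lamp"
PROPS = {
    "on": ("boolean", "On/Off", "Whether the lamp is turned on"),
    "level": ("integer", "Brightness", "The level of light from 0-100"),
}


def multilanguage_td(langs):
    """Build a TD-like fragment whose text uses MultiLanguage maps (tag -> string)."""
    def ml(text):
        # One entry per language tag, as in a MultiLanguage container.
        return {lang: text for lang in langs}

    return {
        "title": TITLE,
        "titles": ml(TITLE),
        "properties": {
            name: {"type": kind, "title": title, "titles": ml(title),
                   "descriptions": ml(desc)}
            for name, (kind, title, desc) in PROPS.items()
        },
    }


def td_size(langs):
    """Size in bytes of the compact UTF-8 JSON serialization."""
    compact = json.dumps(multilanguage_td(langs), separators=(",", ":"))
    return len(compact.encode("utf-8"))


for langs in (["en"], ["en", "fr", "de"], ["en", "fr", "de", "es", "it", "ja", "zh", "qu"]):
    print(len(langs), "languages:", td_size(langs), "bytes")
```

Every extra language adds an entry to every map, so the document grows linearly with the number of languages and with the number of human-readable strings.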

My proposal here is to follow the example of W3C Web App Manifests (also a machine-readable resource with a JSON encoding which includes a small number of human-readable strings) and to use HTTP content negotiation so that a client can request a Thing Description in a particular language, rather than include them all in one big resource.
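A minimal sketch of the server side of this proposal, assuming the Thing (or a gateway acting for it) stores one pre-translated Thing Description per language tag. It orders `Accept-Language` ranges by their `q` values and applies a simplified RFC 4647 lookup (truncating subtags, so `fr-CA` falls back to `fr`). The `lang` member values, the `TDS` table, and the `negotiate` / `get_thing_description` helpers are hypothetical illustrations, not an existing WebThings API.

```python
def parse_accept_language(header):
    """Return language ranges from an Accept-Language header, best first."""
    ranges = []
    for index, part in enumerate(header.split(",")):
        pieces = [p.strip() for p in part.split(";")]
        tag = pieces[0].lower()
        if not tag:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            # Sort key: highest q first, then original order for ties.
            ranges.append((-q, index, tag))
    return [tag for _, _, tag in sorted(ranges)]


def negotiate(header, available, default="en"):
    """Pick the best available language tag (simplified RFC 4647 lookup)."""
    by_lower = {tag.lower(): tag for tag in available}
    for tag in parse_accept_language(header):
        if tag == "*":
            return default
        subtags = tag.split("-")
        while subtags:  # e.g. try "fr-ca", then "fr"
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()
    return default


# Hypothetical per-language Thing Descriptions stored on the Thing or a gateway.
TDS = {
    "en": {"lang": "en", "title": "A web connected lamp"},
    "fr": {"lang": "fr", "title": "Une lampe connectée"},
}


def get_thing_description(accept_language):
    """Return (headers, body) for a request carrying the given Accept-Language."""
    lang = negotiate(accept_language or "", TDS)
    headers = {
        "Content-Language": lang,   # which translation was served
        "Vary": "Accept-Language",  # caches must key responses on this header
    }
    return headers, TDS[lang]
```

Each response carries a single language, unlike a MultiLanguage container; the trade-off is that this negotiation logic has to run somewhere, on the Thing or on whatever serves its TD.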

Machine-readable terms which are part of a schema like OnOffSwitch do not need to be localised.

Human-readable strings like descriptions do need to be localised and I don't think it's really feasible for them to be generated automatically based on a schema because they are likely to be unique to each device.

@benfrancis Just for information: the W3C Thing Description supports content negotiation based on the client's language preferences. See also the definition (highlighted in yellow) under MultiLanguage. This means the MultiLanguage container does not have to be used.

Wherever human-readable strings exist, they need to be localisable

No argument

Machine-readable terms which are part of a schema like OnOffSwitch do not need to be localised.

Agreed. An example in the Web Thing Description draft includes Light and OnOffSwitch schemas. It contains these human-readable strings, which would require localization:

On/Off
Whether the lamp is turned on
Brightness
The level of light from 0-100
Fade
Fade the lamp to a given level
A web connected lamp
Overheated
The lamp has exceeded its safe operating temperature

The question I have in mind is whether there should be many, if any, human-readable strings in the Thing Description. That stems from an understanding that the focus is machine-to-machine communication. Obliging the Thing to provide all user interface labels broadens the scope and increases the skills and effort required to deploy a Thing.

Should the vendor of a lightbulb be required to provide localized strings in the firmware for every possible language where that bulb will be deployed? There are very real costs associated with the requirement:

Generating those correctly is non-trivial. Most companies building such products will be obliged to outsource that at some expense.
The localization data increases the size of the firmware. Embedded devices often have little free ROM / flash storage, since spare capacity would be an unnecessary expense.
If the entire localization table is to be provided (e.g. MultiLanguage), the transfer time of the Thing Description increases. If content negotiation is used, it adds implementation complexity for the device.
Updating the firmware of a Thing is possible but difficult, and consequently expensive. Updates may now be required to correct localization errors and to add support for new languages.
For machine-to-machine scenarios (e.g. a light bulb and light switch communicating directly) there is no need for the human-readable strings. They only add overhead to resource constrained devices.

The W3C Thing Description does not appear to support schemas, unlike Mozilla WebThings (forgive me if I have gotten the precise names wrong here). In the absence of schemas, some human-readable strings are necessary to build a human interface. Schemas such as OnOffSwitch encapsulate well-defined, common functionality. Omitting a schema from a Thing Description is expedient for prototyping, but commercial products are ultimately likely to fall into one or more schemas for their core functions.

The presence of human-readable strings in the Thing Description pulls in a large number of issues that increase the complexity of deploying compatible devices and implementing compatible clients for those devices. The schema mechanism, in principle, allows these human-readable strings, which require localization, to be largely, perhaps entirely, eliminated, because the client can take sole responsibility for providing localized strings for the well-known device schemas it supports. As currently defined, there is extra work for both devices and clients, and complex problems remain even after that work is done.
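A sketch of what "sole responsibility" could look like on the client, assuming the app ships its own label table keyed by semantic `@type` values. `OnOffProperty` and `BrightnessProperty` are Mozilla WebThings-style type names; the `CLIENT_LABELS` table, its French strings, and the `label_for` helper are hypothetical.

```python
# Hypothetical client-side label table keyed by semantic @type and language tag.
# These strings ship with the app; the device firmware carries none of them.
CLIENT_LABELS = {
    "OnOffProperty": {"en": "On/Off", "fr": "Marche/Arrêt"},
    "BrightnessProperty": {"en": "Brightness", "fr": "Luminosité"},
}


def semantic_types(affordance):
    """Normalize @type, which may be a single string or a list."""
    types = affordance.get("@type", [])
    return [types] if isinstance(types, str) else list(types)


def label_for(name, affordance, lang, fallback_lang="en"):
    """Label an affordance from the client's own strings when its schema is known."""
    for semantic_type in semantic_types(affordance):
        labels = CLIENT_LABELS.get(semantic_type)
        if labels:
            return labels.get(lang, labels[fallback_lang])
    # Unknown schema: use any device-supplied title, then the raw identifier.
    return affordance.get("title", name)


# A TD fragment with no human-readable strings at all.
properties = {
    "on": {"@type": "OnOffProperty", "type": "boolean"},
    "level": {"@type": "BrightnessProperty", "type": "integer"},
    "x1": {"type": "number"},
}

for name, prop in properties.items():
    print(name, "->", label_for(name, prop, "fr"))
```

Only the affordance without a known schema (`x1`) falls back to device-supplied or raw identifiers; everything covered by a schema is labelled by the client in whatever languages the client supports.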

Consider a mobile app which wants to provide a remote user interface for the OnOffSwitch devices in the home. This app has a focused purpose, in contrast to the Mozilla gateway, which is designed to support all Things whether or not they conform to a schema. The app will be localized into the set of languages its author selects. Is the app required to use the localized strings from the device? If yes, does that mean a switch with only French strings cannot work with an app that only supports English? If no, does the manufacturer of the switch expect this? If the app has its own terminology for On/Off, is it acceptable to override the values from the switch? Further, in a home with lights from several vendors, the labels for identical operations may differ.

Apologies for the length of this comment. Perhaps I have misunderstood the scope, and a design goal of the Thing Description is to facilitate building general-purpose user interfaces that interact with unknown devices. If that's the case, the specifications could make it clearer; they only minimally address how these strings will be used in a client user interface. Of more concern to me, it seems to raise the bar on the hardware and organizations which can successfully deploy devices using this proposed standard.

@phoddie

The W3C Thing Description does not appear to support schemas, unlike Mozilla WebThings (forgive me if I have gotten the precise names wrong here).

Actually, supporting such schemas depending on the application context is one of the strengths of the W3C Thing Description (TD). In general, the W3C TD is designed to be domain-independent, so it can be used for any kind of IoT scenario, such as Thing2Cloud, Thing2Thing, Thing2Browser, etc.
Because a TD is machine-interpretable through the standardized linked-data concepts of JSON-LD 1.1, it can easily be extended with existing capability schemas, e.g., from https://iot.mozilla.org/schemas/ or iot.schema.org.

Consider a mobile app which wants to provide a remote user interface for the OnOffSwitch devices in the home.

The question is how intelligent such a mobile app should be. From my point of view the minimum that should be supported is to browse the provided properties, actions and events through a Thing (e.g. onOff, fade, overheating). If more information such as a title and description is given (in the W3C TD specification these are optional terms), the app can use this information to give humans better context and a better label text ('onOff' vs. 'On / Off Switch'). The same applies to languages. If there is a preferred setting for a language, why not use it if it is also offered within the TD or requested by the content negotiation? If the preferred language is not supported, the app can fall back to the minimum requirement mentioned above.
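The fallback described here can be sketched as follows, assuming a W3C TD affordance that may carry an optional `title` and an optional `titles` MultiLanguage map. `display_label` is a hypothetical helper, the German string is a placeholder translation, and language-tag case handling is left out for brevity.

```python
def display_label(name, affordance, preferred_langs):
    """Pick a label: preferred-language title, then default title, then identifier."""
    titles = affordance.get("titles", {})
    for lang in preferred_langs:
        # Try the full tag first ("de-AT"), then its primary subtag ("de").
        for candidate in (lang, lang.split("-")[0]):
            if candidate in titles:
                return titles[candidate]
    return affordance.get("title", name)


# Illustrative affordance; the German string is a placeholder translation.
on_off = {"type": "boolean", "title": "On/Off",
          "titles": {"en": "On/Off", "de": "Ein/Aus"}}
```

With `["de-AT", "en"]` this returns the German title, with `["ja"]` it returns the default `title`, and for an affordance with no titles at all it returns the bare identifier, which is the minimum case.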

Of more concern to me, it seems to raise the bar on the hardware and organizations which can successfully deploy devices using this proposed standard.

Note that a TD does not necessarily have to be located directly on the Thing. If the resource requirements are demanding, the TD can be managed elsewhere and the Thing only gives a hint about where the TD can be requested.

@sebastiankb, thank you for the comments.

I understand from your notes that W3C Thing Description does not define schemas but can accommodate those defined elsewhere. The Mozilla defined schemas are an example of this. A common challenge with very general solutions, and that is what W3C Thing Description appears to be, is applying them to specific scenarios. The flexibility in the design has the potential to work against interoperability when implementations make different choices.

The question is how intelligent such a mobile app should be. From my point of view the minimum that should be supported is to browse the provided properties, actions and events through a Thing (e.g. onOff, fade, overheating).

This is a very specific assumption about the kind of user experience to be provided for a W3C Thing. It assumes the user will experience the device as the collection of properties, events, and actions enumerated in the Thing Description rather than as a particular product. This is effectively an inspector interface, such as the Mozilla IoT Gateway.

The names of properties, actions, and events are not normative. They are machine readable identifiers guaranteed to be unique within the section of the Thing Description they are contained in. The mobile application cannot assume anything from "onOff" or "fade". They could be replaced with a single letter, Simplified Chinese name, or emoji glyphs and remain valid. It is quite general.

The identifiers only acquire meaning for machine-to-machine communication with the addition of optional schemas.
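To illustrate, a client that wants the on/off property cannot rely on a key named `onOff`; it has to look for a semantic annotation instead. A minimal sketch, assuming Mozilla WebThings-style `@type` values (`OnOffProperty`, `BrightnessProperty`, `FadeAction`); `find_by_type` is hypothetical, and the deliberately odd keys mirror the point that any unique identifier is valid.

```python
def find_by_type(td, section, semantic_type):
    """Return names of affordances in a TD section annotated with semantic_type."""
    matches = []
    for name, affordance in td.get(section, {}).items():
        types = affordance.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if semantic_type in types:
            matches.append(name)
    return matches


# The identifiers are arbitrary; only the @type annotations carry meaning.
td = {
    "@type": ["Light", "OnOffSwitch"],
    "properties": {
        "💡": {"@type": "OnOffProperty", "type": "boolean"},
        "亮度": {"@type": "BrightnessProperty", "type": "integer"},
    },
    "actions": {"f": {"@type": "FadeAction"}},
}
```

Without the `@type` annotations, nothing in this fragment would tell a client which property switches the lamp on.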

...same applies to languages. If there is a preferred setting for a language, why not use it if it is also offered within the TD or requested by the content negotiation

The label may be too long to fit the space available. The label may include Unicode characters which cannot be rendered by the device presenting the user interface. The user interface design may be constructed around different terminology.
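The first two of these problems can at least be detected on the client. A sketch, assuming the app knows a maximum label length for a given widget and has a predicate reporting which characters its font can render; both, along with the `choose_label` helper, are hypothetical, and the app's own term is the fallback. Counting code points is a simplification; real layout would measure rendered width.

```python
def fits(label, max_chars, can_render):
    """True if the label is short enough and every character can be rendered."""
    # Code-point count stands in for rendered width (a simplification).
    return len(label) <= max_chars and all(can_render(ch) for ch in label)


def choose_label(device_label, app_label, max_chars, can_render):
    """Prefer the device-supplied label only when this UI can actually show it."""
    if device_label and fits(device_label, max_chars, can_render):
        return device_label
    return app_label


# Hypothetical display whose font covers only Latin-1.
def latin1_only(ch):
    return ord(ch) < 256
```

The third problem, terminology that clashes with the app's own design, cannot be checked mechanically; it remains a decision for the app author.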

We are now some distance from localization, where this topic began. My fundamental point remains: from my experience, the W3C Thing Description does not seem well suited to many of the devices that make up the IoT today and for the foreseeable future. Perhaps that is deliberate, which is not a problem if the specification is oriented towards a more distant future. However, some of the work around the Web of Things, including the Arduino implementation by Mozilla, has led me to understand that the goals include supporting such devices.

WebThingsIO / api

Localisation (l10n) #127