Open slifty opened 2 months ago
These are my collected thoughts on internationalizing the service repo!
tl;dr: I think we want to track a user's preferred language via the Accept-Language header and use that to inform our translations. We would hardcode translation tables for errors (names, messages, descriptions) and for base fields, mapping every supported language to English, and otherwise keep functionality the same.
Error Logs
Any error that is output to the user (via an HTTP endpoint) should be in their preferred language. This can and probably should be handled when building the outgoing error response. Should internal errors also be internationalized?
Base field labels
These, to my understanding, are provided by the seed file src/database/seeds/0001-insert-base_fields.sql. It seems like we ultimately want any user to be able to submit a proposal to any instance of a PDC service, in any language. This means we don't want to provide a bunch of alternative seed files in different languages and just say 'load in the seed file in the language you want when you set up the server,' as that would limit the instance to one language. Instead, we want to keep a ground-truth base field label list and then have some middleware that translates the fields in an uploaded proposal to their English equivalents. We can know which language to translate from based on a detail passed by the user. Perhaps this detail would be the Accept-Language header (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language).
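As a rough sketch of how the source language could be picked from the Accept-Language header: the helper name `pickLanguage`, the supported-language list, and the fallback to English are all assumptions for illustration, not anything that exists in the repo today.

```typescript
// Hypothetical list of languages the service would support (an assumption).
const SUPPORTED_LANGUAGES = ['en', 'fr', 'es'] as const;
type SupportedLanguage = (typeof SUPPORTED_LANGUAGES)[number];

// Assumed fallback when the header is missing or names no supported language.
const DEFAULT_LANGUAGE: SupportedLanguage = 'en';

const isSupportedLanguage = (tag: string): tag is SupportedLanguage =>
  SUPPORTED_LANGUAGES.some((supported) => supported === tag);

// Parse a header such as "fr-CH, fr;q=0.9, en;q=0.8" and return the
// highest-weighted language we support.
const pickLanguage = (acceptLanguage: string | undefined): SupportedLanguage => {
  if (acceptLanguage === undefined) {
    return DEFAULT_LANGUAGE;
  }
  const ranked = acceptLanguage
    .split(',')
    .map((entry) => {
      const [tag = '', ...params] = entry.trim().split(';');
      const qParam = params
        .map((param) => param.trim())
        .filter((param) => param.indexOf('q=') === 0)[0];
      // A missing q parameter means a weight of 1.
      const q = qParam === undefined ? 1 : parseFloat(qParam.slice(2));
      // Keep only the primary subtag: "fr-CH" becomes "fr".
      const [primary = ''] = tag.trim().toLowerCase().split('-');
      return { primary, q: isNaN(q) ? 0 : q };
    })
    .filter(({ q }) => q > 0)
    .sort((a, b) => b.q - a.q);
  for (const { primary } of ranked) {
    if (isSupportedLanguage(primary)) {
      return primary;
    }
  }
  return DEFAULT_LANGUAGE;
};

console.log(pickLanguage('fr-CH, fr;q=0.9, en;q=0.8'));
```

In an Express handler the header value would come from the request, for example `req.get('Accept-Language')`, and the result could be attached to the request for later middleware to use.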
Tests?
If we are internationalizing the codebase, we would seemingly want to describe the tests in other languages, but that may be beyond the scope. And if we're only internationalizing the user experience, we do not need to do this anyway.
Comments
Similarly, this is only relevant if we are internationalizing the codebase, which I don't think we are.
Database field names ??
This seems like something we explicitly do not want to internationalize. I think we need to approach this in terms of handling data and translating it to English (more specifically, to its English counterpart in our database) before it reaches our database.
File Names, specifically types
Probably not, again unless we are internationalizing the repository.
Data formatting, such as currency values and dates
This one is interesting! Since we have base fields that accept a monetary amount, such as ('Cost per outcome statement', '', 'cost_per_outcome_statement', 'number', 'proposal'), I would think we would want to be able to present users' data in the appropriate currency. As far as I'm aware, we don't have a currency type as a base field. Then again, just because a proposal was written in French doesn't mean that its monetary amounts are to be read as euros. Maybe we don't even want to consider this, but I do wonder if we want to record on each proposal entity the language it was originally written in, to inform how it is displayed on the front end, among other things.
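To illustrate why language and currency have to be tracked separately, here is a minimal sketch using the standard `Intl.NumberFormat` API. The `formatAmount` helper is hypothetical, and it assumes the currency would be stored alongside the amount as an ISO 4217 code, which we do not do today.

```typescript
// Hypothetical helper: locale and currency are independent inputs, so a
// proposal written in French can still report its amounts in US dollars.
const formatAmount = (amount: number, locale: string, currency: string): string =>
  new Intl.NumberFormat(locale, { style: 'currency', currency }).format(amount);

// Same locale, different currencies: the two strings differ only in the
// currency marker, not in the number formatting.
const inDollars = formatAmount(1234.5, 'fr-FR', 'USD');
const inEuros = formatAmount(1234.5, 'fr-FR', 'EUR');
console.log(inDollars);
console.log(inEuros);
```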
After reflecting on what we actually can internationalize, I think that the most important aspects of the service to internationalize are user-facing error logs (namely HTTP error responses) and the base field labels. Again, I think the way to do this is to track the user's preferred language via the Accept-Language header.
What needs to be internationalized in an error?
Our error handling middleware, as written, returns the error status code, name, message, and details:
res
  .status(statusCode)
  .contentType('application/json')
  .send({
    name: getNameForError(err),
    message: getMessageForError(err),
    details: getDetailsForError(err),
  });
Thanks to the beauty of standards, we don't have to worry about the status code. We do have to worry about everything else. Names, for example, are grabbed from the error constructor, which is fairly nifty, and I think speaks to the point that we don't want to change the existing code to be more 'internationalized'; rather, we want to add a layer of translation over things. So I think we would want to add logic to each of the three getter functions to translate their result based on the desired language (which again, I think we want specified via the Accept-Language header).
the flow would be something like
This seems safest since we can ensure there will always be a valid translation, as we will be controlling which content is translated on both ends. This becomes trickier when we deal with translating base fields
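A minimal sketch of what such a translation layer over the error getters might look like. Everything here is hypothetical: the table contents, the `translate` and `localizeErrorBody` names, and the choice to fall back to the original English text when no translation exists. The details field is left out for brevity.

```typescript
type Language = 'en' | 'fr';

// Hypothetical translation table keyed by the English strings we control.
const ERROR_TRANSLATIONS: Record<string, Partial<Record<Language, string>>> = {
  'Not Found': { fr: 'Introuvable' },
  'The requested resource does not exist.': {
    fr: "La ressource demandée n'existe pas.",
  },
};

// Return the translation when one exists; otherwise fall back to the
// original English text so the response is never empty.
const translate = (text: string, language: Language): string => {
  const entry = Object.prototype.hasOwnProperty.call(ERROR_TRANSLATIONS, text)
    ? ERROR_TRANSLATIONS[text]
    : undefined;
  return entry?.[language] ?? text;
};

interface ErrorBody {
  name: string;
  message: string;
}

// Wraps the already-computed English values rather than changing how
// they are produced, so the existing getters stay untouched.
const localizeErrorBody = (body: ErrorBody, language: Language): ErrorBody => ({
  name: translate(body.name, language),
  message: translate(body.message, language),
});

console.log(
  localizeErrorBody(
    { name: 'Not Found', message: 'The requested resource does not exist.' },
    'fr',
  ),
);
```

Keying on the English string keeps the existing getters as the single source of truth; a stable error code would be a sturdier key if messages change often.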
This is a much trickier issue, and I think the way to do it is this: much like how we have an official list of base fields that we provide to the user in English, we would keep a list of base fields in every supported language and provide it to any given user. We are then once again able to control the form of the incoming data and, using a lookup table, translate it to English. If there are fields that don't yet exist in the database, we can add them as we would fields in English, but I am not sure what the best approach is when two differently named fields are simply the same field in different languages.
@slifty @jasonaowen @bickelj I have collected my thoughts on internationalization here. No need to read the whole thing, but I wanted to draw attention to it before the check-in on Friday!
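The lookup-table idea could be sketched roughly as follows. The table shape and the `toEnglishLabel` name are assumptions; unknown labels return `undefined` so the caller decides what to do, since that case is still an open question.

```typescript
// Hypothetical lookup table: language -> localized label -> English label.
const LABELS_TO_ENGLISH: Record<string, Record<string, string>> = {
  fr: { "nom de l'organisation": 'organization name' },
};

// Safe property lookup that ignores inherited keys such as "constructor".
const lookup = <T>(table: Record<string, T>, key: string): T | undefined =>
  Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;

// Labels are matched case-insensitively and without surrounding whitespace.
const normalizeLabel = (label: string): string => label.trim().toLowerCase();

// Translate a submitted label to its English counterpart, or return
// undefined when the language or label is not in the table.
const toEnglishLabel = (label: string, language: string): string | undefined => {
  if (language === 'en') {
    return label;
  }
  const labels = lookup(LABELS_TO_ENGLISH, language);
  return labels === undefined ? undefined : lookup(labels, normalizeLabel(label));
};

console.log(toEnglishLabel("Nom de l'organisation", 'fr'));
```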
After a full-team meeting we have clarified our goals and path forward. First, we are looking to localize only user-facing PDC data, not the codebase. This includes error messages, but our top priority is to internationalize the base fields, as they are central to the PDC's functionality.
The approach we have settled on is to refactor the existing base field type, which only supports a single value for its content, to accommodate multiple localizations, which would be represented as a list of foreign keys in a new table, base_field_localizations. We would drop the label column from base_fields and create the new table base_field_localizations with a foreign key on base_fields,
where the base_field_localizations table looks like

| id | base_field_id | language | localization |
|---|---|---|---|
| 1 | 1 | en | organization name |
| 2 | 1 | fr | nom de l'organisation |
| 3 | 1 | sp | Nombre de la Organización |
A few design choices that came up in discussion regarding this plan:
Ultimately the user experience is going to have to be localized, but as we are not emphasizing front-end development in this phase, it is not a top priority.
Good summary overall @hminsky2002!
Some notes on the details:
localizations table should be called base_field_localizations
We don't need the localizations column in the base fields table (this would violate third normal form). We already have that relationship stored via the base_field_localizations table, so you can easily query the localizations associated with base fields using a JOIN base_field_localizations ON base_field_localizations.base_field_id = base_fields.id
Yes, thanks for writing this up, @hminsky2002! I think you captured everything we talked about.
The only thing I'd add is that the new table doesn't need a proxy id column; we already need a UNIQUE constraint on (base_field_id, language), so we can just make that the PRIMARY KEY!
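To make the composite-key idea concrete, here is a small in-memory sketch of how the application side might model these rows. The class and method names are hypothetical, and the real constraint would of course live in the database; this only mirrors its behavior.

```typescript
// One row of the proposed base_field_localizations table, without a
// proxy id: (baseFieldId, language) identifies the row.
interface BaseFieldLocalization {
  baseFieldId: number;
  language: string;
  localization: string;
}

// Hypothetical in-memory stand-in for the table.
class BaseFieldLocalizations {
  private readonly rows: BaseFieldLocalization[] = [];

  // Mirrors PRIMARY KEY (base_field_id, language): a second row for the
  // same field and language is rejected.
  insert(row: BaseFieldLocalization): void {
    const duplicate = this.rows.some(
      (existing) =>
        existing.baseFieldId === row.baseFieldId &&
        existing.language === row.language,
    );
    if (duplicate) {
      throw new Error(`duplicate key (${row.baseFieldId}, ${row.language})`);
    }
    this.rows.push(row);
  }

  // Equivalent of joining on base_field_id for a single base field.
  forBaseField(baseFieldId: number): BaseFieldLocalization[] {
    return this.rows.filter((row) => row.baseFieldId === baseFieldId);
  }
}

const localizations = new BaseFieldLocalizations();
localizations.insert({ baseFieldId: 1, language: 'en', localization: 'organization name' });
localizations.insert({ baseFieldId: 1, language: 'fr', localization: "nom de l'organisation" });
console.log(localizations.forBaseField(1).length);
```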
This will include error responses, but also PDC-owned strings such as base field labels.