Open AntoineAugusti opened 2 years ago
Thanks for opening the issue!
Keeping just the error IDs and English messages in the default report, while adding community-maintained translations to the repo (e.g. YAML, JSON, or .po files) and letting developers translate at display time, could be just as useful without any actual changes to the report (and with less impact on the project).
It could be managed in this repo, or be a sort of "side project" outside the repo.
Just ideas at this point!
@AntoineAugusti @thbar I'd love to see support for multiple languages integrated directly into the validator, in a way that would support easy third-party contributions. I think it's definitely possible; it would just take some refactoring within the project and an agreement on how the translations would be stored.
As documented in https://github.com/MobilityData/gtfs-realtime-validator/tree/master/gtfs-realtime-validator-lib#output, right now the JSON output looks like this:
[ {
"errorMessage" : {
"messageId" : 0,
"gtfsRtFeedIterationModel" : null,
"validationRule" : {
"errorId" : "W001",
"severity" : "WARNING",
"title" : "timestamp not populated",
"errorDescription" : "Timestamps should be populated for all elements",
"occurrenceSuffix" : "does not have a timestamp"
},
"errorDetails" : null
},
"occurrenceList" : [ {
"occurrenceId" : 0,
"messageLogModel" : null,
"prefix" : "trip_id 277716"
}, {
"occurrenceId" : 0,
"messageLogModel" : null,
"prefix" : "trip_id 277767"
}, {
"occurrenceId" : 0,
"messageLogModel" : null,
"prefix" : "trip_id 277768"
},
In the above example, three trip_updates have been validated, and each was missing a timestamp (warning W001). To put together the full message for each occurrence of the warning or error, you add the occurrence prefix to the validationRule occurrenceSuffix.
For example, in UI format the above would look like:
trip_id 277716 does not have a timestamp
trip_id 277767 does not have a timestamp
trip_id 277768 does not have a timestamp
This is a relatively simple example where the prefix doesn't even need to be translated, and as long as you can create a suffix in the translated language that grammatically joins with the prefix you'd really only need a translated suffix.
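As a minimal sketch of the joining step described above (the class and method names are hypothetical and not part of the validator), a consumer of the JSON report could build the display strings like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the validator: joins each occurrence
// prefix with the rule's occurrenceSuffix, separated by a single space.
class OccurrenceMessages {

    static List<String> fullMessages(List<String> prefixes, String occurrenceSuffix) {
        List<String> messages = new ArrayList<>();
        for (String prefix : prefixes) {
            messages.add(prefix + " " + occurrenceSuffix);
        }
        return messages;
    }

    public static void main(String[] args) {
        // Values taken from the W001 example in the JSON output above.
        List<String> prefixes =
                Arrays.asList("trip_id 277716", "trip_id 277767", "trip_id 277768");
        for (String message : fullMessages(prefixes, "does not have a timestamp")) {
            System.out.println(message);
        }
    }
}
```

Swapping in a translated suffix would be the only change needed for a rule like W001, as long as the suffix still joins grammatically with the prefix.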
All the suffixes are defined here as the last parameter passed into the constructor for each rule (other general rule descriptions are also configured via the same constructor): https://github.com/MobilityData/gtfs-realtime-validator/blob/master/gtfs-realtime-validator-lib/src/main/java/edu/usf/cutr/gtfsrtvalidator/lib/validation/ValidationRules.java
The prefixes are all currently defined where the rule is implemented in the code in the rules package. For example, here are the timestamp prefixes:
https://github.com/MobilityData/gtfs-realtime-validator/blob/master/gtfs-realtime-validator-lib/src/main/java/edu/usf/cutr/gtfsrtvalidator/lib/validation/rules/TimestampValidator.java#L83
Some of those have more complex sentence structures where it would be harder to simply translate a prefix or suffix alone. For example, looking at E022 for "this stop arrival time is < previous stop arrival time", here's the prefix:
String prefix = id + stopDescription + " arrival_time " + arrivalTimeText + " (" + arrivalTime + ") is less than previous stop arrival_time " + previousArrivalTimeText + " (" + previousArrivalTime + ")";
...and the suffix is just "- times must increase between two sequential stops".
So as far as an implementation goes, it would be a matter of pulling those values into key/value pairs and then defining a format for integrating the values into a translated string. All of my internationalization experience is on Android, but I think we could leverage the Java internationalization framework for this and store translations in .properties files:
https://www.baeldung.com/java-resourcebundle
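Here is a rough sketch of what that could look like, assuming hypothetical keys such as `E022.prefix` and an inline string standing in for a `Messages.properties` file (neither exists in the project today). The E022 prefix becomes a single `java.text.MessageFormat` pattern, so a translator could reorder the placeholders instead of translating fragments:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical sketch, not the validator's code.
class RuleMessagesSketch {

    // Inline stand-in for a hypothetical Messages.properties file. In a real
    // project the file would sit on the classpath and be loaded with
    // ResourceBundle.getBundle("Messages", locale).
    // Note: apostrophes in MessageFormat patterns must be doubled ('').
    static final String EN_PROPERTIES =
            "W001.occurrenceSuffix=does not have a timestamp\n"
          + "E022.prefix={0} arrival_time {1} ({2}) is less than previous stop arrival_time {3} ({4})\n"
          + "E022.occurrenceSuffix=- times must increase between two sequential stops\n";

    static ResourceBundle load(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up the pattern for a key and fills in its placeholders.
    static String format(ResourceBundle bundle, Locale locale, String key, Object... args) {
        return new MessageFormat(bundle.getString(key), locale).format(args);
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load(EN_PROPERTIES);
        // Illustrative values only; arguments are passed as strings so that
        // MessageFormat does not add digit grouping to the raw timestamps.
        String prefix = format(bundle, Locale.ENGLISH, "E022.prefix",
                "trip_id 1 stop_sequence 5", "10:05:00", "1585166700",
                "10:10:00", "1585167000");
        System.out.println(prefix + " " + bundle.getString("E022.occurrenceSuffix"));
    }
}
```

Here `{0}` collapses the `id + stopDescription` part of the existing concatenation into one argument, which is a simplification for the example.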
Using Java's framework would help automatically handle items like "," instead of ".", default date formats, etc. and translation framework providers like Transifex should support it (see more on this below).
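To illustrate the locale handling mentioned above, here is a small sketch using the standard `java.text.NumberFormat` and `java.time.format.DateTimeFormatter` classes. The class name is made up for the example, and the exact date output depends on the JDK's locale data, so no date output is shown:

```java
import java.text.NumberFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

// Hypothetical sketch of locale-aware formatting with the standard library.
class LocaleFormattingSketch {

    // Formats a number with the decimal separator of the given locale.
    static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Formats a date with the locale's default medium-length date style.
    static String date(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.MEDIUM)
                .withLocale(locale)
                .format(date);
    }

    public static void main(String[] args) {
        System.out.println(number(1.5, Locale.US));      // 1.5
        System.out.println(number(1.5, Locale.FRANCE));  // 1,5
        LocalDate day = LocalDate.of(2020, 3, 25);
        System.out.println(date(day, Locale.US));
        System.out.println(date(day, Locale.FRANCE));
    }
}
```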
In terms of output format, is anyone aware of a standardized translation format for JSON response elements? I haven't done translations within JSON data before and couldn't easily find one. If one doesn't exist, we could mirror the GTFS Realtime Service Alerts format, which looks like this:
header_text {
# multiple languages/translations supported
translation {
text: "Stop at Elm street is closed, temporary stop at Oak street"
language: "en"
},
translation {
text: "L'arrêt à la rue Elm est fermé, l'arrêt temporaire à la rue Oak"
language: "fr"
},
}
So an equivalent for this project would be something like:
{
"errorMessage" : {
"messageId" : 0,
"gtfsRtFeedIterationModel" : null,
"validationRule" : {
"errorId" : "W001",
"severity" : "WARNING",
"title" : [
{
"text": "timestamp not populated",
"language": "en"
},
{
"text": "horodatage non renseigné",
"language": "fr"
}
],
"errorDescription" : [
{
"text": "Timestamps should be populated for all elements",
"language": "en"
},
{
"text": "Les horodatages doivent être renseignés pour tous les éléments",
"language": "fr"
}
],
"occurrenceSuffix" : [
{
"text": "does not have a timestamp",
"language": "en"
},
{
"text": "n'a pas d'horodatage",
"language": "fr"
}
]
},
"errorDetails" : null
},
"occurrenceList" : [ {
"occurrenceId" : 0,
"messageLogModel" : null,
"prefix" : [
{
"text": "trip_id 277716",
"language": "en"
},
{
"text": "trip_id 277716",
"language": "fr"
}
]
}, {
This would obviously get more complicated if translations don't fit neatly into prefixes and suffixes - I think in that case we'd need to change the output format. But if we're targeting Western languages first I think keeping it close to the existing format as in the example above might work - but let me know if you start looking at the rules and find examples where the prefix/suffix format just wouldn't work.
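With a list-of-translations layout like the one sketched above, a consumer would pick the entry for the language it wants and fall back to another language when that one is missing. A minimal sketch, assuming a hypothetical `Translation` holder that mirrors one `text`/`language` entry (none of these names exist in the validator):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical consumer-side helper for the proposed output format.
class TranslationPicker {

    // Mirrors one {"text": ..., "language": ...} entry.
    static final class Translation {
        final String text;
        final String language;

        Translation(String text, String language) {
            this.text = text;
            this.language = language;
        }
    }

    // Returns the text for the requested language, else the text for the
    // fallback language, else null when neither is present.
    static String pick(List<Translation> translations, String language, String fallbackLanguage) {
        String fallback = null;
        for (Translation t : translations) {
            if (t.language.equals(language)) {
                return t.text;
            }
            if (t.language.equals(fallbackLanguage)) {
                fallback = t.text;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        List<Translation> suffix = Arrays.asList(
                new Translation("does not have a timestamp", "en"),
                new Translation("n'a pas d'horodatage", "fr"));
        System.out.println("trip_id 277716 " + pick(suffix, "fr", "en"));
        System.out.println("trip_id 277716 " + pick(suffix, "de", "en"));
    }
}
```

Carrying every language in one report would let a single validator run serve several languages, at the cost of a larger JSON payload.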
We could also try to leverage an existing translation platform like Transifex (which is free for OSS) in coordination with the format we decide on: https://www.transifex.com/
I've used Transifex in the context of two OSS Android projects, and it simplifies communicating with translators and makes it easier for non-developers to contribute translations. It also looks like they support the Java .properties file format:
https://docs.transifex.com/formats/java-properties
Any thoughts/ideas/improvements to the above?
We @ transport.data.gouv.fr would be interested to have error messages, descriptions and examples from the JSON report in other languages. I'll let you guess the language we are interested in.
Would it be possible for the community to translate these things and specify the language we are interested in when validating data?
We would love to be able to have reports in multiple languages as well, to avoid running the validator multiple times if we are interested in multiple languages.
cc @fchabouis @thbar