flattenthecurve / guide

https://www.flattenthecurve.com
Creative Commons Attribution 4.0 International

Automatic detection of style drifts and translation mismatches #354

Open rousik opened 4 years ago

rousik commented 4 years ago

When we make style changes or rework the layout of the site, the translated content can drift from the head. Header depths may differ, and some CSS styling may not be applied. I have also encountered German translations where Markdown markers were omitted entirely.

While we can probably catch some of these things manually, it might be worth looking into whether we can detect structural differences between EN and translated content, so that we can flag these strings for verification/correction.

I have looked into whether syntactic/structural analysis of Markdown can be done but haven't found anything promising. Given that md files are converted into HTML before rendering, we might instead compare the element trees of the rendered Markdown fragments.
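
To make the element-tree idea concrete, here is a minimal sketch, not the prototype itself. It assumes both fragments have already been rendered to HTML by whatever Markdown renderer the site uses, and it compares only the order of opening tags. The names `TagOutline`, `tag_outline`, and `structure_diff` are hypothetical and exist only for illustration.

```python
import difflib
from html.parser import HTMLParser


class TagOutline(HTMLParser):
    """Collect the sequence of opening tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Attributes and text are ignored; only the structure matters here.
        self.tags.append(tag)


def tag_outline(html):
    """Return the opening tags of `html` in document order."""
    parser = TagOutline()
    parser.feed(html)
    parser.close()
    return parser.tags


def structure_diff(source_html, translated_html):
    """Return (op, source_tags, translated_tags) for each mismatched run."""
    a = tag_outline(source_html)
    b = tag_outline(translated_html)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (op, a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


if __name__ == "__main__":
    # Made-up fragments: the translation uses a deeper header and drops bold.
    en = "<h2>Symptoms</h2><p>Wash your <strong>hands</strong>.</p>"
    de = "<h3>Symptome</h3><p>Hände waschen.</p>"
    for op, src, dst in structure_diff(en, de):
        print(op, src, dst)
```

For the made-up fragments above, this reports the `h2`/`h3` header-depth mismatch and the missing `strong`. Comparing flat tag sequences rather than full trees keeps the sketch short, at the cost of ignoring nesting.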

I know Lokalise has some features to highlight structural inconsistencies, but it seems that it only captures HTML tags and doesn't fully take Markdown syntax into account.

rousik commented 4 years ago

@emersonthis and @nditada for visibility

rousik commented 4 years ago

A prototype script for this has been pushed to the md-syntax-compare branch: https://github.com/flattenthecurve/guide/blob/md-syntax-compare/_scripts/compare_md_structure.py

The next steps are:

  1. Run this script and figure out which tags should be included in or excluded from the comparison
  2. Identify structural differences in translation files that need to be fixed
  3. See if this script could be run on language-preview branches to identify problems early
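
As an illustration of the first step, the sketch below shows how the choice of excluded tags changes what gets flagged. It is not taken from `compare_md_structure.py`; the `INLINE_TAGS` set and the function names are assumptions made for the example, and the inputs are assumed to be already-rendered HTML fragments.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical exclusion set: inline tags that translators may legitimately
# add or drop without breaking the page layout.
INLINE_TAGS = frozenset({"em", "strong", "a", "code"})


class TagCounter(HTMLParser):
    """Count opening tags, skipping any tag listed in `exclude`."""

    def __init__(self, exclude=frozenset()):
        super().__init__()
        self.exclude = exclude
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in self.exclude:
            self.counts[tag] += 1


def count_tags(html, exclude=frozenset()):
    parser = TagCounter(exclude)
    parser.feed(html)
    parser.close()
    return parser.counts


def count_mismatches(source_html, translated_html, exclude=frozenset()):
    """Map each tag whose count differs to (source_count, translated_count)."""
    src = count_tags(source_html, exclude)
    dst = count_tags(translated_html, exclude)
    return {
        tag: (src[tag], dst[tag])
        for tag in sorted(src.keys() | dst.keys())
        if src[tag] != dst[tag]
    }


if __name__ == "__main__":
    en = "<h2>A</h2><p><em>x</em> <strong>y</strong></p>"
    tr = "<h2>B</h2><p>x y</p><p>z</p>"
    print(count_mismatches(en, tr))                       # all tags compared
    print(count_mismatches(en, tr, exclude=INLINE_TAGS))  # inline tags ignored
```

Running it with and without the exclusion set on real translation files would show how many flags come from inline emphasis alone, which is one way to judge whether a given tag belongs in the comparison.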

rousik commented 4 years ago

I have been using this script to spot-check Markdown issues in newly launched languages, and it seems to be helpful. I will need to document best practices for using it in our processes and, ideally, see if we can run it automatically on language PRs.