flattenthecurve / guide

https://www.flattenthecurve.com
Creative Commons Attribution 4.0 International

Md syntax compare #356

Closed by rousik 4 years ago

rousik commented 4 years ago

This script compares the structural content (element tree) of the markdown files. It takes the English (original) files and their translated counterparts, renders each to html, and produces diffs for any structural differences found.

Markdown files are transformed into html and parsed into an element tree (using lxml). Each document is then traversed and every element is turned into a string representation of the form /path/to/tag#attr1=value1#attr2=value2. The resulting lists of strings are compared and diffed for each (english, translation) pair.
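
The traversal and diff steps can be sketched roughly as follows. This is an illustration, not the code in this PR: it uses the standard library's `xml.etree.ElementTree` in place of lxml so it runs without extra dependencies, it starts from already rendered, well-formed html rather than markdown, and the names `element_signatures` and `structural_diff` are made up for the example. Sorting attributes and omitting a trailing `#` on tags without attributes are also assumptions; the actual script may differ on both.

```python
import difflib
import xml.etree.ElementTree as ET


def element_signatures(html):
    """Return one '/path/to/tag#attr=value' string per element, in document order."""
    root = ET.fromstring(html)  # assumes a well-formed fragment with a single root
    signatures = []

    def walk(element, parent_path):
        path = f"{parent_path}/{element.tag}"
        # Attributes are sorted so that attribute order never causes a spurious diff.
        attrs = "".join(f"#{key}={value}" for key, value in sorted(element.attrib.items()))
        signatures.append(path + attrs)
        for child in element:
            walk(child, path)

    walk(root, "")
    return signatures


def structural_diff(original_html, translated_html):
    """Diff the signature lists of an (english, translation) pair."""
    original = element_signatures(original_html)
    translated = element_signatures(translated_html)
    return list(
        difflib.unified_diff(
            original, translated, fromfile="original", tofile="translation", n=0, lineterm=""
        )
    )


english = '<div><h1>Title</h1><p>Text with <a href="https://example.org">link</a></p></div>'
german = "<div><h2>Titel</h2><p>Text mit Link</p></div>"

for line in structural_diff(english, german):
    print(line)
```

Because only tags and attributes end up in the signatures, translated text never shows up in the diff; only a changed heading level or a dropped link does.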

Flags for this script are:

  1. --langs [de,fr,...], restricts the analysis to the given translations
  2. --include-regex, only analyzes tags whose string representation matches the given regex (e.g. "/h.#" to compare headers)
  3. --exclude-regex, skips tags whose string representation matches the given regex (e.g. "/p#" to skip over paragraphs)
  4. --show-tag-summaries, summarizes how often each tag differed between the original and the translation (this can be used to find out which tags we should focus on)
  5. --hide-diffs, hides the per-file html tag diffs
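
As a rough sketch of how these flags could be wired up with `argparse`: the flag names are taken from the list above, but everything else is an assumption for illustration (the comma-separated parsing of --langs, the use of `re.search` rather than a full match, and the helper names `build_parser` and `filter_signatures`). The script in this PR may handle these differently.

```python
import argparse
import re


def build_parser():
    """Hypothetical parser mirroring the flags listed above."""
    parser = argparse.ArgumentParser(description="Compare markdown structure across translations")
    # Assumption: languages are passed as one comma-separated value, e.g. --langs de,fr
    parser.add_argument("--langs", type=lambda value: value.split(","), default=None)
    parser.add_argument("--include-regex", default=None)
    parser.add_argument("--exclude-regex", default=None)
    parser.add_argument("--show-tag-summaries", action="store_true")
    parser.add_argument("--hide-diffs", action="store_true")
    return parser


def filter_signatures(signatures, include_regex=None, exclude_regex=None):
    """Keep signatures matching include_regex (if given) and not matching exclude_regex."""
    kept = []
    for signature in signatures:
        if include_regex and not re.search(include_regex, signature):
            continue
        if exclude_regex and re.search(exclude_regex, signature):
            continue
        kept.append(signature)
    return kept


args = build_parser().parse_args(["--langs", "de,fr", "--exclude-regex", "/p#", "--hide-diffs"])
signatures = ["/div/h1#id=intro", "/div/p#class=lead", "/div/ul"]
print(filter_signatures(signatures, args.include_regex, args.exclude_regex))
```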

This script can be used to find markdown styling issues in translations and to detect files where the style has drifted between the original content and its translations. The underlying issue is described in #354.

github-actions[bot] commented 4 years ago

This pull request is being automatically deployed with now-deployment

Built with commit 3c9d259070add5748f02caf1535a088152eab3c6

✅ Preview: https://guide-preview-2ftn8934r.now.sh

rousik commented 4 years ago

Noted. I think it makes sense to document how to use this script on our wiki page (along with instructions on how to install its dependencies). If it gets built into an action, the action will take care of installing the dependencies too.

rousik commented 4 years ago

It seems that using docker is the standard way to wrap things in this project, so I will investigate whether I can use that.