freedomofpress / securedrop

GitHub repository for the SecureDrop whistleblower platform. Do not submit tips here!
https://securedrop.org/

Investigate use of semgrep to catch untranslated strings #6380

Open eloquence opened 2 years ago

eloquence commented 2 years ago

https://github.com/freedomofpress/securedrop-client/pull/1272 added a set of handy semgrep rules to the securedrop-client repo to catch untranslated GUI strings. It'd be good to investigate whether similar rules would be helpful in this repo, bearing in mind that the actual patterns would of course need to be different and must not generate too many false positives.

cfm commented 2 years ago

#6368 and #6465 both offer evidence for the value of this linting.

cfm commented 2 years ago

Why are these omissions so difficult to catch during manual testing in the string-freeze process? At that point in the localization cycle, strings not (or incorrectly) marked for translation are indistinguishable from strings not yet translated.

cfm commented 2 years ago

Time-boxed a cranky stab at this using 38c97bb4f6e1fe863b7c078784200a69c693e78f as my tricky target case. As I expected, regex is Semgrep's only view into our .html Jinja templates, and it's a challenging multi-line match given the nesting of HTML → Jinja → Python → HTML.

Targeting c33cbe412a4f16cd05dd64d5e078d56d3f0e0d8e would be an easier first iteration, to catch the basic one-line {{ gettext('foo') }} case. Note that we'll need to match on both ['"].
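
A minimal sketch of what that first iteration could match, written as a plain Python regex rather than as an actual Semgrep rule. The pattern name and helper function are hypothetical, and the expression would still need checking against Semgrep's own regex dialect. It only handles a one-line {{ gettext(...) }} call with a single string literal in either quote style, and it does not handle escaped quotes inside the literal.

```python
import re
from typing import List

# Hypothetical pattern: a one-line Jinja expression that calls gettext() with
# one string literal. The backreference (?P=quote) requires the closing quote
# to be the same character as the opening one, so both ' and " are accepted.
GETTEXT_ONE_LINE = re.compile(
    r"""\{\{\s*gettext\(\s*(?P<quote>['"])(?P<msgid>.*?)(?P=quote)\s*\)\s*\}\}"""
)


def find_gettext_calls(template_source: str) -> List[str]:
    """Return the message IDs of all one-line gettext() calls in a template."""
    return [m.group("msgid") for m in GETTEXT_ONE_LINE.finditer(template_source)]


sample = (
    "<h1>{{ gettext('Submit') }}</h1>\n"
    "<p>{{ gettext(\"It's done\") }}</p>\n"
    "<p>Plain text</p>\n"
)
print(find_gettext_calls(sample))
```

Because `.` does not match a newline by default, anything spanning several lines is skipped, which is exactly the harder nested case described above.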

cfm commented 1 year ago

https://github.com/freedomofpress/securedrop/issues/6380#issuecomment-1135240576:

> Why are these omissions so difficult to catch during manual testing in the string-freeze process? At that point in the localization cycle, strings not (or incorrectly) marked for translation are indistinguishable from strings not yet translated.

We could solve this problem at least for human eyes by turning on Weblate's "pseudolocale generation":

> Pseudolocales are useful to find strings that are not prepared for localization. This is done by altering all translatable source strings to make it easy to spot unaltered strings when running the application in the pseudolocale language.
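
To illustrate the idea only (this is not Weblate's implementation, and the bracket markers and function names are assumptions for the sketch): every string that goes through the catalog gets visibly altered, so anything rendered without the markers was never marked for translation.

```python
from typing import Dict, List


def pseudolocalize(msgid: str, prefix: str = "[", suffix: str = "]") -> str:
    """Wrap a translatable source string in visible markers (hypothetical)."""
    return f"{prefix}{msgid}{suffix}"


def find_unmarked(rendered: List[str], prefix: str = "[", suffix: str = "]") -> List[str]:
    """Return rendered strings that lack the markers, i.e. bypassed the catalog."""
    return [
        s for s in rendered
        if not (s.startswith(prefix) and s.endswith(suffix))
    ]


# "Submit" is marked for translation and so appears in the pseudolocale
# catalog; "Log out" is not, so it falls through unchanged.
catalog: Dict[str, str] = {"Submit": pseudolocalize("Submit")}
rendered = [catalog.get(s, s) for s in ["Submit", "Log out"]]
print(find_unmarked(rendered))
```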

I'll bring this up next week when we revisit our localization roadmap for v2.6.0 and beyond.