freedomofpress / securedrop-https-everywhere-ruleset

HTTPS Everywhere ruleset for human-readable Onion URLs for SecureDrop instances
https://securedrop.org/https-everywhere/

https-everywhere's merge-rulesets.py fails if org name includes double quotes #92

Open zenmonkeykstop opened 2 years ago

zenmonkeykstop commented 2 years ago

If the organization name for an instance includes double quotes, they will be included without escaping in the ruleset XML, breaking HTTPSE's scripts with errors like:

~/securedrop-https-everywhere-ruleset/https-everywhere ~/securedrop-https-everywhere-ruleset
 * Parsing XML ruleset and constructing JSON library...
Traceback (most recent call last):
  File "utils/merge-rulesets.py", line 50, in <module>
    tree = xml.etree.ElementTree.parse(filename)
  File "/usr/lib/python3.7/xml/etree/ElementTree.py", line 1197, in parse
    tree.parse(source, parser)
  File "/usr/lib/python3.7/xml/etree/ElementTree.py", line 598, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 24

One workaround is to avoid double quotes in organization names, but it would be better to properly escape string values used in the XML.
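For illustration, here is a minimal sketch of what escaping could look like. It uses `xml.sax.saxutils.quoteattr` from the Python standard library. The `build_ruleset_xml` helper, the sample organization name, and the bare `<ruleset name=...>` element are assumptions made for this example, not the actual generator code in this repo.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def build_ruleset_xml(org_name: str) -> str:
    """Hypothetical helper: return a minimal <ruleset> element whose
    name attribute is safely quoted."""
    # quoteattr() escapes &, < and >, chooses a quote character that
    # does not clash with the value (escaping quotes if both kinds are
    # present), and returns the value already wrapped in quotes.
    return "<ruleset name={}></ruleset>".format(quoteattr(org_name))


# Hypothetical organization name containing double quotes.
name = 'The "Example" Tribune'

# Naive string interpolation produces XML that is not well-formed.
broken = '<ruleset name="{}"></ruleset>'.format(name)
try:
    ET.fromstring(broken)
    naive_ok = True
except ET.ParseError:
    naive_ok = False

# The escaped version parses, and the original name round-trips.
fixed = build_ruleset_xml(name)
parsed_name = ET.fromstring(fixed).get("name")
```

Building the document with `xml.etree.ElementTree` and setting the attribute via `Element.set()` would also handle escaping on serialization, if the generator can be restructured that way.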

zenmonkeykstop commented 2 years ago

(Looks like there's already a function to remove umlauts in sddir.py; we could just generalize it a bit.)

Cachora commented 1 year ago

👍🏻