Smileyt / python-markdown2

Automatically exported from code.google.com/p/python-markdown2

Escaped HTML instead of [HTML REMOVED] in safe mode #8

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Would it make more sense to just escape HTML and display it instead of
replacing it with "[HTML REMOVED]"? This is not a big deal but it just
feels more correct to me.

Original issue reported on code.google.com by isagal...@gmail.com on 10 Nov 2007 at 10:29

GoogleCodeExporter commented 8 years ago
Yes, probably. Mostly "safe_mode" was quickly added for compat with markdown.py. Eventually I want to do a better safe-mode. Currently I'm thinking of something like:

- safe_mode=True  (the current behaviour, for compat with markdown.py)
- safe_mode="escape" (the behaviour you describe)
- ensure that subclassing of class Markdown gives the ability to write custom code for handling safe mode stuff

Eventually I think the *right* answer for really safe mode is to use something based on the HTML sanitization discussion here: http://wiki.whatwg.org/wiki/Sanitization_rules

Probably would impl that with html5lib. See "Sanitizing Tokenizer" section on this page: http://code.google.com/p/html5lib/wiki/UserDocumentation

Original comment by tre...@gmail.com on 13 Nov 2007 at 6:22

GoogleCodeExporter commented 8 years ago
> Eventually I think the *right* answer for really safe mode is to use something based
> on the HTML sanitization discussion here: http://wiki.whatwg.org/wiki/Sanitization_rules
>
> Probably would impl that with html5lib. See "Sanitizing Tokenizer" section on this
> page: http://code.google.com/p/html5lib/wiki/UserDocumentation

Agreed. That would be really awesome!

Original comment by isagal...@gmail.com on 13 Nov 2007 at 8:35

GoogleCodeExporter commented 8 years ago
The "-s|--safe" command line option and the equivalent "safe_mode" option have changed semantics to take a string instead of a boolean. Legal values of the string are "replace" (the old behaviour: literal HTML is replaced with "[HTML_REMOVED]") and "escape" (meta chars in literal HTML are escaped).

(revision 111)

I'm punting on the full html5lib sanitization for now.

Original comment by tre...@gmail.com on 12 Dec 2007 at 6:21
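
For readers skimming the thread, the difference between the two modes can be sketched roughly as below. This is only an illustration using the standard library, not the code from revision 111: the function name `apply_safe_mode` and the simplified tag pattern are assumptions made up for this example.

```python
import html
import re

# Hypothetical, simplified pattern: anything that looks like an HTML tag.
# A real implementation has to cope with far more than this.
_TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")


def apply_safe_mode(text, safe_mode):
    """Illustrative only: show what "replace" and "escape" mean."""
    if safe_mode == "replace":
        # Old behaviour: literal HTML is swapped for a placeholder.
        return _TAG_RE.sub("[HTML_REMOVED]", text)
    if safe_mode == "escape":
        # New option: meta chars in literal HTML are escaped, so the
        # markup is displayed as text instead of being interpreted.
        return _TAG_RE.sub(
            lambda m: html.escape(m.group(0), quote=False), text
        )
    raise ValueError("unknown safe_mode: %r" % (safe_mode,))


print(apply_safe_mode("hi <b>there</b>", "replace"))
print(apply_safe_mode("hi <b>there</b>", "escape"))
```

With "escape" the reader still sees what the author typed, which is the behaviour requested at the top of this issue.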

GoogleCodeExporter commented 8 years ago
Just tested 'escape', it appears to work!

One minor thing though: maybe it would be good to leave safe_mode=True working for backward compatibility?

Original comment by isagal...@gmail.com on 12 Dec 2007 at 7:25

GoogleCodeExporter commented 8 years ago
Regarding the backward compatibility: yes, ideally that would be best, but I found that leaving that in there was a little bit of a pain. At least for the *command-line* iface: it isn't that easy to have a command line option that takes zero or one args with optparse.

I'll see if I can do something here. (Re-opening for a backward compat attempt.)

Original comment by tre...@gmail.com on 12 Dec 2007 at 6:34
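
To illustrate why zero-or-one args is awkward with optparse: an option with a type always consumes exactly its declared number of arguments, so an optional value needs a callback that peeks at the remaining raw arguments. The sketch below shows that pattern under stated assumptions; it is not what markdown2 does, and `_safe_callback`, `build_parser` and `SAFE_MODES` are names invented for this example.

```python
from optparse import OptionParser

SAFE_MODES = ("replace", "escape")


def _safe_callback(option, opt_str, value, parser):
    # parser.rargs holds the arguments not yet processed. Consume the
    # next one only if it names a known mode; otherwise fall back to
    # "replace", the old meaning of a bare -s.
    rargs = parser.rargs
    if rargs and rargs[0] in SAFE_MODES:
        mode = rargs.pop(0)
    else:
        mode = "replace"
    setattr(parser.values, option.dest, mode)


def build_parser():
    parser = OptionParser()
    parser.add_option(
        "-s", "--safe", dest="safe_mode",
        action="callback", callback=_safe_callback,
        help="safe mode: 'replace' (default if no value) or 'escape'",
    )
    return parser


opts, args = build_parser().parse_args(["-s", "escape", "README.md"])
print(opts.safe_mode, args)
```

The catch is ambiguity: an input file that happens to be named `escape` or `replace` would be swallowed as the mode, and the `--safe=escape` spelling is rejected because optparse sees the option as taking no value. That is the kind of pain referred to above.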

GoogleCodeExporter commented 8 years ago
At the Python-level at least, I've re-instated safe_mode=True being allowed: it is transformed immediately to "replace". Note, however, that the -s|--safe option cannot be given without an argument to imply "replace" mode -- as it could in earlier versions.

(revision 113)

Original comment by tre...@gmail.com on 12 Dec 2007 at 8:06
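
A rough sketch of the Python-level compatibility shim described above, assuming the option is normalized once up front. The helper name `normalize_safe_mode` and the error handling are assumptions for illustration; the actual change is in revision 113.

```python
def normalize_safe_mode(safe_mode):
    """Map the legacy boolean onto the new string values (illustrative)."""
    if safe_mode is True:
        # Backward compat: True keeps meaning the old behaviour.
        return "replace"
    if not safe_mode:
        # None, False or "" all mean safe mode is off.
        return None
    if safe_mode in ("replace", "escape"):
        return safe_mode
    raise ValueError("invalid safe_mode: %r" % (safe_mode,))


print(normalize_safe_mode(True))
print(normalize_safe_mode("escape"))
```

Normalizing at the boundary means the rest of the code only ever has to compare against the two strings.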