use bleach on generated markdown

cdent commented 10 years ago

We don't want to completely disallow html but we do want to disallow at least script.

cdent commented 10 years ago

python-markdown recommends it.

cdent commented 10 years ago

form tag too

cdent commented 10 years ago

Solving this is more complicated than desired, input from @jsavage and @FND (this probably relevant to BFW as well) would be helpful.

The basic issue is that by default you can write any HTML in the current markdown, including script tags. The link from above has two suggestions:

use built in safe mode to escape, replace or remove any HTML provided by the user
use html5lib or bleach (which wraps html5lib) to sanitize the HTML that is created by the library, after generation

Neither of these options are great:

Having HTML available can be handy when the available formatting won't do. jsavage has done some interesting things with iframes, forms and related things. Those tags are actually in the set of primary things that should be escaped, but that doesn't mean the general concept of using HTML should be killed.
html5lib is somewhat configurable (you can doctor the mixin to add and remove tags and attributes that are acceptable) so could be made to provide a simple whitelist of stuff that is considered okay. However, it will not, and cannot without changes to the code (deep within a big method), accept data-* attributes and strips them out. That's problematic because the transclusion mechanism uses data-* to keep track of the origin and other metadata of the transcluded tiddler. I would be very reluctant to give up that information.

What do people think should be done here. I see the options as:

kill HTML entirely using option 1 above to escape
hack a local version of html5lib
get html5lib fixed somehow but this is an old problem that people have been talking about for quite some time
stop using data-* with transclusion

In the absence of any hard need for HTML in markdown I might go for 1, at least as a configurable option to tiddlywebplugins.markdown.

FND commented 10 years ago

So the sanitizing would happen after Markdown (incl. extensions) was rendered to HTML - and that's why data- attributes are relevant (i.e. not because some user manually enters them in the Markdown source)?

I agree that #1 is the best way to go, though I'm not sure how best to make transclusion work properly again - and it sounds like any Markdown extension rendering custom HTML (e.g. checkboxes, footnotes) will be problematic as well?

cdent commented 10 years ago

Sorry, I wasn't quite clear. Santization using bleach or html5lib would happen on the generated HTML. The safe mode built in to markdown operates on the inputted HTML. So option 1.1 would still allow transclusion etc to work, because that is generated HTML. It just wouldn't allow the user to write HTML.

jsavage commented 10 years ago

Am not familiar with safe mode and what that would enable. But what I was trying to do in a very simple way with iframes was put together a single daskboard like view of tanks, recipes, normal tiddlers and binary ones to help me understand tank and its possibilities better. Whilst that use case is probably rather time limited, but there are probably many other use cases for being able to bring in dynamic html or text data from outside of tank. This seems similar in concept to transclusion but for potentially any external resource. It would be good to have more control over such things eg to filter out the unnecessary in a programatic way which is perhaps where scripts would be handy. That said, do what you need to protect the platform.

On 23 February 2014 09:43, FND notifications@github.com wrote:

So the sanitizing would happen after Markdown (incl. extensions) was rendered to HTML - and that's why data- attributes are relevant (i.e. not because some user manually enters them in the Markdown source)?

I agree that #1 https://github.com/cdent/tank/issues/1 is the best way to go, though I'm not sure how best to make transclusion work properly again - and it sounds like any Markdown extension rendering custom HTML (e.g. checkboxes, footnotes) will be problematic as well?

Reply to this email directly or view it on GitHubhttps://github.com/cdent/tank/issues/61#issuecomment-35827761 .

<img src=" http://mystatus.skype.com/balloon/jdcsavage" style="border: none;" width="150" height="60" alt="My status" />
Get Skype and call me for free.

cdent commented 10 years ago

Safe mode would basically escape any incoming html so that it looks in the output as you typed it in. Basically < and > get turned into < and >. This means any and all HTML would be disabled.

I agree that a dashboard that is more informative/overview than /dash would be useful but it would have to be something that is provided by the server rather than generated by raw iframes as those are just too easy to make into a really messy security risk.

Custom markup for markdown with loaded bag and recipe information or tiddlers lists is a possibility if there is demand, but it is not a super high priority because you can always just go browse the HTML representation of the API.

FND commented 10 years ago

just wouldn't allow the user to write HTML

That seems perfectly acceptable to me (tangentially related: FND/tiddlywebplugins.bfw#37).

iframes [...] are just too easy to make into a really messy security risk

FWIW: http://fnd.tiddlyspace.com/iframe%20security - short version: modern browser allow sandboxing, which is pretty safe if implemented correctly

However, sounds like a set of macros (cf. FND/tiddlywebplugins.bfw#61) might work better for that kind of dashboard.

cdent commented 10 years ago

sandboxing is cool if you can:

rely on modern browsers
rely on the person writing the iframe to include a sandbox tag

CSP is cool if you can:

rely on modern browsers

In both cases you can get around the modern browsers aspect by changing behavior server-side based on user-agent, but of course that can be spoofed.

For the sake of simplicity in the rendering subsystem I agree that macros would be the right way to go.

Obviously in a more demanding situation more could be done, but I think that situation should only be addressed when we come to it. I think I'll changed twp.markdown to safe mode by default but have a config to make it not.

cdent commented 10 years ago

Curiously adding safe_mode works for transclusion but breaks checklists.

I thought this was because of an ordering problem where processors are added but I think it is actually because a preprocessor manipulates the source text. That means you are writing HTML into the input before it is parsed.

Doing it as a postprocessor only is possible, at least in initial testing, so I'll send @FND a pull request for further discussion.

cdent / tank

use bleach on generated markdown #61