eduardoboucas / staticman

💪 User-generated content for Git-powered websites
https://staticman.net
MIT License

Sanitize HTML #166

Open mmistakes opened 6 years ago

mmistakes commented 6 years ago

Not really sure if this is something Staticman should handle or not, but putting this out there.

I was recently made aware that arbitrary HTML can be passed in my comments, which means someone can drop <style> or <script> tags into a comment's message and mess with the page.

An easy way around that would be to use Jekyll's strip_html filter on that data, but that ruins the ability to leave nicely formatted code blocks written in Markdown, which go through the markdownify filter.

This isn't that big of an issue for me since I have moderation on, but those who let Staticman publish comments without review could be exposing their site.

eduardoboucas commented 6 years ago

Definitely something we can do. Thanks for flagging, I'll look into that.

jimmyangel commented 6 years ago

Hi @eduardoboucas

I really like Staticman and I am implementing it in one of my static sites. Thank you for your work!

However, I did notice that not sanitizing the form data makes the site vulnerable to cross-site scripting attacks, so this is a bit of a showstopper for me. In the meantime, I will clean the form input on the client, but obviously that is not ideal, since client-side checks can be bypassed.

I will be happy to help you implement this. Do you have any implementation ideas?

Thanks,

Ricardo

jimmyangel commented 6 years ago

More info: I am using Hugo and working around this issue with:

{{ .body | plainify | markdownify }} and {{ .name | plainify }}

So pure Markdown will work fine, but all explicit HTML will be stripped. This works well because the data is sanitized when the site is generated.

However, I am not sure I like having unsanitized input stored in the repo.
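
One way to keep raw markup out of the repo might be to escape each field on the server before the entry is committed. The following is only a minimal sketch under that assumption; `escapeHtml` and `sanitizeFields` are hypothetical helpers, not part of Staticman's API.

```typescript
// Hypothetical helpers, not part of Staticman: escape HTML-significant
// characters in every submitted field before the entry is stored.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(value: string): string {
  // Replace each special character with its entity; leave everything else as-is.
  return value.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

function sanitizeFields(fields: Record<string, string>): Record<string, string> {
  const clean: Record<string, string> = {};
  Object.keys(fields).forEach((key) => {
    clean[key] = escapeHtml(fields[key]);
  });
  return clean;
}

const entry = sanitizeFields({
  author: "Mallory",
  message: "<script>alert(1)</script>",
});
// entry.message === "&lt;script&gt;alert(1)&lt;/script&gt;"
```

The trade-off is that escaping everything up front would likely interfere with Markdown that relies on these characters, such as `>` blockquotes or literal HTML inside code blocks.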

mmistakes commented 6 years ago

@jimmyangel Would this method strip out HTML used in code blocks? Guess it depends on the type of site, but I could see the use for example code finding its way into comments.

jimmyangel commented 6 years ago

I just tested it and it doesn't (i.e., it works fine as expected). Looks like plainify considers the backtick plain text.

Pretty much all markdown will pass plainify, except any embedded HTML.

jimmyangel commented 6 years ago

Actually @mmistakes, I misunderstood your question -- and yes, HTML in code blocks will be stripped out.
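
The effect on code blocks can be reproduced with a naive tag stripper. This is only an illustrative sketch: `stripTags` is a hypothetical stand-in and is not how Jekyll's strip_html or Hugo's plainify is actually implemented.

```typescript
// Rough stand-in for a tag-stripping filter (an assumption for illustration,
// not the real strip_html or plainify implementation).
function stripTags(input: string): string {
  // Remove anything that looks like an HTML tag, wherever it appears.
  return input.replace(/<[^>]*>/g, "");
}

const comment = "Use `<em>` for emphasis. <script>alert(1)</script>";
const stripped = stripTags(comment);
// stripped === "Use `` for emphasis. alert(1)"
```

The script tag is gone, but so is the `<em>` example inside the backticks, because the stripper has no notion of Markdown code spans.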

metapodcod commented 2 years ago

@mmistakes @eduardoboucas It's been 4 years. Has a solution been found yet? Does anyone have any ideas on this?