Vereyon / HtmlRuleSanitizer

A rule-based HTML sanitizer built on top of the HTML Agility Pack
MIT License



HtmlRuleSanitizer is a whitelist, rule-based HTML sanitizer built on top of the HTML Agility Pack. Use it to clean up HTML and remove malicious content.

var sanitizer = HtmlSanitizer.SimpleHtml5Sanitizer();
string cleanHtml = sanitizer.Sanitize(dirtyHtml);

Without configuration HtmlRuleSanitizer will strip absolutely everything. This ensures that you are in control of what HTML is getting through. It was inspired by the client side parser of the wysihtml5 editor.
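
To see this default in practice, the minimal sketch below compares an unconfigured sanitizer with one that whitelists a single tag. It uses only calls that appear in this README (new HtmlSanitizer(), Tag and Sanitize); the StripByDefaultDemo class, its method names and the sample input are illustrative assumptions. The results are printed rather than stated here, since the exact serialization may differ between library versions.

```csharp
using System;
using Vereyon.Web;

// Illustrative demo class; not part of HtmlRuleSanitizer.
public static class StripByDefaultDemo
{
    // Sanitizes the input with a sanitizer that has no rules configured.
    public static string SanitizeUnconfigured(string dirtyHtml)
    {
        var sanitizer = new HtmlSanitizer();
        return sanitizer.Sanitize(dirtyHtml);
    }

    // Sanitizes the input with a sanitizer that whitelists only the strong tag.
    public static string SanitizeStrongOnly(string dirtyHtml)
    {
        var sanitizer = new HtmlSanitizer();
        sanitizer.Tag("strong");
        return sanitizer.Sanitize(dirtyHtml);
    }

    public static void Main()
    {
        const string dirtyHtml = "<strong>bold</strong><script>alert('x')</script>";

        // Print both results so the effect of the single rule can be compared.
        Console.WriteLine("Unconfigured: " + SanitizeUnconfigured(dirtyHtml));
        Console.WriteLine("strong only:  " + SanitizeStrongOnly(dirtyHtml));
    }
}
```

Starting from an empty rule set and adding tags one by one keeps the whitelist explicit, which is the point of the strip-everything default.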

Use cases

HtmlRuleSanitizer was designed with the following use cases in mind:



Install the HtmlRuleSanitizer NuGet package. Optionally add the following using directive to the file where you intend to use HtmlRuleSanitizer:

using Vereyon.Web;

Basic usage

var sanitizer = HtmlSanitizer.SimpleHtml5Sanitizer();
string cleanHtml = sanitizer.Sanitize(dirtyHtml);

Note: SimpleHtml5Sanitizer returns a rule set which does not allow for a full document definition. Use SimpleHtml5DocumentSanitizer for full documents.

Sanitize a document

When dealing with full HTML documents including the html and body tags, use SimpleHtml5DocumentSanitizer:

var sanitizer = HtmlSanitizer.SimpleHtml5DocumentSanitizer();
string cleanHtml = sanitizer.Sanitize(dirtyHtml);
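
When the same code path receives both fragments and full documents, the choice between the two rule sets can be made at runtime. The sketch below is one way to do that, using only the two factory methods shown above. The DocumentOrFragmentDemo class and the "<html" substring check are assumptions made for illustration; the check is a deliberately naive heuristic, not something the library provides.

```csharp
using System;
using Vereyon.Web;

// Illustrative demo class; not part of HtmlRuleSanitizer.
public static class DocumentOrFragmentDemo
{
    // Uses the document rule set when the input looks like a full HTML
    // document and the fragment rule set otherwise.
    public static string Sanitize(string dirtyHtml)
    {
        // Naive heuristic (assumption): treat anything containing "<html"
        // as a full document.
        bool looksLikeDocument =
            dirtyHtml.IndexOf("<html", StringComparison.OrdinalIgnoreCase) >= 0;

        HtmlSanitizer sanitizer = looksLikeDocument
            ? HtmlSanitizer.SimpleHtml5DocumentSanitizer()
            : HtmlSanitizer.SimpleHtml5Sanitizer();

        return sanitizer.Sanitize(dirtyHtml);
    }

    public static void Main()
    {
        const string fragment = "<p>Hello</p><script>alert('x')</script>";
        const string document =
            "<html><body><p>Hello</p><script>alert('x')</script></body></html>";

        Console.WriteLine(Sanitize(fragment));
        Console.WriteLine(Sanitize(document));
    }
}
```

If the caller already knows which kind of input it has, calling the matching factory method directly is simpler and avoids the heuristic.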


The code below configures a rule set which only allows strong, i and a tags, and which enforces that links have a valid URL, are marked nofollow and open in a new window. In addition, any b tag is renamed to strong, since browsers render both the same way by default and strong is the semantically meaningful choice. Empty tags are removed. This is a good starting point for comment processing.

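var sanitizer = new HtmlSanitizer();
sanitizer.Tag("strong").RemoveEmpty();
sanitizer.Tag("b").Rename("strong").RemoveEmpty();
sanitizer.Tag("i").RemoveEmpty();
sanitizer.Tag("a").SetAttribute("target", "_blank")
    .SetAttribute("rel", "nofollow")
    .CheckAttributeUrl("href")
    .RemoveEmpty();

string cleanHtml = sanitizer.Sanitize(dirtyHtml);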

CSS class whitelisting

Global CSS class whitelisting is achieved as follows where CSS classes are space separated:

sanitizer.AllowCss("legal also-legal");

Custom attribute sanitization

Attribute sanitization can be performed by implementing a custom IHtmlAttributeSanitizer. The code below illustrates a simple custom sanitizer which overrides the attribute value:

class CustomSanitizer : IHtmlAttributeSanitizer
{
    public SanitizerOperation SanitizeAttribute(HtmlAttribute attribute, HtmlSanitizerTagRule tagRule)
    {
        // Override the attribute value and leave the attribute in place.
        attribute.Value = "123";
        return SanitizerOperation.DoNothing;
    }
}

The custom sanitizer can then be assigned to the desired attributes as follows:

var sanitizer = new HtmlSanitizer();
var attributeSanitizer = new CustomSanitizer();
sanitizer.Tag("span").SanitizeAttributes("style", attributeSanitizer);

Custom element sanitization

Element sanitization can be performed by implementing a custom IHtmlElementSanitizer, much like custom attribute sanitization. The code below illustrates a custom sanitizer which removes span elements which contain the text "remove me":

var sanitizer = new HtmlSanitizer();
sanitizer.Tag("span").Sanitize(new CustomSanitizer(element =>
{
    return element.InnerText == "remove me"
        ? SanitizerOperation.RemoveTag
        : SanitizerOperation.DoNothing;
}));
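
The CustomSanitizer used here takes a delegate and is a different class from the attribute sanitizer of the same name in the previous section; its definition is not shown in this README. A minimal sketch of such a class could look as follows. It assumes that IHtmlElementSanitizer declares a single method SanitizeElement(HtmlNode) returning a SanitizerOperation, so check the interface definition in your installed version before relying on it. The field and parameter names are illustrative.

```csharp
using System;
using HtmlAgilityPack;
using Vereyon.Web;

// Hypothetical helper: adapts a delegate to IHtmlElementSanitizer.
// Assumption: the interface consists of one method, SanitizeElement(HtmlNode).
public class CustomSanitizer : IHtmlElementSanitizer
{
    private readonly Func<HtmlNode, SanitizerOperation> _callback;

    public CustomSanitizer(Func<HtmlNode, SanitizerOperation> callback)
    {
        // Fail early instead of throwing later during sanitization.
        _callback = callback ?? throw new ArgumentNullException(nameof(callback));
    }

    // Delegates the decision for each element to the supplied callback.
    public SanitizerOperation SanitizeElement(HtmlNode element)
    {
        return _callback(element);
    }
}
```

Wrapping a delegate keeps one-off rules inline at the call site; a dedicated class per rule is the better fit once the logic needs state or its own tests.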


Contributions are welcome through a GitHub pull request.


dotnet restore


Got tests? Yes, see the tests project. It uses xUnit.

cd Web.HtmlSanitizer.Tests/
dotnet test

More information
