Status: closed by wookie41.
Please add an option to preserve the casing of tags and attributes. I have the same issue when sanitizing SVG. This is one of the best tools out there, and this particular shortcoming is a showstopper for my project. Any possible workarounds are also appreciated until this is fixed. Thanks!
@zeeneir A while back I did something like this and forgot to open a pull request for it. I'll review it on Monday and open a PR.
https://github.com/wookie41/java-html-sanitizer/commit/affcccd93cec5a3c91856316179658471a2006f0
@wookie41 excellent and thanks :ok_hand:
Can you explain how case sensitivity is a problem, and reproduce it as a simple example? Where is this a problem for you? Can you give us a test case?
Jim Manico @Manicode
@jmanico SVG renderers actually care about the casing (at least the one in Firefox does): the image simply doesn't show when viewBox is sanitized to viewbox. I know that this isn't an SVG sanitizer, but it does the job I need it to do after creating an SVG policy.
That’s a pretty solid test case if you ask me.
Same comment as @wookie41. I agree that this is an HTML sanitizer, but I think this small change would allow proper sanitizing of SVG documents. Some renderers don't care about case, but others do. Mozilla states:
"SVG elements and attributes should all be entered in the case shown here since XML is case-sensitive (unlike HTML)." Source: https://developer.mozilla.org/en-US/docs/Web/SVG/Tutorial/Introduction
Any updates on this issue?
What's the list of names that need to be preserved?
The HTML-uppercased qualified name provides a notion that we could use for a canonical name that prefers lower-case ASCII.
If an element/attribute name is not defined in the HTML namespace and it is defined in the corresponding SVG or MathML namespace, then we use the preferred casing from that namespace, otherwise we convert to lower-case in a locale-insensitive manner.
I don't think the xlink namespace matters.
Would that be sufficient?
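A minimal sketch of the rule above, written in the same JS as the extraction snippets in this thread. Everything here is hypothetical: `asciiLowerCase`, `canonicalName` and the two tables are illustrative names, the tables hold a handful of sample entries instead of the real spec lists, and "defined in the HTML namespace" is reduced to set membership.

```javascript
// Lower-cases only the ASCII letters A-Z; other characters are left as-is,
// so the result never depends on the host locale.
function asciiLowerCase(s) {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 32));
}

// Hypothetical sample of names defined in the HTML namespace.
const HTML_NAMES = new Set(["a", "script", "src", "textarea", "title"]);

// Hypothetical sample of SVG/MathML names, keyed by their ASCII-lower-cased
// form and mapped to the preferred casing from that namespace.
const FOREIGN_CASING = new Map(
  ["viewBox", "clipPath", "linearGradient", "definitionURL"].map(
    (name) => [asciiLowerCase(name), name]
  )
);

// HTML names win; otherwise use the foreign casing if known; otherwise lower-case.
function canonicalName(name) {
  const lower = asciiLowerCase(name);
  if (HTML_NAMES.has(lower)) return lower;
  return FOREIGN_CASING.get(lower) ?? lower;
}

console.log(canonicalName("viewbox")); // viewBox
console.log(canonicalName("VIEWBOX")); // viewBox
console.log(canonicalName("SRC"));     // src
console.log(canonicalName("onClick")); // onclick
```

JS's own `toLowerCase` is also locale-insensitive, but it folds non-ASCII letters too, so restricting the fold to A-Z is closer to "lower-case ASCII".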
Ideally there would be a setting on the policy builder that indicates whether case should be preserved for tags/attributes; however, this proposal is good as well. If you go this route, you don't need us to provide a list, do you?
@zeeneir If the bug results from a mismatch in how converted names are interpreted by HTML parsers, then I don't see how more policy settings would help.
@mikesamuel, I see. Would your proposal cover overlapping tags/attributes? One example I can think of: HTML has
Most of the overlaps are innocuous: <a>, <script>, src=.
I was planning on coming up with whitelists of foreign element and attribute names, and just not lower-casing if an identifier is in that set.
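A sketch of the whitelist variant, assuming names arrive with their original casing; `FOREIGN_NAMES`, `asciiLowerCase` and `normalizeName` are hypothetical. An identifier found verbatim in the foreign-name set is left alone, and anything else is ASCII-lower-cased. One overlap worth checking with this approach: SVG Tiny 1.2's `textArea` and HTML's `textarea` differ only in case, so an exact-match check keeps them apart, whereas a lookup keyed on the lower-cased name cannot tell them apart.

```javascript
// Hypothetical whitelist; a real one would be the full mixed-case lists
// extracted from the SVG and MathML specs.
const FOREIGN_NAMES = new Set(["viewBox", "textArea", "foreignObject"]);

function asciiLowerCase(s) {
  // Only A-Z are folded, so the result does not depend on the host locale.
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 32));
}

// Keep whitelisted foreign identifiers verbatim; lower-case everything else.
function normalizeName(name) {
  return FOREIGN_NAMES.has(name) ? name : asciiLowerCase(name);
}

console.log(normalizeName("viewBox"));  // viewBox
console.log(normalizeName("textArea")); // textArea
console.log(normalizeName("TEXTAREA")); // textarea
console.log(normalizeName("ViewBox"));  // viewbox
```

The last line shows the trade-off: a foreign name written with any casing other than the canonical one is not recognized and ends up lower-cased.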
Unless there's a need for SVG Tiny 1.2, I'd rather go with what spec.html references:
[MATHML] Mathematical Markup Language (MathML) [SVG] Scalable Vector Graphics (SVG) 2
IIUC, for SVG, the lists are Appendix F and Appendix G
But I'm not super familiar with which drafts of SVG are supported by which browser vendors.
Sounds good. Could you please include support for SVG Tiny? SVG Tiny is a lightweight version of SVG that is suitable for mobile devices, and it is necessary for my use case.
Here’s the latest spec: https://www.w3.org/TR/SVGMobile/single-page.html
Element List: https://www.w3.org/TR/SVGMobile/single-page.html#chapter-elementTable
Attribute/Property list: https://www.w3.org/TR/SVGMobile/single-page.html#chapter-attributeTable
For the record, here is the JS I used to extract mixed-case element and attribute names from the specs, in lieu of data files.
// https://www.w3.org/TR/SVGMobile/single-page.html#chapter-elementTable
let elementTable = document.getElementById('elementTable-elements');
let elementCells = Array.from(elementTable.querySelectorAll('tr > :first-child')).map(x => x.innerText);
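// Mixed case: contains both a lower- and an upper-case ASCII letter; prefixed names such as xlink:href are skipped.
function isMixedCase(x) { return !(/:/.test(x)) && /[a-z]/.test(x) && /[A-Z]/.test(x) }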
elementCells.map(x => x.replace(/^\'|\'$/g, '')).filter(x => x != 'Elements' && isMixedCase(x))
// (8) ["animateColor", "animateMotion", "animateTransform", "foreignObject", "linearGradient", "radialGradient", "solidColor", "textArea"]
// The second column of the same table lists the attributes of each element.
let attributeCells = Array.from(elementTable.querySelectorAll('tr > :nth-child(2)')).map(x => x.innerText);
Array.from(new Set(attributeCells.flatMap(x => x.split(/, /g)).filter(x => x != 'Attributes' && isMixedCase(x))));
// (37) ["externalResourcesRequired", "focusHighlight", "requiredExtensions", "requiredFeatures", "requiredFonts", "requiredFormats", "systemLanguage", "attributeName", "attributeType", "calcMode", "keySplines", "keyTimes", "repeatCount", "repeatDur", "keyPoints", "initialVisibility", "preserveAspectRatio", "syncBehavior", "syncMaster", "syncTolerance", "gradientUnits", "defaultAction", "pathLength", "mediaCharacterEncoding", "mediaContentEncodings", "mediaSize", "mediaTime", "baseProfile", "contentScriptType", "playbackOrder", "snapshotTime", "syncBehaviorDefault", "syncToleranceDefault", "timelineBegin", "viewBox", "zoomAndPan", "transformBehavior"]
//https://svgwg.org/svg2-draft/eltindex.html
function isMixedCase(x) { return !(/:/.test(x)) && /[a-z]/.test(x) && /[A-Z]/.test(x) }
Array.from(document.querySelectorAll('.element-name')).map(x => x.innerText.replace(/^\u2018|\u2019$/g, '')).filter(isMixedCase)
// (32) ["animateMotion", "animateTransform", "clipPath", "feBlend", "feColorMatrix", "feComponentTransfer", "feComposite", "feConvolveMatrix", "feDiffuseLighting", "feDisplacementMap", "feDistantLight", "feDropShadow", "feFlood", "feFuncA", "feFuncB", "feFuncG", "feFuncR", "feGaussianBlur", "feImage", "feMerge", "feMergeNode", "feMorphology", "feOffset", "fePointLight", "feSpecularLighting", "feSpotLight", "feTile", "feTurbulence", "foreignObject", "linearGradient", "radialGradient", "textPath"]
// https://svgwg.org/svg2-draft/attindex.html
function isMixedCase(x) { return !(/:/.test(x)) && /[a-z]/.test(x) && /[A-Z]/.test(x) }
Array.from(new Set(Array.from(document.querySelectorAll('.attr-name')).map(x => x.innerText.replace(/^\u2018|\u2019$/g, '')).filter(isMixedCase)))
// (52) ["attributeName", "baseFrequency", "calcMode", "clipPathUnits", "diffuseConstant", "edgeMode", "filterUnits", "gradientTransform", "gradientUnits", "kernelMatrix", "kernelUnitLength", "keyPoints", "keySplines", "keyTimes", "lengthAdjust", "limitingConeAngle", "markerHeight", "markerUnits", "markerWidth", "maskContentUnits", "maskUnits", "numOctaves", "pathLength", "patternContentUnits", "patternTransform", "patternUnits", "pointsAtX", "pointsAtY", "pointsAtZ", "preserveAlpha", "preserveAspectRatio", "primitiveUnits", "refX", "refY", "repeatCount", "repeatDur", "requiredExtensions", "specularConstant", "specularExponent", "spreadMethod", "startOffset", "stdDeviation", "stitchTiles", "surfaceScale", "systemLanguage", "tableValues", "targetX", "targetY", "textLength", "viewBox", "xChannelSelector", "yChannelSelector"]
// https://www.w3.org/Math/draft-spec/appendixi.html
let [elementList, attributeList] = document.querySelectorAll('dl');
function isMixedCase(x) { return !(/:/.test(x)) && /[a-z]/.test(x) && /[A-Z]/.test(x) }
Array.from(elementList.querySelectorAll('dt')).map(x => x.innerText).filter(isMixedCase)
// []
Array.from(attributeList.querySelectorAll('dt')).map(x => x.innerText).filter(isMixedCase)
// (2) ["definitionURL", "schemaLocation"]
Unioning these lists gives us
function union(lists) {
let a = Array.from(new Set(lists.flatMap(x => x)));
a.sort();
return a;
}
union([
["animateColor", "animateMotion", "animateTransform", "foreignObject", "linearGradient", "radialGradient", "solidColor", "textArea"],
["animateMotion", "animateTransform", "clipPath", "feBlend", "feColorMatrix", "feComponentTransfer", "feComposite", "feConvolveMatrix", "feDiffuseLighting", "feDisplacementMap", "feDistantLight", "feDropShadow", "feFlood", "feFuncA", "feFuncB", "feFuncG", "feFuncR", "feGaussianBlur", "feImage", "feMerge", "feMergeNode", "feMorphology", "feOffset", "fePointLight", "feSpecularLighting", "feSpotLight", "feTile", "feTurbulence", "foreignObject", "linearGradient", "radialGradient", "textPath"],
[]
])
⇒
[
"animateColor",
"animateMotion",
"animateTransform",
"clipPath",
"feBlend",
"feColorMatrix",
"feComponentTransfer",
"feComposite",
"feConvolveMatrix",
"feDiffuseLighting",
"feDisplacementMap",
"feDistantLight",
"feDropShadow",
"feFlood",
"feFuncA",
"feFuncB",
"feFuncG",
"feFuncR",
"feGaussianBlur",
"feImage",
"feMerge",
"feMergeNode",
"feMorphology",
"feOffset",
"fePointLight",
"feSpecularLighting",
"feSpotLight",
"feTile",
"feTurbulence",
"foreignObject",
"linearGradient",
"radialGradient",
"solidColor",
"textArea",
"textPath"
]
union([
["externalResourcesRequired", "focusHighlight", "requiredExtensions", "requiredFeatures", "requiredFonts", "requiredFormats", "systemLanguage", "attributeName", "attributeType", "calcMode", "keySplines", "keyTimes", "repeatCount", "repeatDur", "keyPoints", "initialVisibility", "preserveAspectRatio", "syncBehavior", "syncMaster", "syncTolerance", "gradientUnits", "defaultAction", "pathLength", "mediaCharacterEncoding", "mediaContentEncodings", "mediaSize", "mediaTime", "baseProfile", "contentScriptType", "playbackOrder", "snapshotTime", "syncBehaviorDefault", "syncToleranceDefault", "timelineBegin", "viewBox", "zoomAndPan", "transformBehavior"] ,
["attributeName", "baseFrequency", "calcMode", "clipPathUnits", "diffuseConstant", "edgeMode", "filterUnits", "gradientTransform", "gradientUnits", "kernelMatrix", "kernelUnitLength", "keyPoints", "keySplines", "keyTimes", "lengthAdjust", "limitingConeAngle", "markerHeight", "markerUnits", "markerWidth", "maskContentUnits", "maskUnits", "numOctaves", "pathLength", "patternContentUnits", "patternTransform", "patternUnits", "pointsAtX", "pointsAtY", "pointsAtZ", "preserveAlpha", "preserveAspectRatio", "primitiveUnits", "refX", "refY", "repeatCount", "repeatDur", "requiredExtensions", "specularConstant", "specularExponent", "spreadMethod", "startOffset", "stdDeviation", "stitchTiles", "surfaceScale", "systemLanguage", "tableValues", "targetX", "targetY", "textLength", "viewBox", "xChannelSelector", "yChannelSelector"],
["definitionURL", "schemaLocation"]
])
⇒
[
"attributeName",
"attributeType",
"baseFrequency",
"baseProfile",
"calcMode",
"clipPathUnits",
"contentScriptType",
"defaultAction",
"definitionURL",
"diffuseConstant",
"edgeMode",
"externalResourcesRequired",
"filterUnits",
"focusHighlight",
"gradientTransform",
"gradientUnits",
"initialVisibility",
"kernelMatrix",
"kernelUnitLength",
"keyPoints",
"keySplines",
"keyTimes",
"lengthAdjust",
"limitingConeAngle",
"markerHeight",
"markerUnits",
"markerWidth",
"maskContentUnits",
"maskUnits",
"mediaCharacterEncoding",
"mediaContentEncodings",
"mediaSize",
"mediaTime",
"numOctaves",
"pathLength",
"patternContentUnits",
"patternTransform",
"patternUnits",
"playbackOrder",
"pointsAtX",
"pointsAtY",
"pointsAtZ",
"preserveAlpha",
"preserveAspectRatio",
"primitiveUnits",
"refX",
"refY",
"repeatCount",
"repeatDur",
"requiredExtensions",
"requiredFeatures",
"requiredFonts",
"requiredFormats",
"schemaLocation",
"snapshotTime",
"specularConstant",
"specularExponent",
"spreadMethod",
"startOffset",
"stdDeviation",
"stitchTiles",
"surfaceScale",
"syncBehavior",
"syncBehaviorDefault",
"syncMaster",
"syncTolerance",
"syncToleranceDefault",
"systemLanguage",
"tableValues",
"targetX",
"targetY",
"textLength",
"timelineBegin",
"transformBehavior",
"viewBox",
"xChannelSelector",
"yChannelSelector",
"zoomAndPan"
]
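Before turning the unioned lists into a lookup keyed on lower-cased names, it may be worth checking that no two spellings collapse to the same key. A minimal sketch; `buildCasingMap` is a hypothetical helper, not part of java-html-sanitizer, and only a few sample names from the lists above are passed in.

```javascript
// Builds a map from ASCII-lower-cased name to preferred casing and fails
// loudly if two distinct spellings collapse to the same key.
function buildCasingMap(names) {
  const map = new Map();
  for (const name of names) {
    const key = name.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 32));
    const existing = map.get(key);
    if (existing !== undefined && existing !== name) {
      throw new Error(`Case collision: ${existing} vs ${name}`);
    }
    map.set(key, name);
  }
  return map;
}

// A few entries from the unioned attribute list, for illustration.
const attributeCasing = buildCasingMap(["viewBox", "preserveAspectRatio", "refX", "refY"]);
console.log(attributeCasing.get("viewbox")); // viewBox
console.log(attributeCasing.get("refy"));    // refY
```

Exact duplicates (the same spelling appearing twice) are accepted; only differing spellings of the same lower-cased key are rejected.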
Included in the latest release.
It would be nice if the policy builder had an option that disabled the "sanitization" of tag and attribute names. Currently all names are converted to lowercase, which is OK when you're using it for HTML only, but if there is an SVG image nested inside the HTML it breaks. For example, when the viewBox attribute is converted to viewbox, the image is not displayed correctly.