Eagerod / html-cruncher

HTML parser
MIT License

HTML tags inside attribute strings aren't handled correctly. #13

Open Eagerod opened 8 years ago

Eagerod commented 8 years ago

HTML inside a quoted attribute value is parsed as markup: a closing tag inside the value closes the enclosing element early, and the parser then throws.

TypeError: Cannot read property '0' of null
    at Function.HTMLElement.processAttributesFromString (/app/node_modules/html-cruncher/lib/html-element.js:218:96)
    at Function.HTMLElement.processTagFromBufferIntoElements (/app/node_modules/html-cruncher/lib/html-element.js:161:21)
    at Function.HTMLElement.processBufferIntoElements (/app/node_modules/html-cruncher/lib/html-element.js:100:21)
------------------------
    at Function.HTMLElement._fromString (/app/node_modules/html-cruncher/lib/html-element.js:74:23)
    at Function.HTMLElement.processTagFromBufferIntoElements (/app/node_modules/html-cruncher/lib/html-element.js:195:48)
    at Function.HTMLElement.processBufferIntoElements (/app/node_modules/html-cruncher/lib/html-element.js:100:21)
----------------------- (Repeated)

Eagerod commented 8 years ago

A known failure input string:

<div>
    <a href="/search?q=</div><a>/results</a>">
        CLICK ME!
    </a>
</div>

The expected output (written by hand, so it may contain errors):

[
    {
        "dataType": "tag",
        "content": "div",
        "children": [
            {
                "dataType": "tag",
                "content": "a",
                "attributes": {
                    "href": {
                        "dataType": "attribute",
                        "content": "/search?q=</div><a>/results</a>"
                    }
                },
                "children": [
                    {
                        "dataType": "text",
                        "content": "CLICK ME!"
                    }
                ]
            }
        ]
    }
]
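
A minimal sketch of one possible fix, assuming the failure comes from looking for the end of a tag without tracking whether the scanner is inside a quoted attribute value. `findTagEnd` is a hypothetical helper written for illustration; it is not part of html-cruncher and does not reflect how `html-element.js` is actually structured.

```javascript
// Hypothetical helper, not part of html-cruncher's API.
// Returns the index of the '>' that closes the tag starting at `start`,
// skipping any '>' that appears inside a quoted attribute value.
// Returns -1 if the tag is never closed.
function findTagEnd(html, start) {
    let quote = null; // the quote character we are currently inside, if any
    for (let i = start; i < html.length; i++) {
        const ch = html[i];
        if (quote !== null) {
            // Inside a quoted value: only the matching quote ends it.
            if (ch === quote) {
                quote = null;
            }
        } else if (ch === '"' || ch === "'") {
            quote = ch;
        } else if (ch === ">") {
            return i;
        }
    }
    return -1;
}

// The opening <a> tag from the failing input above.
const openingTag = '<a href="/search?q=</div><a>/results</a>">';

// A plain search stops at the '>' of the '</div>' inside the href value.
const naiveEnd = openingTag.indexOf(">");

// The quote-aware scan stops at the '>' that really ends the tag.
const quotedEnd = findTagEnd(openingTag, 0);

console.log(openingTag.slice(0, naiveEnd + 1));
console.log(openingTag.slice(0, quotedEnd + 1));
```

With the end of the tag located this way, the whole `href` value would reach the attribute parser intact instead of being cut off at `</div>`.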