Parsing fails and raw HTML code appears in the rendered HTML

Hi there, parsing fails for some pages (e.g. this article).

To reproduce, parse the article's HTML as follows and open the generated HTML in a browser:

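    import { DOMParser } from 'linkedom';
    // htmlFromTheArticle holds the raw HTML source of the affected page
    const document = new DOMParser().parseFromString(htmlFromTheArticle, 'text/html');
    const html = document.body.innerHTML;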

Instead of the original page, the generated output now shows raw HTML code, for example:

Διαβάστε το πλήρες κείμενο του σημειώματος του CEO της UBS στο <a href="https://www.newmoney.gr/roh/bloomberg/to-esoteriko-simioma-tou-ceo-tis-ubs-pros-tous-ergazomenous-meta-tin-exagora-tis-credit-suisse/" target="_blank" rel="noopener noreferrer">newmoney.gr</a> <a href="https://www.protothema.gr/oles-oi-eidiseis/" target="_blank" rel="noopener noreferrer">Ειδήσεις σήμερα:</a> <a href="https://www.protothema.gr/greece/article/1351532/xanthi-ston-eisaggelea-simera-o-36hronos-pou-skotose-ton-45hrono-epeidi-ton-theorise-roufiano/" target="_blank" rel="noopener noreferrer">...

This has already happened with many HTML documents. The culprit appears to be htmlparser2: if I downgrade it to v6.1.0, parsing works properly.
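To check a batch of documents for this symptom, a small heuristic could help. This is only a sketch: `countEscapedTags` and `looksLikeRawMarkup` are hypothetical helpers, not part of linkedom or htmlparser2, and they assume that markup which was tokenized as text ends up escaped (`&lt;a href=...`) in the serialized `innerHTML`.

```javascript
// Hypothetical helpers, not part of linkedom or htmlparser2.
// Assumption: markup that the parser treated as text is escaped by the
// serializer, so innerHTML contains sequences like "&lt;a href=" or "&lt;/a&gt;".
function countEscapedTags(serializedHtml) {
  // Matches the start of an escaped opening or closing tag,
  // e.g. "&lt;a " or "&lt;/a&gt;", but not a lone "&lt; " as in "1 &lt; 2".
  const escapedTag = /&lt;\/?[a-zA-Z][a-zA-Z0-9-]*(?=\s|&gt;|\/)/g;
  const matches = serializedHtml.match(escapedTag);
  return matches === null ? 0 : matches.length;
}

// Pages that legitimately show code samples also contain escaped tags,
// so a threshold (an arbitrary default here) reduces false positives.
function looksLikeRawMarkup(serializedHtml, threshold = 3) {
  return countEscapedTags(serializedHtml) >= threshold;
}
```

With the reproduction above, this would be called as `looksLikeRawMarkup(document.body.innerHTML)` once with the current htmlparser2 and once with v6.1.0, to compare the two results for each affected page.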

I tried to debug it, and the problem seems to originate in Tokenizer.ts. When I simply replace these lines:

            if (this.isSpecial) {
                this.state = State.InSpecialTag;
                this.sequenceIndex = 0;
            } else {
                this.state = State.Text;
            }

with

    this.state = State.Text;

it works properly. I'm not sure what the proper fix is, one that does not affect the performance of htmlparser2, so I opened this issue instead.
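I have not established why the `InSpecialTag` branch misbehaves. The toy model below is NOT htmlparser2's tokenizer; it only illustrates the general mechanism I suspect, namely that once a tokenizer enters a raw-text state for a "special" tag and never finds the matching close tag, all following markup is emitted as text. The names `SPECIAL_TAGS` and `toyTokenize`, and the set of special tags, are assumptions made for this sketch.

```javascript
// Toy model only: not htmlparser2's Tokenizer. It shows how a raw-text
// ("special tag") state can turn all later markup into plain text.
const SPECIAL_TAGS = new Set(['script', 'style', 'title']); // assumed set

function toyTokenize(html) {
  const tokens = [];
  let pos = 0;
  while (pos < html.length) {
    const lt = html.indexOf('<', pos);
    if (lt === -1) {
      tokens.push({ type: 'text', value: html.slice(pos) });
      break;
    }
    if (lt > pos) tokens.push({ type: 'text', value: html.slice(pos, lt) });
    const gt = html.indexOf('>', lt);
    if (gt === -1) {
      // Unterminated tag: treat the remainder as text.
      tokens.push({ type: 'text', value: html.slice(lt) });
      break;
    }
    const raw = html.slice(lt + 1, gt);
    tokens.push({ type: 'tag', value: raw });
    pos = gt + 1;

    const name = raw.split(/[\s/]/)[0].toLowerCase();
    if (!raw.startsWith('/') && SPECIAL_TAGS.has(name)) {
      // Raw-text state: everything up to the matching close tag is text.
      // If no close tag exists, the rest of the document is swallowed.
      const closeTag = new RegExp('</' + name, 'gi');
      closeTag.lastIndex = pos;
      const match = closeTag.exec(html);
      const end = match === null ? html.length : match.index;
      if (end > pos) tokens.push({ type: 'text', value: html.slice(pos, end) });
      pos = end;
    }
  }
  return tokens;
}
```

In this model, `toyTokenize('<title><p>hi</p>')` yields the `title` tag followed by a single text token containing `<p>hi</p>`, which resembles the raw HTML seen in the rendered page. Whether the real tokenizer fails in the same way still needs to be confirmed.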

WebReflection / linkedom
