cheeriojs / cheerio

The fast, flexible, and elegant library for parsing and manipulating HTML and XML.
https://cheerio.js.org
MIT License

Breaking change in 1.0.0: htmlparser2 mode self-closes empty tags #4034

Open nwalters512 opened 3 months ago

nwalters512 commented 3 months ago

Reproduction: https://github.com/nwalters512/cheerio-self-closing-repro

Code for reference:

import * as oldCheerio from 'cheerio-rc/lib/slim';
import * as newCheerio from 'cheerio-1/slim';

const HTML = '<html><head></head><body><div></div></body></html>';

console.log(oldCheerio.load(HTML).html());
console.log(newCheerio.load(HTML).html());

console.log(oldCheerio.load(HTML, { recognizeSelfClosing: true }).html());
console.log(newCheerio.load(HTML, { xml: { recognizeSelfClosing: true } }).html());

Steps to reproduce:

Run the code above and observe that the following output is printed:

<html><head></head><body><div></div></body></html>
<html><head></head><body><div></div></body></html>
<html><head></head><body><div></div></body></html>
<html><head/><body><div/></body></html>

Specifically, note that <head> and <div> were serialized as self-closing tags.
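To make the difference concrete, here is a minimal sketch of a serializer that reproduces the two output shapes above. It is a toy model and not cheerio's or its serializer's actual code: the `serialize` function, the `{ name, children }` node shape, and the rule that an empty element becomes `<name/>` in XML mode are all assumptions made for illustration.

```javascript
// Toy DOM node shape (hypothetical): { name: string, children: node[] }.
function serialize(node, { xmlMode = false } = {}) {
  // Assumption: in XML mode, an element with no children is written as <name/>.
  if (xmlMode && node.children.length === 0) {
    return `<${node.name}/>`;
  }
  // Otherwise, always emit an explicit closing tag, even for empty elements.
  const inner = node.children
    .map((child) => serialize(child, { xmlMode }))
    .join('');
  return `<${node.name}>${inner}</${node.name}>`;
}

// Same structure as the HTML constant used in the reproduction.
const tree = {
  name: 'html',
  children: [
    { name: 'head', children: [] },
    { name: 'body', children: [{ name: 'div', children: [] }] },
  ],
};

console.log(serialize(tree));
// <html><head></head><body><div></div></body></html>
console.log(serialize(tree, { xmlMode: true }));
// <html><head/><body><div/></body></html>
```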

I'm not sure if this should be considered a bug or not, but it appears to be a breaking change and it isn't called out anywhere in the release notes or upgrade guide: https://cheerio.js.org/blog/cheerio-1.0

nwalters512 commented 3 months ago

It seems that things work as expected if I change the last line to the following (adding xmlMode: false):

console.log(newCheerio.load(HTML, { xml: { recognizeSelfClosing: true, xmlMode: false } }).html());

This doesn't make sense given the configuration documentation (https://cheerio.js.org/docs/advanced/configuring-cheerio#using-htmlparser2-for-html) which states:

You can also use Cheerio's slim export, which always uses htmlparser2. This avoids loading parse5, which saves some bytes eg. in browser environments:

That is, I would expect to not have to set xmlMode: false when using the "slim" export. Do I in fact have to set xmlMode: false even in that case?
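For what it's worth, the behavior I'm seeing is consistent with the model sketched here. This is only my guess at the semantics and not a confirmed description of cheerio's internals: `resolveXmlMode` is a hypothetical helper, and the assumption is that passing an `xml` object implies `xmlMode: true` unless the object overrides it.

```javascript
// Hypothetical model of how the `xml` option might be resolved.
// NOT cheerio's source; it only mirrors the outputs reported in this issue.
function resolveXmlMode(options = {}) {
  // No `xml` key at all: plain HTML handling.
  if (!options.xml) return false;
  // `xml: true`: XML mode.
  if (options.xml === true) return true;
  // Assumption: an `xml` object defaults to XML mode, and any explicit
  // `xmlMode` inside it takes precedence over that default.
  return { xmlMode: true, ...options.xml }.xmlMode;
}

console.log(resolveXmlMode({ recognizeSelfClosing: true }));
// false
console.log(resolveXmlMode({ xml: { recognizeSelfClosing: true } }));
// true
console.log(resolveXmlMode({ xml: { recognizeSelfClosing: true, xmlMode: false } }));
// false
```

If that model is right, the slim export only determines which parser is used, while the `xml` key separately switches XML mode on, which would explain why I have to opt out explicitly.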