WebReflection / basicHTML

A NodeJS based, standard oriented, HTML implementation.
ISC License
126 stars 10 forks source link

Issue with not self-closing <void-element> tag #47

Closed nhoizey closed 4 years ago

nhoizey commented 4 years ago

If I run this script in the test/ folder:

const { Document } = require("../basichtml.js");
const document = new Document();
document.documentElement.innerHTML =
  "start <img> ghost1 <img/> ghost2 <img/> end";
console.log(document.toString());

I get this result:

<!DOCTYPE html><html>start <img /><img /> ghost2 <img /> end</html>

ghost1 disappeared.

It looks like it's because the first is not self-closing.

WebReflection commented 4 years ago

yup, that's a bug, thanks for filing this πŸ‘

nhoizey commented 4 years ago

@WebReflection you're welcome!

It did take some time to find my issue was coming from here, and get a useful reduced test case… πŸ˜…

WebReflection commented 4 years ago

yeah, it's weird 'cause it might be a htmlparser2 bug, incapable of "streaming" the parsing as it goes ... so the workaround might take some time, apologies for the inconvenient, I should've tested that case in test/all.js myself, but I didn't think about that specific combo :-(

WebReflection commented 4 years ago

P.S. truth to be told, I might as well transform every void element without a self-closing tag before passing it to htmlparser2 so ... finger crossed I'll find a solution sooner than I think

nhoizey commented 4 years ago

That would be great, thanks for your time on this! πŸ‘

nhoizey commented 4 years ago

Looks like the issue is also there for other self-closing tags as <br />:

const { Document } = require("../basichtml.js");
const document = new Document();
document.documentElement.innerHTML = "start <br> ghost1 <br/> ghost2 <br/> end";
console.log(document.toString());

Gives this:

<!DOCTYPE html><html>start <br /><br /> ghost2 <br /> end</html>

However, this script (with <p>ghost1</p> instead of just ghost1):

const { Document } = require("../basichtml.js");
const document = new Document();
document.documentElement.innerHTML =
  "start <br><p>ghost1</p><br/> ghost2 <br/> end";
console.log(document.toString());

Gives this:

<!DOCTYPE html><html>start <br /><p>ghost1</p><br /> ghost2 <br /> end</html>

So it looks like it makes disappear only plain text after the not closed self-closing tag, not another tag/element.

WebReflection commented 4 years ago

Looks like the issue is also there for other self-closing tags as ...

yeah, the whole void elements set is haunted right now

WebReflection commented 4 years ago
Element.VOID_ELEMENT = /^area|base|br|col|embed|hr|img|input|keygen|link|menuitem|meta|param|source|track|wbr$/i;
WebReflection commented 4 years ago

btw, it's Easter holidays in here, long weekend, so I'll try to work on this, but don't hold your breath as it might take a little while. Use self-closing tags for the time being, but I'll get this fixed ASAP.

nhoizey commented 4 years ago

Enjoy Easter, no problem! πŸ‘

nhoizey commented 4 years ago

Here is a suggested quick fix: https://regexr.com/52a2j

WebReflection commented 4 years ago

not robust enough, but that's the way I'm planning to solve πŸ‘

nhoizey commented 4 years ago

Yes, not the right long term solution, but I'll use it in my project as long as needed.

Enjoy Easter! πŸŽ‰

WebReflection commented 4 years ago

fixed πŸ‘‹

nhoizey commented 4 years ago

@WebReflection sorry, it looks like v2.2.3 doesn't work yet. πŸ˜₯

This script:

const { Document } = require('basichtml');
const document = new Document();
document.documentElement.innerHTML = '<p>Hello<br>mister<br />tom</p>';
console.log(document.toString());

Gives me this result:

<!DOCTYPE html><html><p>Hello<br /><br />tom</p></html>

mister is not there anymore.

nhoizey commented 4 years ago

If this can help, this script:

const { Document } = require('basichtml');
const document = new Document();
document.documentElement.innerHTML = '<p>Hello<br><br />mister<br />tom</p>';
console.log(document.toString());

Gives me good this result:

<!DOCTYPE html><html><p>Hello<br /><br />mister<br />tom</p></html>

I added a self closing <br /> after the non self-closing one.

Looks like the issue is that a non self-closed void element "eats" the text node after it.

WebReflection commented 4 years ago

Why are you writing to the htnl element instead of the body? It's a non interesting use case, is it?

Try using document.body instead

nhoizey commented 4 years ago

OMG, you're right, I was not using the right element… sorry!

WebReflection commented 4 years ago

There might be a gotcha with whole HTML content though, so I'll fix that too, but for regular elements it's all good. I'll open a new issue for the documentElement case

WebReflection commented 4 years ago

OK, latest version works with documentElement too πŸ‘

nhoizey commented 4 years ago

Awesome, this fixed the issue in my project. Thanks a lot!

WebReflection commented 4 years ago

FYI I've moved the whole parsing into Element so now both documentElement and every other node works the same.

nhoizey commented 4 years ago

Ok, good to know! πŸ‘