fanglingsu / vimb

Vimb - the vim like browser is a webkit based web browser that behaves like the vimperator plugin for the firefox and usage paradigms from the great editor vim. The goal of vimb is to build a completely keyboard-driven, efficient and pleasurable browsing-experience.
https://fanglingsu.github.io/vimb/
GNU General Public License v3.0

"This page contains the following errors:..." when opening a `file:///` url ending in `.html`; page works fine over `https` with mime type `text/html` #750

Open falsifian opened 1 year ago

falsifian commented 1 year ago

Steps to reproduce

Create a file ~/f.html with the following text (which may violate some standard):

<!doctype html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" style="font-family: Inter, Averta, Helvetica, Arial; font-weight: 400; color: #454745; font-size: 16px;">
</html>

Try to open it via a file://... URL. Then upload it to a web server (serving it as MIME type text/html) and open it via an https://... URL.
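For anyone reproducing this, the steps above can be sketched in Python. This is only an illustration: the helper name `write_test_file` is hypothetical, the file is written to a temporary directory rather than `~`, and the whitespace of the generated file may differ from the original. The extension-based guess printed at the end is what a simple static server would typically send as the Content-Type, assuming it maps `.html` to `text/html`.

```python
import mimetypes
import tempfile
from pathlib import Path

# The test document from the report (whitespace may differ from the original file).
TEST_HTML = (
    '<!doctype html>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:v="urn:schemas-microsoft-com:vml" '
    'xmlns:o="urn:schemas-microsoft-com:office:office" '
    'style="font-family: Inter, Averta, Helvetica, Arial; '
    'font-weight: 400; color: #454745; font-size: 16px;">\n'
    '</html>\n'
)


def write_test_file(directory: Path) -> Path:
    """Write the test document as f.html inside `directory` and return its path."""
    path = directory / "f.html"
    path.write_text(TEST_HTML, encoding="ascii")
    return path


if __name__ == "__main__":
    path = write_test_file(Path(tempfile.mkdtemp()))
    # URL to open in vimb for the file:// case.
    print(path.resolve().as_uri())
    # Extension-based guess, independent of the file's content.
    print(mimetypes.guess_type(path.name)[0])
```

Running `python3 -m http.server` from the directory containing the file is one way to get a local `http://` URL for comparison, although the report itself used a remote `https://` server.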

Expected behaviour

vimb treats the file the same over file://... and https://....

Actual behaviour

vimb has no complaints when accessing it over https://...: a blank page is shown. (I actually had a file with some real content; the content is shown without trouble.) But via file:///... I see a red box with this message:

This page contains the following errors:

error on line 1 at column 2: StartTag: invalid element name
Below is a rendering of the page up to the first error.

fanglingsu commented 1 year ago

@falsifian This might be related to how the MIME type is determined for the local file. As far as I know, the MIME type is determined by mime-magic. I think your local file is considered XML or XHTML rather than HTML5, so XML validation is applied. The version requested from the web server uses text/html. What does your system show for "file PathToFile"?

falsifian commented 1 year ago

$ file f.html
f.html: exported SGML document text

This is on my OpenBSD-current box, on which I found the bug. For completeness, Linux gives a different answer: the file command run on a Debian 10 machine on the exact same file (sha256 checksum 5244c900ab5f11e61dd191a5c39a622b6ae1320c6227c09acbd79ca359e87fea) says "HTML document, ASCII text".
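The two different `file` results suggest that the classification depends on content sniffing rules rather than on the `.html` extension. The Python sketch below illustrates why the test file is ambiguous: it has an HTML5 doctype but also declares the XHTML namespace. It is a toy heuristic with a hypothetical name (`describe_markup`), not the logic used by `file`, mime-magic, or WebKit.

```python
import mimetypes

XHTML_NS = b"http://www.w3.org/1999/xhtml"


def describe_markup(name: str, head: bytes) -> dict:
    """Collect hints a content sniffer might weigh for a document.

    `head` is the first chunk of the file. Toy heuristic for illustration only.
    """
    lowered = head.lstrip().lower()
    return {
        # What the file extension alone suggests.
        "extension_type": mimetypes.guess_type(name)[0],
        # Hints taken from the content itself.
        "html5_doctype": lowered.startswith(b"<!doctype html>"),
        "xml_declaration": lowered.startswith(b"<?xml"),
        "xhtml_namespace": XHTML_NS in head,
    }


if __name__ == "__main__":
    # Shortened version of the test file from the report.
    head = b'<!doctype html>\n<html xmlns="http://www.w3.org/1999/xhtml">\n</html>\n'
    for key, value in describe_markup("f.html", head).items():
        print(f"{key}: {value}")
```

A sniffer that gives more weight to the namespace than to the doctype or extension could plausibly end up treating the file as XML, which would match the XML parser error shown above. That is a guess about the cause, not a confirmed diagnosis.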