golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: mime: expand on what is covered by builtinTypes #69530

Open AidanWelch opened 2 months ago

AidanWelch commented 2 months ago

Proposal Details

Right now, mime/type.go includes what seems to be a somewhat arbitrary list of built-in types:

var builtinTypesLower = map[string]string{
    ".avif": "image/avif",
    ".css":  "text/css; charset=utf-8",
    ".gif":  "image/gif",
    ".htm":  "text/html; charset=utf-8",
    ".html": "text/html; charset=utf-8",
    ".jpeg": "image/jpeg",
    ".jpg":  "image/jpeg",
    ".js":   "text/javascript; charset=utf-8",
    ".json": "application/json",
    ".mjs":  "text/javascript; charset=utf-8",
    ".pdf":  "application/pdf",
    ".png":  "image/png",
    ".svg":  "image/svg+xml",
    ".wasm": "application/wasm",
    ".webp": "image/webp",
    ".xml":  "text/xml; charset=utf-8",
}

I think some guidance on what should be included would be good, so that consumers of the package don't unknowingly run into arbitrary gaps. In the meantime, I will submit a PR that incorporates all of the "Common Types" defined by MDN (which, I admit, is also arbitrary, but at least covers more common use cases).

seankhliao commented 2 months ago

What's included is based on WHATWG MIME sniffing (https://mimesniff.spec.whatwg.org/). This gives us a clear spec to adhere to, rather than an arbitrary list.

AidanWelch commented 2 months ago

@seankhliao Wow, thanks for the quick response, but I'm confused about where that spec lists exactly the MIME types in builtinTypes. As I understand it, that spec is more relevant to net/http's DetectContentType, which actually does sniffing. For the mime package's ExtensionsByType and TypeByExtension, don't we assume the file extension or type is truthful and try to determine the most likely type from it, whereas sniffing ignores the given type or extension entirely? (Sniffing would also assign most, perhaps all, plain-text formats the same type.)

gopherbot commented 2 months ago

Change https://go.dev/cl/614376 mentions this issue: mime: extend "builtinTypes" to include a more complete list of common types

gabyhelp commented 2 months ago

Related Issues and Documentation


neild commented 2 months ago

> what's included is based on WHATWG mime sniffing https://mimesniff.spec.whatwg.org/ this gives us a clear spec to adhere to, rather than an arbitrary list.

net/http.DetectContentType is based on WHATWG's spec; this proposal is for the type/extension mapping used by mime.TypeByExtension and other functions in the mime package when the system MIME database (/etc/mime.types or similar) isn't present.

milhoan commented 1 month ago

Per the conversation at https://github.com/whatwg/mimesniff/issues/51#issuecomment-2415555310, the intent of the Mimesniff spec is:

"Based on the recent trajectory of changes to this spec, it seems to me that the scope of the spec is client-side sniffing for cross-browser compatibility and protection for the user against malicious files"

The Mimesniff spec is not appropriate for an HTTP server use case. It would be better to adopt a different spec for this.

Alternatively, a new function that is appropriate for server-side use and implements a different spec is needed. (EDIT: This comment was regarding DetectContentType, not TypeByExtension.)

AidanWelch commented 1 month ago

@milhoan But as of now, this doesn't do MIME sniffing. It just maps file extensions to MIME types.

milhoan commented 1 month ago

> @milhoan But as of now, this doesn't mimesniff. It just maps file extensions to mime types

Sorry, I saw the discussion above about DetectContentType being based on that spec (IMO it should not be). Disregard my comment, as this issue is not about that function. I'm 100% in favor of more MIME type coverage for TypeByExtension.

seankhliao commented 4 weeks ago

Looking at what the browsers do for matching file extensions to mime type:

Chromium (https://chromium.googlesource.com/chromium/src/+/master/net/base/mime_util.cc#129): maintains a primary and a secondary mapping; the preference order is primary, then platform, then secondary.

Firefox (https://searchfox.org/mozilla-central/source/uriloader/exthandler/nsExternalHelperAppService.cpp#2968; list at https://searchfox.org/mozilla-central/source/uriloader/exthandler/nsExternalHelperAppService.cpp#455; constant definitions at https://searchfox.org/mozilla-central/source/netwerk/mime/nsMimeTypes.h): maintains a default and an extra mapping; the preference order is default, then platform, then extras.
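Both browsers therefore consult a layered set of tables and stop at the first one that knows the extension. A minimal sketch of that precedence, assuming each layer is a plain map; lookupLayered and the table contents are hypothetical and only illustrate the ordering, not either browser's actual code or data.

```go
package main

import "fmt"

// lookupLayered is a hypothetical illustration of a layered lookup:
// the tables are consulted in order and the first one that knows the
// extension wins.
func lookupLayered(ext string, tables ...map[string]string) (string, bool) {
	for _, t := range tables {
		if typ, ok := t[ext]; ok {
			return typ, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical contents for each layer.
	primary := map[string]string{".html": "text/html"}
	platform := map[string]string{".html": "text/plain", ".csv": "application/x-platform-csv"}
	secondary := map[string]string{".csv": "text/csv", ".zip": "application/zip"}

	for _, ext := range []string{".html", ".csv", ".zip", ".xyz"} {
		typ, ok := lookupLayered(ext, primary, platform, secondary)
		fmt.Printf("%s -> %q (found: %v)\n", ext, typ, ok)
	}
}
```

With the tables passed as primary, platform, secondary, a primary entry can never be overridden by the platform, while the secondary table only fills gaps the platform leaves.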

Below is a table mapping file extensions to Go's MIME types, with Chromium/Firefox inclusion in the primary (1) or secondary (2) lists, and each browser's MIME type where it differs from Go's (or where Go has no mapping).

extension go mime type chrome firefox
3g2 2 (video/3gpp2)
3gp 2 (video/3gpp)
3gpp 2 (video/3gpp)
aac 2 (audio/aac)
ai 2 (application/postscript) 2 (application/postscript)
apk 2 (application/vnd.android.package-archive) 2 (application/vnd.android.package-archive)
apng 1 (image/apng) 2 (image/apng)
appcache 2 (text/cache-manifest)
arj 2 (application/x-arj)
art 2 (image/x-jg)
avif image/avif 1 2
bin 2 (application/octet-stream) 2 (application/octet-stream)
bmp 2 (image/bmp) 2 (image/bmp)
cer 2 (application/x-x509-ca-cert)
com 2 (application/octet-stream) 2 (application/octet-stream)
crt 2 (application/x-x509-ca-cert)
crx 1 (application/x-chrome-extension)
css text/css 1 2
csv 1 (text/csv) 2 (text/csv)
cur 2 (image/x-icon)
doc 2 (application/msword) 2 (application/msword)
docx 2 (application/vnd.openxmlformats-officedocument.wordprocessingml.document) 2 (application/vnd.openxmlformats-officedocument.wordprocessingml.document)
dot 2 (application/msword)
ehtml 2 (text/html) 2 (text/html)
eml 2 (message/rfc822) 2 (message/rfc822)
eps 2 (application/postscript) 2 (application/postscript)
epub 2 (application/epub+zip)
exe 2 (application/octet-stream) 2 (application/octet-stream)
flac 1 (audio/flac) 2 (audio/flac)
ftl 1 (text/plain)
gif image/gif 1 2
gz 2 (application/x-gzip) 2 (application/gzip)
htm text/html 1 2
html text/html 1 2
ical 2 (text/calendar)
icalendar 2 (text/calendar)
ico 2 (image/vnd.microsoft.icon) 2 (image/x-icon)
ics 2 (text/calendar) 2 (text/calendar)
ifb 2 (text/calendar)
jfif 2 (image/jpeg) 2 (image/jpeg)
jpeg image/jpeg 1 2
jpg image/jpeg 1 2
js text/javascript 2 (application/javascript) 2 (application/x-javascript)
jsm 2 (application/x-javascript)
json application/json 2 2
jxl 2 (image/jxl)
locale 1 (text/plain)
m3u8 2 (application/x-mpegurl)
m4a 1 (audio/x-m4a) 2 (audio/mp4)
m4b 2 (audio/mp4)
m4v 1 (video/mp4)
mht 1 (multipart/related)
mhtml 1 (multipart/related)
mid 2 (audio/x-midi)
mjs text/javascript 1 2 (application/x-javascript)
mml 2 (application/mathml+xml)
mp2 2 (audio/mpeg)
mp3 1 (audio/mp3) 2 (audio/mpeg)
mp4 1 (video/mp4) 2 (video/mp4)
mpeg 2 (video/mpeg)
mpega 2 (audio/mpeg)
mpg 2 (video/mpeg)
odg 2 (application/vnd.oasis.opendocument.graphics)
odp 2 (application/vnd.oasis.opendocument.presentation)
ods 2 (application/vnd.oasis.opendocument.spreadsheet)
odt 2 (application/vnd.oasis.opendocument.text)
oga 1 (audio/ogg) 2 (audio/ogg)
ogg 1 (audio/ogg) 2 (application/ogg)
ogm 1 (video/ogg)
ogv 1 (video/ogg) 2 (video/ogg)
opus 1 (audio/ogg) 2 (audio/ogg)
p7c 2 (application/pkcs7-mime)
p7m 2 (application/pkcs7-mime)
p7s 2 (application/pkcs7-signature)
p7z 2 (application/pkcs7-mime)
pdf application/pdf 2 2
pjp 2 (image/jpeg) 2 (image/jpeg)
pjpeg 2 (image/jpeg) 2 (image/jpeg)
png image/png 2 (image/x-png) 2
ppt 2 (application/vnd.ms-powerpoint) 2 (application/vnd.ms-powerpoint)
pptx 2 (application/vnd.openxmlformats-officedocument.presentationml.presentation) 2 (application/vnd.openxmlformats-officedocument.presentationml.presentation)
properties 1 (text/plain)
ps 2 (application/postscript) 2 (application/postscript)
rdf 2 (application/rdf+xml) 2 (application/rdf+xml)
rss 2 (application/rss+xml)
rtf 2 (application/rtf) 2 (application/rtf)
sh 2 (text/x-sh)
shtm 1 (text/html)
shtml 1 (text/html) 2 (text/html)
svg image/svg+xml 1 2
svgz 1 (image/svg+xml)
swf 2 (application/x-shockwave-flash)
swl 2 (application/x-shockwave-flash)
tar 2 (application/x-tar)
text 2 (text/plain) 2 (text/plain)
tgz 2 (application/x-gzip)
tif 2 (image/tiff) 2 (image/tiff)
tiff 2 (image/tiff) 2 (image/tiff)
txt 2 (text/plain) 2 (text/plain)
vcard 2 (text/vcard)
vcf 2 (text/vcard)
vtt 2 (text/vtt) 2 (text/vtt)
wasm application/wasm 1 2
wav 1 (audio/wav) 2 (audio/x-wav)
weba 2 (audio/webm)
webm 1 (audio/webm) 2 (audio/webm)
webp image/webp 1 2
woff 2 (application/font-woff)
xbl 2 (text/xml) 2 (text/xml)
xbm 2 (image/x-xbitmap) 2 (image/x-xbitmap)
xht 1 (application/xhtml+xml) 2 (application/xhtml+xml)
xhtm 1 (application/xhtml+xml)
xhtml 1 (application/xhtml+xml) 2 (application/xhtml+xml)
xls 2 (application/vnd.ms-excel) 2 (application/vnd.ms-excel)
xlsx 2 (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet) 2 (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
xml text/xml 1 2
xpi 2 (application/x-xpinstall)
xsl 2 (text/xml) 2 (text/xml)
xslt 2 (text/xml)
xul 2 (application/vnd.mozilla.xul+xml)
yuv 2 (video/x-raw-yuv)
zip 2 (application/zip) 2 (application/zip)

seankhliao commented 4 weeks ago

If we are to add more, I propose we limit it to what both browsers have decided to include in their built-in lists.

AidanWelch commented 3 weeks ago

That sounds good to me; I can update the PR if that is what's decided.

neild commented 1 week ago

Interestingly, the one case where we override the platform value (on Windows, we ignore a registry entry mapping .js to text/plain) is one where Chrome and Firefox apparently prefer the platform setting.

Limiting our list of builtin mappings to what both Chrome and Firefox include seems reasonably principled. I'd support that.
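The Windows exception neild mentions can be pictured as a merge in which platform entries normally take precedence over the built-in table (as his remark implies), except for one known-bad value. The sketch below is a hypothetical illustration of that idea, not the mime package's actual code; mergePlatform and the sample tables are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// mergePlatform is a hypothetical illustration: platform entries override
// the built-in table, except a ".js" entry claiming text/plain, which is
// treated as a known-bad value and skipped.
func mergePlatform(builtin, platform map[string]string) map[string]string {
	merged := make(map[string]string, len(builtin)+len(platform))
	for ext, typ := range builtin {
		merged[ext] = typ
	}
	for ext, typ := range platform {
		if ext == ".js" && strings.HasPrefix(typ, "text/plain") {
			continue // keep the built-in JavaScript type
		}
		merged[ext] = typ
	}
	return merged
}

func main() {
	builtin := map[string]string{
		".js":  "text/javascript; charset=utf-8",
		".xml": "text/xml; charset=utf-8",
	}
	// Hypothetical platform database contents.
	platform := map[string]string{
		".js":  "text/plain",
		".xml": "application/xml",
	}
	merged := mergePlatform(builtin, platform)
	fmt.Println(merged[".js"])  // text/javascript; charset=utf-8
	fmt.Println(merged[".xml"]) // application/xml
}
```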