josephmturner opened this issue 3 weeks ago
It looks like the Python `requests` library uses latin-1 (not UTF-8) as the fallback encoding for `text/*` media types:
```python
if "charset" in params:
    return params["charset"].strip("'\"")

if "text" in content_type:
    return "ISO-8859-1"

if "application/json" in content_type:
    # Assume UTF-8 based on RFC 4627: https://www.ietf.org/rfc/rfc4627.txt since the charset was unset
    return "utf-8"
```
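For experimenting with that fallback, here is a small stdlib-only sketch that roughly mirrors the rules in the excerpt. `guess_encoding` is a hypothetical name, and it parses the header with `email.message.Message` rather than requests' own helper, so edge cases may differ (for example, it matches `text/` as a prefix, while requests checks whether `"text"` appears anywhere in the media type).

```python
from email.message import Message
from typing import Optional


def guess_encoding(content_type: str) -> Optional[str]:
    """Hypothetical helper approximating requests' charset fallback rules."""
    msg = Message()
    msg["Content-Type"] = content_type

    # An explicit charset parameter always wins (surrounding quotes stripped).
    charset = msg.get_param("charset")
    if charset:
        return str(charset).strip("'\"")

    media_type = msg.get_content_type()  # lowercased "maintype/subtype"
    if media_type.startswith("text/"):
        # text/* with no charset: fall back to latin-1.
        return "ISO-8859-1"
    if media_type == "application/json":
        # JSON with no charset is assumed to be UTF-8.
        return "utf-8"
    return None


print(guess_encoding("text/plain"))                 # ISO-8859-1
print(guess_encoding("text/plain; charset=UTF-8"))  # UTF-8
print(guess_encoding("application/json"))           # utf-8
print(guess_encoding("image/png"))                  # None
```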
Currently, `hypercore-fetch` assumes that all text files are UTF-8 when it sets the `Content-Type` header.

In order to return the actual coding system for a file, we'd need to know how the file was encoded when it was written to the hyperdrive.
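The reason the encoding has to be recorded at write time is that a stored blob is just bytes: the same bytes decode to different text depending on which encoding the reader assumes. A quick Python illustration:

```python
text = "café"

utf8_bytes = text.encode("utf-8")         # b'caf\xc3\xa9'
latin1_bytes = text.encode("iso-8859-1")  # b'caf\xe9'

# Reading UTF-8 bytes as latin-1 "works" but silently yields mojibake...
print(utf8_bytes.decode("iso-8859-1"))    # cafÃ©

# ...while reading latin-1 bytes as UTF-8 fails outright.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```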
IIUC, the standard way to specify a file's encoding in a `PUT` request is in the `Content-Type` header, e.g., `Content-Type: text/plain; charset=ISO-8859-1`.

Once `hypercore-fetch` receives the file and its coding system, where should the encoding metadata be stored? We could put it alongside the `mtime` metadata in `entry.value.metadata.encoding`. WDYT?
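A rough sketch of the proposal, for discussion only. It models a hyperdrive as an in-memory dict, and `put_file` / `content_type_header` are hypothetical helpers, not part of `hypercore-fetch` or hyperdrive. On `PUT`, the charset (if any) is read from the `Content-Type` header and stored next to `mtime` under `entry["value"]["metadata"]["encoding"]`; on `GET`, it is echoed back, falling back to UTF-8 (the current assumption) when nothing was recorded. The media type for the response (e.g. one guessed from the file extension) is assumed to be passed in separately.

```python
import time
from email.message import Message
from typing import Any, Dict, Optional

Entry = Dict[str, Any]


def put_file(drive: Dict[str, Entry], path: str, body: bytes, content_type: str) -> Entry:
    """Hypothetical PUT handler: store the body plus its declared charset."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Lowercased charset parameter, or None if the client didn't send one.
    encoding: Optional[str] = msg.get_content_charset()
    entry = {
        "key": path,
        "value": {
            "blob": body,
            # Proposed location: alongside mtime in the entry's metadata.
            "metadata": {"mtime": time.time(), "encoding": encoding},
        },
    }
    drive[path] = entry
    return entry


def content_type_header(entry: Entry, media_type: str) -> str:
    """Hypothetical GET helper: build Content-Type from the stored encoding."""
    # Fall back to UTF-8 (today's assumption) when no encoding was recorded.
    encoding = entry["value"]["metadata"].get("encoding") or "utf-8"
    return f"{media_type}; charset={encoding}"


drive: Dict[str, Entry] = {}
put_file(drive, "/notes.txt", "café".encode("iso-8859-1"),
         "text/plain; charset=ISO-8859-1")
put_file(drive, "/readme.txt", b"hello", "text/plain")

print(content_type_header(drive["/notes.txt"], "text/plain"))
# text/plain; charset=iso-8859-1
print(content_type_header(drive["/readme.txt"], "text/plain"))
# text/plain; charset=utf-8
```

Storing `None` when no charset was sent, rather than writing a default at `PUT` time, keeps "unknown" distinguishable from an explicit UTF-8 declaration if the fallback ever changes.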