RangerMauve / hypercore-fetch

Implementation of Fetch that uses the Hyper SDK for loading p2p content
MIT License

Store encoding metadata #100

Open josephmturner opened 3 weeks ago

josephmturner commented 3 weeks ago

Currently, hypercore-fetch assumes that all text files are UTF-8 when returning a Content-Type header:

function getMimeType (path) {
  let mimeType = mime.getType(path) || 'text/plain; charset=utf-8'
  if (mimeType.startsWith('text/')) mimeType = `${mimeType}; charset=utf-8`
  return mimeType
}

To return a file's actual character encoding, we'd need to know how the file was encoded when it was written to the hyperdrive.

IIUC, the standard way to specify a file's encoding in a PUT request is in the Content-Type header, e.g., Content-Type: text/plain; charset=ISO-8859-1.
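As a rough sketch of what reading that parameter could look like on the receiving side: `getCharset` below is a hypothetical helper, not something hypercore-fetch currently exposes, and it only handles the simple `name=value` parameter form.

```javascript
// Hypothetical helper (not part of hypercore-fetch): pull the charset
// parameter out of a Content-Type header value, or return null if absent.
function getCharset (contentType) {
  if (!contentType) return null
  // Parameters follow the media type, separated by ';'
  const params = contentType.split(';').slice(1)
  for (const param of params) {
    const eq = param.indexOf('=')
    if (eq === -1) continue
    // Parameter names are case-insensitive
    const name = param.slice(0, eq).trim().toLowerCase()
    if (name !== 'charset') continue
    // Values may be quoted, e.g. charset="utf-8"
    const value = param.slice(eq + 1).trim().replace(/^["']|["']$/g, '')
    return value || null
  }
  return null
}

// Example: the header from a PUT request like the one described above
const charset = getCharset('text/plain; charset=ISO-8859-1')
console.log(charset)
```

A request with no `charset` parameter yields `null`, which leaves the choice of fallback to the caller.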

Once hypercore-fetch receives the file and its encoding, where should the encoding metadata be stored?

We could put it alongside the mtime metadata in entry.value.metadata.encoding. WDYT?
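To make the idea concrete, here is a minimal sketch of how a stored encoding might feed back into the Content-Type header. It assumes the entry shape `entry.value.metadata.encoding` proposed above and keeps UTF-8 as the fallback; `getContentType` is a hypothetical name, and the entry is a plain object standing in for a real hyperdrive entry.

```javascript
// Hypothetical sketch: choose the charset from stored metadata when present,
// falling back to UTF-8 (the current hypercore-fetch assumption).
function getContentType (mimeType, entry) {
  // Only text/* types get a charset parameter
  if (!mimeType.startsWith('text/')) return mimeType
  const encoding = entry?.value?.metadata?.encoding || 'utf-8'
  return `${mimeType}; charset=${encoding}`
}

// Stand-in for a drive entry; the `encoding` field is the proposal, not an existing field
const entry = {
  value: { metadata: { mtime: 1700000000000, encoding: 'ISO-8859-1' } }
}

console.log(getContentType('text/plain', entry))
```

Entries written before the field existed would have no `encoding` and would keep today's UTF-8 behavior.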

josephmturner commented 3 weeks ago

It looks like the Python requests library falls back to Latin-1 (ISO-8859-1), not UTF-8, for text/* media types when no charset is given:

    if "charset" in params:
        return params["charset"].strip("'\"")

    if "text" in content_type:
        return "ISO-8859-1"

    if "application/json" in content_type:
        # Assume UTF-8 based on RFC 4627: https://www.ietf.org/rfc/rfc4627.txt since the charset was unset
        return "utf-8"