WICG / file-system-access

Expose the file system on the user’s device, so Web apps can interoperate with the user’s native applications.
https://wicg.github.io/file-system-access/

Explainer 3. This should be a "sandboxed" directory #44

Closed guest271314 closed 5 years ago

guest271314 commented 5 years ago

What is all this?

  1. Various entry points to get a handle representing a limited view of the native file system. I.e. either via a file picker, or to get access to certain well known directories, or even to get access to the whole native file system. Mimicking things such as chrome's chrome.fileSystem.chooseEntry API.

This should be a single "sandboxed" directory, either stored in the browser configuration directory, e.g., the manner in which Blobs are stored, or a single directory outside of the browser configuration folder which the user selects.

get access to the whole native file system

invites issues for users who are not developers, or for developers who make mistakes.

Reason:

Too many things can go wrong, e.g., Full path to file at local filesystem is set as value of textarea element when files are dragged and dropped at element (Linux) https://bugzilla.mozilla.org/show_bug.cgi?id=1311823. And if things go wrong the user might not even be aware that something went wrong, though the data would already be accessed.

No one can un-press "Send".

mkruisselbrink commented 5 years ago

Limiting this to a sandboxed filesystem is explicitly not the goal of the API. We want this API so web applications can integrate and interact better with native applications. Limiting things to one directory wouldn't help for that.

Getting access to the whole native file system is indeed not likely something we'll be doing, but giving access to more or less arbitrarily user picked files and directories, at least for reading, is really not that different from what existing APIs already let you do today. We're still working out exact details for how we think we can do this safely, but I think we will be able to minimize the risk of users doing things accidentally and users not being aware of the data they send.

(not sure what the firefox bug is about, it seems to be a restricted bug. But it doesn't sound like it is related to this bug either).

guest271314 commented 5 years ago

@mkruisselbrink

Not sure why this issue was closed?

The Firefox bug was posted to illustrate unintended consequences/results/bugs.

All it takes is a single mistake for data to be accessed that was not intended to be accessed. In the case of user directories and files the consequences can be substantial: the user cannot recall the files accessed by web applications mistakenly.

Consider the case of Chromium configuration file being granted access, and permissions/policies are changed; . files; etc.

Again, no one can un-press "Send".

If the user is given the choice to deliberately select a single directory, there is less chance of mistakes and unintended consequences.

guest271314 commented 5 years ago

@mkruisselbrink "Sandboxed" is perhaps not the ideal term that should have been used. What mean is that unrestricted access to user filesystem is not necessary to achieve the goals of the proposal/API. The user can place all the files that they want outside access to in a single directory.

mkruisselbrink commented 5 years ago

Consider the case of Chromium configuration file being granted access, and permissions/policies are changed; . files; etc.

Chrome configuration files are likely going to be among the files/directories we won't grant access to via this API. Similarly other sensitive system directories will probably be blacklisted.
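
A minimal sketch of what such a blocklist check could look like. The directory list and the function names are hypothetical, chosen for illustration only; the thread does not say which paths a browser would actually block, and this sketch does not resolve symlinks.

```javascript
// Hypothetical blocklist; the real list a browser ships is not given in this thread.
const BLOCKED_DIRS = [
  "/etc",
  "/home/user/.config/chromium",
  "/home/user/.config/google-chrome",
];

// Collapse "." and ".." segments so "/a/../etc/passwd" cannot dodge the check.
function normalize(path) {
  const out = [];
  for (const segment of path.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") out.pop();
    else out.push(segment);
  }
  return "/" + out.join("/");
}

// True when the path is a blocked directory or lies inside one.
function isBlocked(path, blocked = BLOCKED_DIRS) {
  const target = normalize(path);
  return blocked.some((dir) => {
    const root = normalize(dir);
    return target === root || target.startsWith(root + "/");
  });
}
```

Note that this only rejects a pick at or below a blocked directory. A user picking a parent such as the whole home directory would need a separate rule, which is the case discussed further down the thread.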

If the user is given the choice to deliberately select a single directory, there is less chance of mistakes and unintended consequences.

I'm not sure what the difference is between a user deliberately selecting a single directory, or as this API proposes having a user deliberately select multiple files/directories...

@mkruisselbrink "Sandboxed" is perhaps not the ideal term that should have been used. What mean is that unrestricted access to user filesystem is not necessary to achieve the goals of the proposal/API. The user can place all the files that they want outside access to in a single directory.

Sure, we could have an API where we have one global "shared-with-the-web" directory, and we'd only let users pick files or directories within that directory. It seems to me that such a system largely depends on this API not actually being successful for it to give any meaningful protection. If any non-trivial amount of apps used by a user end up being PWAs, all the files they operate will have to live in that directory, and thus more than likely a lot of private data will also be in that directory.

So we do want to restrict things to not allow webapps to access certain sensitive system and browser directories, but I don't think restricting things to a shared-with-the-web directory is going to give any meaningful protection on top of that. Another option could of course be separate directories for each origin, but then we get a weird asymmetry where you can use the API to share stuff between one PWA and native apps, but not between multiple PWAs (without having to move files around).

Also of course different browsers are free to make different choices for the restrictions they want to place on what access this API grants. Nothing in the API should prevent a browser from implementing the kind of sandboxing you're suggesting. It just isn't likely going to be something we in chrome are going to do.

guest271314 commented 5 years ago

@mkruisselbrink

Sure, we could have an API where we have one global "shared-with-the-web" directory, and we'd only let users pick files or directories within that directory.

Yes, private data deliberately selected by the user.

That is what am suggesting.

It seems to me that such a system largely depends on this API not actually being successful for it to give any meaningful protection.

How is that conclusion reached? This API will be successful if such a policy is mandated. At least at the outset.

If any non-trivial amount of apps used by a user end up being PWAs, all the files they operate will have to live in that directory, and thus more than likely a lot of private data will also be in that directory.

Yes, deliberately selected by the user.

To be clear, FWIW, support the proposal/API. Am merely pointing out that there are potential unintended consequences of

or even to get access to the whole native file system

for example, browser/OS/system fingerprinting and changes made by web applications to browser/OS/system that might not be immediately known to the end-user.

mkruisselbrink commented 5 years ago

FWIW I removed the "or even to get access to the whole native file system" bit from the explainer. While the API itself could do that, I would indeed hope that no browser will give websites any access like that. Perhaps special extra trusted "system" PWAs could get such access, and the API might support it, but there isn't much reason to mention it as one of the things that "we're providing".

guest271314 commented 5 years ago

@mkruisselbrink Even sites that are ostensibly "trusted" by a wide range of web users/applications could have their own "agenda" or be objectively malicious, e.g., adzerk serving ads from adsafeprotected https://meta.stackoverflow.com/q/335956; i.e., are more interested in profit, targeting certain browsers/OS/systems for ads after scanning files to determine the type of browser/OS/system.

Re the Firefox bug report and unintended consequences/bugs:

User Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/52.0.2743.116 Chrome/52.0.2743.116 Safari/537.36

Steps to reproduce:

Include <input type="file"> with allowdirs attribute set, <textarea> elements at html.

Select folder from local filesystem, drag and drop folder at <textarea> element.

--

See http://stackoverflow.com/questions/40146768/how-filereader-readastext-in-html5-file-api-works/

Actual results:

The full local path of the selected folder is set as .value of <textarea>, for example

"file:///home/user/Documents/Document/MyFileFullPathDisplayedAtTextAreaValue.txt"

The full local path of the selected files is set as .value of <textarea>, for example

"file:///home/user/Documents/Document/MyFileFullPathDisplayedAtTextAreaValue1.txt" "file:///home/user/Documents/Document/MyFileFullPathDisplayedAtTextAreaValue2.txt" ..

delimited by newline character \n.
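
To make the leak concrete, here is a rough sketch of how page script could recover local paths from a value shaped like the one above. The function name is hypothetical, and it assumes the file: URLs have an empty host, as in the report's examples.

```javascript
// Extract local filesystem paths from a newline-delimited string of file: URLs,
// such as the textarea value described in the bug report.
function leakedLocalPaths(textareaValue) {
  const prefix = "file://";
  return textareaValue
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith(prefix))
    // Assumes an empty host ("file:///..."), so the remainder is the local path.
    .map((line) => line.slice(prefix.length));
}
```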

Expected results:

4.10.5.1.18. File Upload state (type=file) https://w3c.github.io/html/sec-forms.html#file-upload-state-typefile

EXAMPLE 16 https://w3c.github.io/html/sec-forms.html#example-8eea6b94

For historical reasons, the value IDL attribute prefixes the file name with the string "C:\fakepath\". Some legacy user agents actually included the full path (which was a security vulnerability). As a result of this, obtaining the file name from the value IDL attribute in a backwards-compatible way is non-trivial.


4.10.5.4. Common element APIs https://w3c.github.io/html/sec-forms.html#common-input-element-apis

filename

On getting, it must return the string "C:\fakepath\" followed by the name of the first file in the list of selected files, if any, or the empty string if the list is empty. On setting, if the new value is the empty string, it must empty the list of selected files; otherwise, it must throw an "InvalidStateError" DOMException.

NOTE: This "fakepath" requirement is a sad accident of history. See the example in the File Upload state section for more information.

NOTE: Since path components are not permitted in file names in the list of selected files, the "\fakepath\" cannot be mistaken for a path component.
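
As a sketch of the getter rule quoted above, assuming a plain array of names stands in for the input's list of selected files (the function names and the array are illustrative, not part of any browser API):

```javascript
const FAKEPATH = "C:\\fakepath\\"; // 12 characters

// Sketch of the "filename" mode getter: fakepath prefix plus the first
// selected file name, or the empty string when nothing is selected.
function fileInputValue(selectedFileNames) {
  if (selectedFileNames.length === 0) return "";
  return FAKEPATH + selectedFileNames[0];
}

// Recover the bare file name from a value that may come from a modern
// browser (fakepath prefix) or a legacy one that exposed a full path.
function extractFileName(value) {
  if (value.startsWith(FAKEPATH)) return value.slice(FAKEPATH.length);
  const lastSlash = value.lastIndexOf("/"); // Unix-style path
  if (lastSlash >= 0) return value.slice(lastSlash + 1);
  const lastBackslash = value.lastIndexOf("\\"); // Windows-style path
  if (lastBackslash >= 0) return value.slice(lastBackslash + 1);
  return value; // already a bare file name
}
```

The second function shows why the spec calls backwards-compatible extraction non-trivial: the caller has to handle three different shapes of value.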


4.10.5.1.18. File Upload state (type=file) https://w3c.github.io/html/sec-forms.html#file-upload-state-typefile

Path components https://w3c.github.io/html/sec-forms.html#path-components

When an element’s type attribute is in the File Upload state, the rules in this section apply.

The element represents a list of selected files, each file consisting of a file name, a file type, and a file body (the contents of the file).

File names must not contain path components, even in the case that a user has selected an entire directory hierarchy or multiple files with the same name from different directories. Path components, for the purposes of the File Upload state, are those parts of file names that are separated by U+005C REVERSE SOLIDUS character (\) characters.

Why exactly was requestFileSystem deprecated?

Would not requestFileSystem and the Chrome app chrome.fileSystem.chooseEntry implemented as a (to the extent possible, uniformly) Web API be sufficient?

guest271314 commented 5 years ago

@mkruisselbrink Further, the, for lack of a precise technical term, "sandboxed" directory should NOT have any hints or underlying indication of the types of the browser/OS/system that is using this API, to prevent browser/OS/system fingerprinting. The directory which users deliberately select to use for this API, should, to the extent possible, be "neutral" as to the underlying browser/OS/system.

Since the reality is that corporations generate revenue based on advertising, this API could (more than likely WILL) be used to target browsers/OSs/systems for specific advertising - if those considerations are not addressed NOW.

mkruisselbrink commented 5 years ago

Since the reality is that corporations generate revenue based on advertising, this API could (more than likely WILL) be used to target browsers/OSs/systems for specific advertising - if those considerations are not addressed NOW.

Sorry, now you really seem to be reaching. For fingerprinting/reading purposes nothing in this API is going to allow anything that isn't already possible today with <input type=file>. Nothing in this API will give any website access to any file or directory without explicit user consent (and explicit user selection of what they want to share). Furthermore (not yet mentioned in the explainer or spec), we definitely intend to not even allow third party iframes, such as the ones used by ads, to be able to use this API at all. Also use of the API will require a user gesture to be able to prompt the user to get access to anything to begin with. So yes, we will have to make sure that this can't be used for fingerprinting, and I'll make sure to address that in a future privacy considerations section of the spec, but that seems totally unrelated to any sandboxing proposal. For reading purposes this API will be pretty much identical to functionality that is already available to the web today, if anything more limited by excluding certain files/directories.
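
The conditions listed in this comment can be sketched as a simple gate. This is only an illustration of the stated intent: the context object, its field names, and both function names are hypothetical and do not come from the explainer or any spec.

```javascript
// Hypothetical request context; field names are illustrative only.
function mayShowPicker(ctx) {
  if (ctx.isThirdPartyFrame) return false; // e.g. an ad iframe
  if (!ctx.hasUserGesture) return false; // no prompt without a click or key press
  return true;
}

// Access covers only the entries the user explicitly picked,
// minus anything the browser chooses to exclude.
// `isExcluded` is a caller-supplied predicate (an assumption of this sketch).
function grantedEntries(pickedByUser, isExcluded) {
  return pickedByUser.filter((path) => !isExcluded(path));
}
```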

guest271314 commented 5 years ago

@mkruisselbrink Why is access to a directory other than a "sandboxed" directory, similar to requestFileSystem necessary? What are the use cases?

While in theory users cannot access the directory/file stored in the Chromium/Chrome using requestFileSystem, technically that file can be accessed and modified. How to Write in file (user directory) using JavaScript? https://stackoverflow.com/q/36098129

$ cd ~/.config/[chrome, chromium]/Default/File\ System/[three digits]/[lowercase letter]/[two digits]

$ cat [eight digits]

coupled with use of Chromium/Chrome Native Messaging the "sandboxed" file/directory could be modified right now, without using this API.

Is this proposal/API essentially requestFileSystem (which, FWIW, support)? Which goes back to the previous question as to why requestFileSystem was deprecated/not implemented by each browser? Too much baggage associated with requestFileSystem?

The one use case that have considered is a shell script (in the directory) which performs tasks when a file is written/read. While technically possible using Chromium/Chrome Native Messaging, that requires some awareness of how to get a "pointer" to the file/directory written by requestFileSystem and stored at the Chromium/Chrome configuration folder (memory), where files are not currently executable.

What happens when a shell script is written and executed? Which is not possible by using requestFileSystem alone?

Or, asked another way, what does this proposal/API provide that requestFileSystem as a Web API adopted by each browser would not provide? And the above question as to use cases for writing to/reading a non-"sandboxed" directory that requestFileSystem does not provide for?

guest271314 commented 5 years ago

@mkruisselbrink An additional point about the suggestion for "sandboxed" directory is that navigator.webkitTemporaryStorage.requestQuota() is useful. Consider a Raspberry Pi or other minimal system where a file is written "recursively" or ("a non-terminating procedure that happens to refer to itself.") until memory/disk space is "full" and/or "freezes"/crashes the browser.

https://github.com/thenickdude/webm-writer-js

This implementation allows you to create very large video files (exceeding the size of available memory), because it can stream chunks immediately to a file on disk using Chrome's FileWriter while the video is being constructed, instead of needing to buffer the entire video in memory before saving can begin. Video sizes in excess of 4GB can be written. The implementation currently tops out at 32GB, but this could be extended.

Reading/writing files is directly related to available RAM/disk space (see "How to solve Uncaught RangeError when download large size json", https://stackoverflow.com/q/39959467), especially after Chromium/Chrome lifted the amount of Blob data that can be stored.
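
A rough sketch of the kind of quota check being argued for: before streaming chunks to disk, work out how many fit in the remaining quota. The function name and the byte counts are illustrative assumptions.

```javascript
// Decide how many queued chunks fit in the remaining quota.
// All sizes are in bytes; writing stops at the first chunk that does not fit.
function chunksThatFit(chunkSizes, usage, quota) {
  let remaining = quota - usage;
  let count = 0;
  for (const size of chunkSizes) {
    if (size > remaining) break;
    remaining -= size;
    count += 1;
  }
  return count;
}
```

In a browser, the `usage` and `quota` inputs could come from `navigator.storage.estimate()`, which resolves to an object with those two fields. Both values are estimates, so a real writer would still need to handle a failed write.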

taralx commented 5 years ago

@mkruisselbrink Why is access to a directory other than a "sandboxed" directory, similar to requestFileSystem necessary? What are the use cases?

Are the ones listed in the explainer not sufficient? I feel like you are not comfortable with the core idea here, but you have not articulated why your solution satisfies the cases expressed in the explainer, nor why those cases should not be supported by the web platform.

As for requestFileSystem, the answer to your question is available elsewhere (try looking at the notes in chromestatus.com for a pointer) and out of scope for this repository. It doesn't do what this proposal wants to do, and never will.

guest271314 commented 5 years ago

@taralx Actually all of the use cases listed in the explainer are already possible with or without requestFileSystem and/or using the Chromium/Chrome app chrome.fileSystem.

Am not concerned with "feelings", only the technical portion of the proposal/API.

Am not against this proposal/API. Am only asking pertinent questions. Am used to individuals and organizations not liking to be asked questions, to the degree of being banned from asking such questions, which have no concern for, either.

What am suggesting is that the proposal/API be "sandboxed", which would not apply any limitations on the proposal/API and would prevent potential (intentional) misuse/exploitation.

As yet, have not read any point in the explainer which allows tasks which requestFileSystem and chrome.fileSystem do not allow, right now. Or, for that matter, that an individual could roll their own.

Exploiting the users' filesystem using this proposal/API, for various reasons, is a real concern that should not be brushed aside as an inconvenience issue in lieu of moving forward with the proposal.

Can you address how writing to the filesystem, especially given a case of Raspberry Pi or a minimal Debian system, until RAM and available disk space are exhausted can/will be prevented by this API - where the directory being written to is not "sandboxed"?

And the case of writing an executable shell script which for example, executes itself; performs cron tasks; mutates; etc., potentially without a user even being aware of what is occurring?

"Sandboxing" the directory where web applications can read/write will prevent some of those potential vectors.

guest271314 commented 5 years ago

@taralx Re chromestatus as to requestFileSystem which specific notes are you referring to?

guest271314 commented 5 years ago

@taralx Take the case of a user granting read/write access to a "home" directory at *nix. All sorts of mischief could be perpetrated.

guest271314 commented 5 years ago

@taralx What is the compelling reason to not "sandbox" the directory where read/write occurs?

mkruisselbrink commented 5 years ago

Exploiting the users' filesystem using this proposal/API, for various reasons, is a real concern that should not be brushed aside as an inconvenience issue in lieu of moving forward with the proposal.

Please be assured that we have no intention of brushing aside any such concerns as an inconvenience. We are very much aware of all the ways this can go wrong. Most of the work (unfortunately mostly internal so far) we've been doing so far has been trying to come up with a model we think can work for Chrome. If we didn't care about this we would have shipped something already.

But having said that, what kind of sandboxing a particular implementation of this API ends up doing is not really relevant for the API itself. The API will allow browsers to sandbox as much as they like. As such I might appear to be brushing aside some of your concerns, but that is mostly because they are not very relevant to this spec. Of course it is very important that implementations are secure and think about the privacy of their users, but ultimately what is done is a decision for implementer, and not really anything the spec can or will dictate.

@taralx Take the case of a user granting read/write access to a "home" directory at *nix. All sorts of mischief could be perpetrated.

That (granting access to the entire home directory) is quite likely not something we're going to allow in the chrome implementation.

kaizhu256 commented 5 years ago

i definitely do not want to give websites arbitrary access to my ~/Downloads directory. unlike android, i use it as a unified tmp-dir for everything, including sensitive stuff.

giving www.evil.com "sandboxed" access to ~/Downloads/webfs/com.evil.www might be acceptable. even better if it lru-autocleans like indexeddb.
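
The mapping in that example (www.evil.com to com.evil.www) can be sketched as follows. The base directory is taken from the comment; the function name is hypothetical, and nothing in the proposal specifies such a layout.

```javascript
// Base directory from the comment above; a real location would be a browser decision.
const WEBFS_ROOT = "~/Downloads/webfs";

// Map an origin to a per-origin directory by reversing the host labels,
// e.g. "https://www.evil.com" -> "~/Downloads/webfs/com.evil.www".
function sandboxDirFor(origin) {
  const host = new URL(origin).hostname;
  const reversed = host.split(".").reverse().join(".");
  return WEBFS_ROOT + "/" + reversed;
}
```

This sketch keys on the hostname only, so origins that differ just by scheme or port would share a directory. A stricter version would fold those into the name as well.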

DanielHerr commented 5 years ago

i definitely do not want to give websites arbitrary access to my ~/Downloads directory. unlike android, i use it as a unified tmp-dir for everything, including sensitive stuff.

Then just don't select your Downloads folder in the choose dialog.

guest271314 commented 5 years ago

@DanielHerr

Then just don't select your Downloads folder in the choose dialog.

That exact language should be placed in the specification; there should not be any confusion as to who is responsible for mistakes, errors, exploits. Similar to the final paragraph of the queueMicrotask explainer

Risks

Infinite loops of microtasks

A microtask posted with queueMicrotask may itself post another microtask so of course buggy websites can create infinite loops using this API. Since these are microtasks, the current task will never complete, the page will be unresponsive and the slow script dialog will be triggered. This is different to infinite chains of setTimeout calls which silently consume 100% CPU but are easier to let slip into production since the tasks complete and event handling continues as normal.

No new mitigations are proposed to handle this risk, the impact of the bug is immediately visible and the slow script dialog seems adequate. (emphasis added)

including the caveat that the bug (user error; mistake; exploit; that is PEBCAK ("Problem Exists Between Chair And Keyboard :p")) is very potentially not immediately visible to the user, and the specification authors were made aware of potential risks, though decided to not implement at least a "sandbox" option, though still assume no responsibility for user errors; mistakes; or known/unknown exploits.

Cannot emphasize enough that: No one can un-press "Send".