Closed thomas-jakemeyn closed 1 year ago
At the risk of pointing out the obvious, within a directory models/ (that you can get access to via showDirectoryPicker()), you can get (recursive) access to all contained files and folders (see this article for a code sample) and thereby construct their paths. If you need absolute rather than relative paths, you could ask the user to enter via a dialog that models/, in absolute terms, lives at, say, /Users/tjakemeyn/documents/models/. Would this work?
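The suggestion above could be sketched roughly as follows. This is a minimal illustration, not the code from the linked article: it assumes dirHandle is a FileSystemDirectoryHandle returned by showDirectoryPicker(), and the names listRelativePaths and toAbsolutePath are hypothetical helpers introduced here.

```javascript
// Recursively yield the path of every file below a directory handle,
// relative to that directory. `dirHandle` is assumed to be a
// FileSystemDirectoryHandle, e.g. from `await window.showDirectoryPicker()`.
async function* listRelativePaths(dirHandle, prefix = '') {
  for await (const entry of dirHandle.values()) {
    const path = prefix ? `${prefix}/${entry.name}` : entry.name;
    if (entry.kind === 'directory') {
      // Descend into subdirectories, carrying the path built so far.
      yield* listRelativePaths(entry, path);
    } else {
      yield path;
    }
  }
}

// Hypothetical helper: combine the absolute location that the user typed
// into a dialog with a relative path found by the walk above.
function toAbsolutePath(userEnteredRoot, relativePath) {
  return `${userEnteredRoot.replace(/\/+$/, '')}/${relativePath}`;
}
```

In a browser this would be used along the lines of `for await (const p of listRelativePaths(await showDirectoryPicker())) { ... }`. Note that the app cannot verify the root the user typed; it is taken on trust.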
Hello @tomayac, thank you for your feedback!
If I understand correctly, you suggest opening an OS-native directory picker for the user to select a parent directory that's common to all the files that (s)he plans to use, and then building our own in-app file picker within that directory? Sorry to say, but this looks to me like a counter-productive workaround. And asking the user to manually type the absolute path to the file that (s)he has just selected is not user-friendly at all.
May I instead ask what your arguments are against requesting user consent to get proper path information for a file / handle?
To be honest, this pattern works really well for apps like VS Code that use the showDirectoryPicker() method. Their in-app file picker is a file hierarchy tree.
This pattern might work for you, too. I just wanted to point out the option.
The reason full paths are not exposed is fingerprinting (the path could happen to be unique enough to identify you). Paths can also contain things like a user name (tjakemeyn in my example above).
(For some past discussion on this, see https://github.com/WICG/file-system-access/issues/282.)
I guess that it boils down to the notion of workspace, which is not equally applicable to every application / user experience. VS Code being an integrated development environment, it feels natural to open an entire project. Moreover, a development project is typically self-contained and therefore has clear boundaries. I admit that this is probably a common use case, but it is not the only one.
I am not sure I understand what additional risks we would create by exposing the path. It seems to me that the risks that you highlight are also applicable to exposing the file content. What am I missing?
Many people re-use the same user name on many sites, so if you find out foo.txt has the path /Users/tomayac/foo.txt, you can determine that the user's common user name is tomayac, which you can then abuse to impersonate them by registering for services on their behalf.
If you determine foo.txt has the path /Users/thomassteiner/foo.txt, with some work and a dictionary of common names, you can figure out that the user's full name is "Thomas Steiner". The user did not mean to share this information with the website.
Both scenarios have nothing to do with the contents of foo.txt. There are admittedly worse scenarios from a privacy point of view, but still…
Thanks for the detailed info @tomayac. Closing as a dup of #282
Hello @a-sully, @tomayac,
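Sorry to insist, but issue #282 is closed and it looks to me like no viable solution has been provided.
Let me recap the takeaways of our discussion:
- Requesting the user consent to obtain the path of a file that (s)he has just selected could put her/his privacy at risk.
- Your recommendation is to rather request access to an entire subtree and to build a custom in-app file picker.
- Since that approach only works for relative paths, your recommendation to capture absolute paths is to ask the user to manually type the path to the file that (s)he has just selected.
Thank you for those suggestions but they seem:
- not always applicable: not every user experience is "compatible" with the concept of workspace.
- counter-productive: why build an in-app file picker if there is a standard API for it?
- less secure: instead of getting access to the path of one single file, malicious code gets recursive access to all the files within a subtree.
- not user-friendly: the user has to type the absolute path to a file that (s)he has just selected.
- error-prone: most users will have no clue about the path that they need to type and will for sure make mistakes. Moreover, the application has no way to validate that input.
What's your perspective on this? Thank you for the discussion!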
> - Requesting the user consent to obtain the path of a file that (s)he has just selected could put her/his privacy at risk.
Correct, as per the outlined reasons.
> - Your recommendation is to rather request access to an entire subtree and to build a custom in-app file picker.
Correct, where the assumption is that said subtree would be a folder like ~/Documents/hackolade-studio/.
> - Since that approach only works for relative paths, your recommendation to capture absolute paths is to ask the user to manually type the path to the file that (s)he has just selected.
If, and only if, your use case absolutely requires absolute paths, then yes. I do wonder, though, if the use case of linking between data models could not be achieved with relative paths (remember that you can obtain FileSystemFileHandle objects and even store them across app sessions).
> - not always applicable: not every user experience is "compatible" with the concept of workspace.
Fair.
> - counter-productive: why build an in-app file picker if there is a standard API for it?
You might or might not need this.
> - less secure: instead of getting access to the path of one single file, malicious code gets recursive access to all the files within a subtree.
That's a valid concern. Oversharing is definitely a risk. As I said before, the assumption is that your app would obtain access to its own directory, ~/Documents/hackolade-studio/ in my example above. You will notice that overly broad access is blocked by the API anyway; for example, you can't open ~/Documents/.
> - not user-friendly: the user has to type the absolute path to a file that (s)he has just selected.
If absolute paths are a fixed requirement, I agree, typing those is not a pleasant user experience.
> - error-prone: most users will have no clue about the path that they need to type and will for sure make mistakes. Moreover, the application has no way to validate that input.
Fair.
Looking at your app, to the external eye it seems like simply showing another showOpenFilePicker() dialog when the highlighted button is pressed would solve the use case. For the link to work, does it actually matter where the referenced file is physically located, as long as you have a link to the FileSystemFileHandle? If you keep track of the FileSystemFileHandle objects in IndexedDB, this would also allow the "Where used" feature to work. Again, this is an outside look at your app. I'm trying to understand the requirement better.
In the specific use case of Hackolade Studio, you need to add two dimensions to the problem:
- We do offer both a browser app and a desktop app. If a user creates a data model containing a reference using our browser app, then (s)he should be able to open it and refresh it using our desktop app as well.
- Data models can be co-edited by multiple users. Each of them should be able to open and refresh those data models.
I am under the impression that our need for cross-app, cross-user portability disqualifies the option of relying exclusively on a file handle that would be persisted only in the browser of the data model's author.
Also, we already persist in IndexedDB the file handles that we acquire in order to make it possible for our users to reopen recent data models. The problem is that, without the path, we have no other choice than indexing them by file names. As a consequence, if a user opens two data models with the same file name, the most recent one overrides the other. This is not a big problem as such but is a symptom of the lack of access to the path in my opinion.
I agree with your analysis. Given these constraints, you're out of luck unfortunately. Not sure if @a-sully has other ideas maybe?
> Also, we already persist in IndexedDB the file handles that we acquire in order to make it possible for our users to reopen recent data models. The problem is that, without the path, we have no other choice than indexing them by file names. As a consequence, if a user opens two data models with the same file name, the most recent one overrides the other. This is not a big problem as such but is a symptom of the lack of access to the path in my opinion.
We have recognized this problem, and there's a proposal for FileSystemHandle.getUniqueId() in https://github.com/whatwg/fs/pull/46.
Yeah, it seems like the proposed method would address your use case. It would provide what's effectively just a hash of the file path, so two entries of the same name could be de-duped.
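Since getUniqueId() is only a proposal at this point in the thread, the same-name collision in a "recent files" list could, as a rough sketch, be avoided today by comparing handles with isSameEntry(), an existing method on FileSystemHandle, rather than indexing by file name. The function name addRecent and the shape of the list are assumptions made for illustration.

```javascript
// Add a handle to a "recent files" list, most recent first.
// Entries are compared with isSameEntry() instead of by name, so two
// different files that share a file name no longer override each other.
// `recents` is assumed to be an array of FileSystemFileHandle objects.
async function addRecent(recents, handle) {
  const others = [];
  for (const existing of recents) {
    // Keep every entry that refers to a different file on disk.
    if (!(await existing.isSameEntry(handle))) {
      others.push(existing);
    }
  }
  return [handle, ...others];
}
```

The trade-off is a linear scan on every insert, which is probably acceptable for a short list of recent files. A stable unique ID, if it ships, would allow a direct keyed lookup instead.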
Thank you for the pointer, I'll definitely follow up on https://github.com/whatwg/fs/pull/46.
@a-sully, do you have any suggestion regarding the support for the first requirement?
> I agree with your analysis. Given these constraints, you're out of luck unfortunately. Not sure if @a-sully has other ideas maybe?
The manual workflow is still an option: the user states that what your app sees as something.hck.json is actually located at ~/Documents/data/. It's not convenient, but it would work.
And what about offering a way to get the absolute path of a file and just obfuscate the name of the home directory by replacing it with something like ~? This would mitigate privacy concerns in most cases...
This still leaves the fingerprinting vector open where you look for unique folder names somewhere in the path. Here's why this can be problematic:
- Say a file lives at ~/stuff/2023/backup-2023-03-31-11_42_00/thing.foo.
- The user opens the thing.foo file on https://awesome-editor.example.com and then on https://brilliant-editor.example.org.
- Both sites see the same unusual folder name in the path and can conclude that they are dealing with the same user.
Very contrived example, admittedly, and the sites could also (or in addition) just compare the file contents, but I hope it makes the situation clear.
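To make the argument concrete, here is a small sketch of why replacing the home directory with ~ does not remove the identifying part of the path. Both functions are hypothetical illustrations and not part of any browser API; the regular expression only covers macOS-style /Users/<name> and Linux-style /home/<name> prefixes.

```javascript
// Hypothetical obfuscation: replace the home directory prefix with "~".
function obfuscateHome(absolutePath) {
  return absolutePath.replace(/^\/(?:Users|home)\/[^/]+/, '~');
}

// What any site receiving the obfuscated path could compute: the folder
// part of the path, which is identical on every site the file is opened on.
function folderFingerprint(obfuscatedPath) {
  return obfuscatedPath.slice(0, obfuscatedPath.lastIndexOf('/'));
}
```

For the example path above, the user name is gone after obfuscation, but the folder backup-2023-03-31-11_42_00 is still there, and two sites that exchange the value of folderFingerprint() would get a match.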
Thank you for the detailed explanation, it is very clear! However, the more we discuss it, the more I have the impression that exposing the path would not create additional risks for the user's privacy.
As you highlighted, comparing file contents, user agents, IP addresses, etc. is already enough to guess that it's the same user (assuming that (s)he is not authenticated on both sites with the same email address anyway). Moreover, https://github.com/whatwg/fs/pull/46 just adds one more piece of evidence. To be honest, if the name of the home folder is somehow obfuscated (e.g. replaced by ~), I don't get why exposing the path would be more risky than exposing such a unique identifier...
There's a long history of debate on this topic :) Stepping back from the privacy implications, there's a broader question here: what does the user expect is happening when selecting a file from a file picker?
My mental model of a user's mental model is that selecting a file from the file picker grants access to the file, not to the context in which the file finds itself. Selecting a file from the picker is a strong indication that the user wants to give the site access to the file contents, but they likely don't expect the site to know whether the file lives in Documents, Downloads, or elsewhere.
I understand this presents a challenge for your specific use case. I wonder if it might be possible to add a layer of indirection? Rather than mapping directly to a file path, you could map to some opaque identifier, which then maps to e.g. a file path, a URL, or a unique ID returned by https://github.com/whatwg/fs/pull/46.
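One way to picture the layer of indirection suggested above is the sketch below. Everything in it is an assumption made for illustration: the class name ReferenceRegistry, the locator shapes, and the example IDs are not taken from Hackolade Studio or from any specification.

```javascript
// The data model file would store only an opaque reference ID. Each
// environment (desktop app, browser app, each co-editor) keeps its own
// registry that resolves the ID to whatever locator it can actually use.
class ReferenceRegistry {
  constructor() {
    this.locators = new Map(); // opaque reference ID -> locator object
  }

  // `locator` is a plain object such as
  //   { kind: 'path', path: '...' }      (desktop app)
  //   { kind: 'handle', handle: ... }    (browser app, FileSystemFileHandle)
  //   { kind: 'url', url: '...' }        (shared location)
  register(refId, locator) {
    this.locators.set(refId, locator);
  }

  // Returns the locator, or null when this environment has not yet been
  // told where the referenced model lives and has to ask the user.
  resolve(refId) {
    return this.locators.get(refId) ?? null;
  }
}
```

With this shape, a reference that cannot be resolved in a given environment prompts the user once with a file picker, after which the result is registered locally. The data model itself stays portable because it never contains a path.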
I am part of the team that works on Hackolade Studio, a desktop application for polyglot data modeling. We are in the process of porting our application to the web and we do face some challenges regarding file access. Fortunately, the File System Access API helped us tackle most of them. However, we have one main requirement that is not covered: being able to capture the path of a file that has been selected by the user.
Let me explain our use case... Hackolade Studio is a backendless application. It does not store any data and deals with data models that are persisted as local JSON files. One of its popular features is to allow data modelers to reuse parts of a data model from another data model (see user manual here for more details). Under the hood, we save the (unidirectional) link from one data model to another as a property whose value is the path of the referenced JSON file. Our approach is inspired by $ref of JSON Schema. Thanks to that property, our users are able to refresh the parts of a data model that they originally imported from other data models.
As you can see, we not only need access to the content of a selected file, but also to its path in order to create a hyperlink. That use case is probably relevant to most backendless, feature-rich editors. We are very much aware of security constraints but would be perfectly OK to request user consent prior to getting access to path information.