Permission to get file paths in order to create hyperlinks

thomas-jakemeyn commented 1 year ago

I am part of the team that works on Hackolade Studio, a desktop application for polyglot data modeling. We are in the process of porting our application to the web and we do face some challenges regarding file access. Fortunately, the File System Access API helped us tackle most of them. However, we have one main requirement that is not covered: being able to capture the path of a file that has been selected by the user.

Let me explain our use case... Hackolade Studio is a backendless application. It does not store any data and deals with data models that are persisted as local JSON files. One of its popular features allows data modelers to reuse parts of one data model in another (see the user manual for more details). Under the hood, we save the (unidirectional) link from one data model to another as a property whose value is the path of the referenced JSON file. Our approach is inspired by the $ref keyword of JSON Schema. Thanks to that property, our users are able to refresh the parts of a data model that they originally imported from other data models.
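The $ref-style link described above can be sketched as a reference that is resolved against the location of the referencing model, the same way JSON Schema resolves a relative `$ref` against its base URI. This is a minimal sketch assuming POSIX-style absolute paths; the `ModelReference` shape and the example paths are hypothetical, not Hackolade's actual file format.

```typescript
// Hypothetical shape of a reference stored inside a data model file.
interface ModelReference {
  $ref: string; // path of the referenced model, absolute or relative
}

// Resolve a reference against the absolute path of the model that contains it,
// using standard URL resolution, as JSON Schema does for a relative $ref.
function resolveReference(referencingModelPath: string, ref: ModelReference): string {
  const base = new URL(referencingModelPath, "file:///");
  const target = new URL(ref.$ref, base);
  // pathname is percent-encoded (e.g. spaces become %20), so decode it.
  return decodeURIComponent(target.pathname);
}

const customerRef: ModelReference = { $ref: "../shared/customer.hck.json" };
console.log(resolveReference("/Users/alice/models/orders.hck.json", customerRef));
// → /Users/alice/shared/customer.hck.json
```

File names containing characters such as `#`, `?` or `%` would need escaping before URL resolution; the sketch ignores that.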

As you can see, we not only need access to the content of a selected file, but also to its path in order to create a hyperlink. That use case is probably relevant to most backendless, feature-rich editors. We are very much aware of security constraints, but would be perfectly OK with requesting user consent before getting access to path information.

tomayac commented 1 year ago

At the risk of pointing out the obvious: within a directory models/ (which you can get access to via showDirectoryPicker()), you can get (recursive) access to all contained files and folders (see this article for a code sample) and thereby construct their paths. If you need absolute rather than relative paths, you could ask the user to enter, via a dialog, where models/ lives in absolute terms, say, /Users/tjakemeyn/documents/models/. Would this work?
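A minimal sketch of the recursive walk mentioned above, assuming only the `kind`, `name` and `values()` members that `FileSystemDirectoryHandle` and `FileSystemFileHandle` expose. The structural types `FileLike` and `DirectoryLike` are hypothetical stand-ins so the code does not depend on browser typings; the API does not promise any particular iteration order.

```typescript
// Hypothetical structural stand-ins for FileSystemFileHandle / FileSystemDirectoryHandle.
interface FileLike {
  kind: "file";
  name: string;
}

interface DirectoryLike {
  kind: "directory";
  name: string;
  values(): AsyncIterable<FileLike | DirectoryLike>;
}

// Recursively collect the "/"-separated paths of all files below `dir`,
// relative to `dir` itself (the picked directory's own name is not included).
async function listRelativePaths(dir: DirectoryLike, prefix = ""): Promise<string[]> {
  const paths: string[] = [];
  for await (const entry of dir.values()) {
    const path = prefix + entry.name;
    if (entry.kind === "file") {
      paths.push(path);
    } else {
      // Descend into the subdirectory, extending the prefix.
      paths.push(...(await listRelativePaths(entry, path + "/")));
    }
  }
  return paths;
}

// In a Chromium-based browser, usage might look like:
//   const root = await window.showDirectoryPicker();
//   const paths = await listRelativePaths(root);
```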

thomas-jakemeyn commented 1 year ago

Hello @tomayac, thank you for your feedback!

If I understand correctly, you suggest opening an OS-native directory picker for the user to select a parent directory common to all the files that (s)he plans to use, and then building our own in-app file picker within that directory? Sorry to say, but this looks to me like a counter-productive workaround. And asking the user to manually type the absolute path to the file that (s)he has just selected is not user-friendly at all.

May I instead ask what your arguments are against requesting user consent to get proper path information for a file / handle?

tomayac commented 1 year ago

To be honest, this pattern works really well for apps like VS Code that use the showDirectoryPicker() method.

Their in-app file picker is a file hierarchy tree.

This pattern might work for you, too. I just wanted to point out the option.

The reason full paths are not exposed is fingerprinting (the path could happen to be unique enough to identify you); paths can also contain things like a user name (tjakemeyn in my example above).

tomayac commented 1 year ago

(For some past discussion on this, see https://github.com/WICG/file-system-access/issues/282.)

thomas-jakemeyn commented 1 year ago

I guess that it boils down to the notion of a workspace, which is not equally applicable to every application / user experience. VS Code being an integrated development environment, it feels natural to open an entire project there. Moreover, a development project is typically self-contained and therefore has clear boundaries. I admit that this is probably a common use case, but it is not the only one.

I am not sure I understand what additional risks we would create by exposing the path. It seems to me that the risks that you highlight also apply to exposing the file content. What am I missing?

tomayac commented 1 year ago

> I am not sure I understand what additional risks we would create by exposing the path. It seems to me that the risks that you highlight also apply to exposing the file content. What am I missing?

Many people reuse the same user name on many sites, so if you find out foo.txt has the path /Users/tomayac/foo.txt, you can determine that the user's common user name is tomayac, which you can then abuse to impersonate them by registering for services on their behalf.

If you determine foo.txt has the path /Users/thomassteiner/foo.txt, with some work and a dictionary of common names, you can figure out that the user's full name is "Thomas Steiner". The user did not mean to share this information with the website.

Both scenarios have nothing to do with the contents of foo.txt. Admittedly, there are worse scenarios from a privacy point of view, but still…

a-sully commented 1 year ago

Thanks for the detailed info @tomayac. Closing as a dup of #282

thomas-jakemeyn commented 1 year ago

Hello @a-sully, @tomayac,

Sorry to insist, but issue #282 is closed and it looks to me like no viable solution was provided.

Let me recap the takeaways of our discussion:

Requesting the user's consent to obtain the path of a file that (s)he has just selected could put her/his privacy at risk.
Your recommendation is instead to request access to an entire subtree and to build a custom in-app file picker.
Since that approach only works for relative paths, your recommendation to capture absolute paths is to ask the user to manually type the path to the file that (s)he has just selected.

Thank you for those suggestions but they seem:

not always applicable: not every user experience is "compatible" with the concept of workspace.
counter-productive: why build an in-app file picker if there is a standard API for it?
less secure: instead of getting access to the path of one single file, malicious code gets recursive access to all the files within a subtree.
not user-friendly: the user has to type the absolute path to a file that (s)he has just selected.
error-prone: most users will have no clue about the path that they need to type and will surely make mistakes. Moreover, the application has no way to validate that input.

What's your perspective on this? Thank you for the discussion!

tomayac commented 1 year ago

> Requesting the user's consent to obtain the path of a file that (s)he has just selected could put her/his privacy at risk.

Correct, as per the outlined reasons.

> Your recommendation is instead to request access to an entire subtree and to build a custom in-app file picker.

Correct, where the assumption is that said subtree would be a folder like ~/Documents/hackolade-studio/.

> Since that approach only works for relative paths, your recommendation to capture absolute paths is to ask the user to manually type the path to the file that (s)he has just selected.

If, and only if, your use case absolutely requires absolute paths, then yes. I do wonder, though, whether the use case of linking between data models could be achieved with relative paths (remember that you can obtain FileSystemFileHandle objects and even store them across app sessions).
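For the relative-path route, `FileSystemDirectoryHandle.resolve()` returns the path segments from a directory down to a descendant handle, or `null` if the handle is not inside that directory. A minimal sketch, assuming the user has granted access to a root folder and then picks a file; `HandleLike`, `ResolvableDirectory` and `relativePathWithin` are hypothetical names.

```typescript
// Hypothetical structural stand-ins for FileSystemHandle / FileSystemDirectoryHandle.
interface HandleLike {
  kind: "file" | "directory";
  name: string;
}

interface ResolvableDirectory extends HandleLike {
  kind: "directory";
  // Mirrors FileSystemDirectoryHandle.resolve(): the path segments from this
  // directory down to `possibleDescendant`, or null if it is not a descendant.
  resolve(possibleDescendant: HandleLike): Promise<string[] | null>;
}

// Turn a picked file into a "/"-separated path relative to the granted root,
// suitable for storing in a model; null means the file lies outside the root.
async function relativePathWithin(root: ResolvableDirectory, file: HandleLike): Promise<string | null> {
  const segments = await root.resolve(file);
  return segments === null ? null : segments.join("/");
}

// Possible browser usage (Chromium-based):
//   const root = await window.showDirectoryPicker();
//   const [file] = await window.showOpenFilePicker({ startIn: root });
//   const relative = await relativePathWithin(root, file);
```

A `null` result is the case where this approach breaks down: the user picked a file outside the granted folder.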

> not always applicable: not every user experience is "compatible" with the concept of workspace.

Fair.

> counter-productive: why build an in-app file picker if there is a standard API for it?

You might or might not need this.

> less secure: instead of getting access to the path of one single file, malicious code gets recursive access to all the files within a subtree.

That's a valid concern. Oversharing is definitely a risk. As I said before, the assumption is that your app would obtain access to its own directory, ~/Documents/hackolade-studio/ in my example above. You will notice that overly broad access is blocked by the API anyway; for example, you can't open ~/Documents/.

> not user-friendly: the user has to type the absolute path to a file that (s)he has just selected.

If absolute paths are a fixed requirement, I agree, typing those is not a pleasant user experience.

> error-prone: most users will have no clue about the path that they need to type and will surely make mistakes. Moreover, the application has no way to validate that input.

Fair.

tomayac commented 1 year ago

Looking at your app, to the external eye it seems like simply showing another showOpenFilePicker() dialog when the highlighted button is pressed would solve the use case. For the link to work, does it actually matter where the referenced file is physically located, as long as you have a link to the FileSystemFileHandle? If you keep track of the FileSystemFileHandle objects in IndexedDB, this would also allow the "Where used" feature to work. Again, this is an outside look at your app; I'm trying to understand the requirement better.

thomas-jakemeyn commented 1 year ago

In the specific use case of Hackolade Studio, you need to add two dimensions to the problem:

We do offer both a browser app and a desktop app. If a user creates a data model containing a reference using our browser app, then (s)he should be able to open it and refresh it using our desktop app as well.
Data models can be co-edited by multiple users. Each of them should be able to open and refresh those data models.

I am under the impression that our need for cross-app, cross-user portability disqualifies the option of relying exclusively on a file handle that would be persisted only in the browser of the data model's author.

Also, we already persist in IndexedDB the file handles that we acquire in order to make it possible for our users to reopen recent data models. The problem is that, without the path, we have no other choice than indexing them by file names. As a consequence, if a user opens two data models with the same file name, the most recent one overrides the other. This is not a big problem as such but is a symptom of the lack of access to the path in my opinion.
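One possible workaround for the duplicate-name problem, offered here as an assumption rather than anything agreed in this issue, is to keep the recent-files list as an array of handles and compare entries with `FileSystemHandle.isSameEntry()` instead of keying them by file name. Handles can be stored in IndexedDB, so such an array can be persisted as a whole; `ComparableHandle` and `addRecent` are hypothetical names, and the cap of 10 entries is arbitrary.

```typescript
// Hypothetical structural stand-in for FileSystemHandle: only the members used here.
interface ComparableHandle {
  name: string;
  isSameEntry(other: ComparableHandle): Promise<boolean>;
}

// Put `handle` at the front of a most-recently-used list. Entries are compared
// with isSameEntry() rather than by name, so two different files that share a
// name are both kept, while reopening the same file just moves it to the front.
async function addRecent<T extends ComparableHandle>(recent: T[], handle: T, max = 10): Promise<T[]> {
  const others: T[] = [];
  for (const existing of recent) {
    if (!(await existing.isSameEntry(handle))) {
      others.push(existing);
    }
  }
  return [handle, ...others].slice(0, max);
}
```

This sidesteps name collisions in the recent-files list, but it does not give the app a path it could write into a portable model file.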

tomayac commented 1 year ago

> In the specific use case of Hackolade Studio, you need to add two dimensions to the problem:
>
> We do offer both a browser app and a desktop app. If a user creates a data model containing a reference using our browser app, then (s)he should be able to open it and refresh it using our desktop app as well.
>
> Data models can be co-edited by multiple users. Each of them should be able to open and refresh those data models.
>
> I am under the impression that our need for cross-app, cross-user portability disqualifies the option of relying exclusively on a file handle that would be persisted only in the browser of the data model's author.

I agree with your analysis. Given these constraints, you're out of luck unfortunately. Not sure if @a-sully has other ideas maybe?

> Also, we already persist in IndexedDB the file handles that we acquire in order to make it possible for our users to reopen recent data models. The problem is that, without the path, we have no other choice than indexing them by file names. As a consequence, if a user opens two data models with the same file name, the most recent one overrides the other. This is not a big problem as such but is a symptom of the lack of access to the path in my opinion.

We have recognized this problem, and there's a proposal for FileSystemHandle.getUniqueId() in https://github.com/whatwg/fs/pull/46.

a-sully commented 1 year ago

> Also, we already persist in IndexedDB the file handles that we acquire in order to make it possible for our users to reopen recent data models. The problem is that, without the path, we have no other choice than indexing them by file names. As a consequence, if a user opens two data models with the same file name, the most recent one overrides the other. This is not a big problem as such but is a symptom of the lack of access to the path in my opinion.

> We have recognized this problem, and there's a proposal for FileSystemHandle.getUniqueId() in whatwg/fs#46.

Yeah, it seems like the proposed method would address your use case. It would provide what's effectively just a hash of the file path, so two entries with the same name could be de-duped.

thomas-jakemeyn commented 1 year ago

Thank you for the pointer, I'll definitely follow up on https://github.com/whatwg/fs/pull/46.

@a-sully, do you have any suggestions regarding support for the first requirement?

> I agree with your analysis. Given these constraints, you're out of luck unfortunately. Not sure if @a-sully has other ideas maybe?

tomayac commented 1 year ago

The manual workflow is still an option: the user tells your app that the file it sees as something.hck.json is actually located at ~/Documents/data/. It's not convenient, but it would work.

thomas-jakemeyn commented 1 year ago

And what about offering a way to get the absolute path of a file and just obfuscate the name of the home directory by replacing it with something like ~? This would mitigate privacy concerns in most cases...
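The redaction proposed above could look roughly like the following sketch, which assumes POSIX-style absolute paths and a known home directory; in practice the browser, not the page, would have to apply it before exposing anything. `redactHome` is a hypothetical name. Every path segment after the home directory survives unchanged.

```typescript
// Replace the home-directory prefix of an absolute POSIX-style path with "~".
// Only a whole leading path component is replaced, so "/Users/tomayac2/..."
// is left alone when the home directory is "/Users/tomayac".
function redactHome(path: string, home: string): string {
  const h = home.replace(/\/+$/, ""); // drop trailing slashes
  if (path === h) return "~";
  if (path.startsWith(h + "/")) return "~" + path.slice(h.length);
  return path;
}

console.log(redactHome("/Users/tomayac/stuff/2023/backup-2023-03-31-11_42_00/thing.foo", "/Users/tomayac"));
// → ~/stuff/2023/backup-2023-03-31-11_42_00/thing.foo
```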

tomayac commented 1 year ago

This still leaves the fingerprinting vector open where you look for unique folder names somewhere in the path. Here's why this can be problematic:

Assume a folder structure as follows: ~/stuff/2023/backup-2023-03-31-11_42_00/thing.foo.
Now assume you open the thing.foo file on https://awesome-editor.example.com and then on https://brilliant-editor.example.org.
Assume further that both sites belong to the same entity (as with Saturn and MediaMarkt in the physical world, which many laypeople don't know).
The collaborating sites can look for common patterns in the paths of the files they see people open on both sites, and if they notice matches, they can use this plus other information like the IP address or other fingerprinting vectors to determine it's the same user.

Admittedly a very contrived example, and the sites could also (or additionally) just compare the file contents, but I hope it makes the situation clear.

thomas-jakemeyn commented 1 year ago

Thank you for the detailed explanation, it is very clear! However, the more we discuss it, the more I get the impression that exposing the path would not create additional risks for the user's privacy.

As you highlighted, comparing file contents, user agents, IP addresses, etc. is already enough to guess that it's the same user (assuming that (s)he is not authenticated on both sites with the same email address anyway). Moreover, https://github.com/whatwg/fs/pull/46 just adds one more piece of evidence. To be honest, if the name of the home folder is somehow obfuscated (e.g. replaced by ~), I don't see why exposing the path would be riskier than exposing such a unique identifier...

a-sully commented 1 year ago

There's a long history of debate on this topic :) Stepping back from the privacy implications, there's a broader question here: what does the user expect is happening when selecting a file from a file picker?

My mental model of a user's mental model is that selecting a file from the file picker grants access to the file, not to the context in which the file finds itself. Selecting a file from the picker is a strong indication that the user wants to give the site access to the file contents, but they likely don't expect the site to know whether the file lives in Documents, Downloads, or elsewhere.

I understand this presents a challenge for your specific use case. I wonder if it might be possible to add a layer of indirection? Rather than mapping directly to a file path, you could map to some opaque identifier, which then maps to e.g. a file path, a URL, or a unique ID returned by https://github.com/whatwg/fs/pull/46.
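The layer of indirection suggested above could be sketched as follows: a model stores only an opaque reference ID, and a registry maps that ID to one or more concrete locations. All names here are hypothetical, `uniqueId` stands for an ID from the proposed `getUniqueId()` (whatwg/fs#46), and where the registry itself is stored and shared between users is left open.

```typescript
// Hypothetical kinds of concrete location a reference may resolve to.
type ModelLocation =
  | { kind: "path"; path: string } // e.g. desktop app: a file path
  | { kind: "url"; url: string } // e.g. a shared repository URL
  | { kind: "uniqueId"; id: string }; // e.g. an ID from the proposed getUniqueId()

// Models store only an opaque reference ID; the registry maps each ID
// to the concrete locations known for it.
class ReferenceRegistry {
  private readonly table = new Map<string, ModelLocation[]>();

  register(refId: string, location: ModelLocation): void {
    const locations = this.table.get(refId) ?? [];
    locations.push(location);
    this.table.set(refId, locations);
  }

  // Return the first registered location whose kind the current environment
  // can open (a browser might only handle "uniqueId" and "url", say).
  lookup(refId: string, supported: ReadonlyArray<ModelLocation["kind"]>): ModelLocation | undefined {
    return this.table.get(refId)?.find((loc) => supported.includes(loc.kind));
  }
}
```

The design choice is that the model file stays portable across the browser and desktop apps, while each environment resolves the same ID to whatever it is able to open.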

WICG / file-system-access

Permission to get file paths in order to create hyperlinks #407