goatcorp / DIPs

Dalamud Improvement Proposals
Other
9 stars 8 forks source link

DIP59: Lockdown Mode #59

Closed Minoost closed 1 year ago

Minoost commented 1 year ago

Rendered (Markdown)

KazWolfe commented 1 year ago

How would this affect the multiboxing mutexes currently used by the game? If I understand this subsystem correctly, the game uses a mutex to prevent launching multiple (>2?) copies of the game simultaneously. At least per this article it seems as though the mutex would be named differently and therefore not necessarily behave how we'd expect.

Would it be possible, then, for a user to launch more than two copies of the game at once through XL? Or two copies through XL and two copies through the retail launcher? Do we care about this edge case?

I also feel that the link system needs to be resolved somehow before we can move this to prod. The game itself calls ShellExecuteA for URLs (easy example: System Settings > Sound > Spatial Audio), which are broken in Lockdown Mode at present. I'm unsure how best to resolve this particular issue - would it make sense to hook ShellExecuteA or the calling method at +16DE30? Is it in general a good idea to (intentionally) break a game component and then subsequently patch it?

Minoost commented 1 year ago

@goaaats @KazWolfe

That's so many questions I have to answer at once so let's break it down into bite-size chunks. (hope I didn't miss anything important.)

@goaaats Another option would be to just have pre-allocated storage folders for plugins(see iPhone file manager), and scope the Dalamud file-manager accordingly. This would be a bit annoying though and would likely not be an option for Penumbra, since it doesn't like long paths.

Out of scope?

I want to take this opportunity make something clear: per-plugin isolation is not the goal of this DIP. This DIP aims to protect the system (i.e. outside of FFXIV) from a hostile plugin. (which can theoretically contain something like a ransomware, for example.) However, it won't protect a plugin stealing your FFXIV session token, gils, items, etc.

@goaaats A big pending question for me is FS access - it would be nice to have the "broker process" handle FS grants that will allow plugins to request ACLs for other directories, based on an external UI handled by the broker process. This is important for e.g. Penumbra and creative plugins that allow for saving/loading arbitrary files.

@KazWolfe What kind of overhead would we be looking at for a broker, and what would the broker's limitations be? (And, more importantly, how will we prevent exploitation of the broker?)

In terms of runtime overhead it'd be pretty much negligible. Most of the time the broker would (1) read a message via IPC channel (2) and verify it's something allowed (3) then open a handle for them (4) and finally copy the handle into the sandboxed process. Actual I/O or heavy-duty operations would happen inside the sandboxed process.

Yes, because the broker is a privileged process (from the sandboxed process perspective) and acting as a gateway to the system "broker" is something you really make sure it can't be abused to escape the container. This generally means you really shouldn't do "dump anything what sandboxed processes can't do on the broker because we can" and anyone having such attitudes seriously shouldn't make changes to the broker.

However, whatever it may happen to the broker this is still a massive improvement over "everything is free real estate" approach. (actually countable # of IPC calls vs. virtually infinite)

@KazWolfe Related: if a plugin needs "scratch area" (temp files), how should they go about setting that up? I assume we can't use $TEMP anymore with these changes, so will we just need to use our own plugin folders?

How does this affect registry usage? At present there are no (known) plugins that use registry, and there probably shouldn't ever be, but I think specifically denoting that this will break registry access (if it does?) would be beneficial so that it doesn't become a gotcha in the future.

Of course virtually unlimited access to registry is going away while running inside the AC.

I only talked about file usage but most kernel objects (including registry, named pipe, mutex, etc...) follows the same rule: (Assuming Windows already checked IL) If an object in question has an ACE that explicitly allows either AppContainer SID or ALL APPLICATION PACKAGES SID to access, then the access will be granted. If doesn't, then it will be denied.

It's important that apps can access whatever ALL APPLICATION PACKAGES SID can do. As otherwise you won't be able to load core Windows modules like DLLs inside C:\Windows\System32.

This rule also applies to the registry: Currently Dalamud.Injector doesn't touch DACLs on HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer but it has ALL APPLICATIONS PACKAGES ACE that explicitly allows read operation so you can read those and its subkeys.

Last thing I want to talk about is that each AppContainers gets their own private directory on %localappdata%/Packages/$container_id/AC (with full access to it) and certain directories (including $TEMP) are redirected into it.

Kernel object equivalent of this "private directory" (e.g. for mutexes) is located at \Sessions\$session_id\AppContainerNamedObjects\$appcontainer_sid.

If you're concerned with specific use cases then please tell me more about the scenarios.

@KazWolfe As alluded to in another comment, what exactly does this capability do? Or, more accurately, what's the difference between this and SECURITY_CAPABILITY_INTERNET_CLIENT?

One of the devs checking in that uses localhost access for a port - this is essential to the functionality of my plugin, at least short of a full rewrite to use named pipes or similar. I'm opening a socket against 127.0.0.1/[::1] on an arbitrary (user-defined) port above 1024. Assuming that localhost access (especially listens) are blocked or restricted, I will need some alternate way of getting data in/out of the game. To that end:

  • What changes would be necessary to keep my ability to host this port? Or will I need to completely re-architect for compliance with DIP59?
  • How will Lockdown Mode affect the ability to create or use named pipes and UNIX sockets?

If possible, I'd also like to get clarity on what SECURITY_CAPABILITY_INTERNET_CLIENT_SERVER and SECURITY_CAPABILITY_PRIVATE_NETWORK_CLIENT_SERVER actually do. As it stands, MSDN's documentation is rather unclear as to what these mean, so it's hard to figure out what these permissions grant (does INTERNET_CLIENT_SERVER allow listening to 0.0.0.0 and accepting connections from outside the local network? If so, we may want to disallow that...).

It's very unfortunate that there are no official documents about AppContainer with sufficient details online. Afaik it's only in the book "Windows Internals" published by Microsoft themselves.

However, you can read a blog post (I already linked this in "Useful Resource" section) from Google Project Zero to see what's the difference between SECURITY_CAPABILITY_INTERNET_CLIENT_SERVER and SECURITY_CAPABILITY_INTERNET_CLIENT (and its home network variants) in details.

The gist is:

(Funny that there are no differences between *_INTERNET_*and *_PRIVATE_NETWORK_* if it's set to public. It's as if "You're not at home anyway so whatever lol")

The blog post has enough details so you should be able to check IPv6 rules by yourself. (I'm not going to post it in here.)

img

Access to localhost is always going to stinky as there's possibility that untrusted process can just connect to unauthenticated services[^1] then exploit its capability to escape the container.

We (and of course, Microsoft) don't have a control over what's installed in user's system so AC just blanket bans on all localhost communications.

Currently we don't do anything about it and follows the default rule. (Ban all localhost communications[^2]) Later this might be changed but even then it will be "only if you turn this on this setting" kind of thing.

You CAN however, communicate with an external services[^3] via named pipes if the service in question explicitly allows it. (Add ACEs with ALL APPLICATION PACKAGES or specific package SID with appropriate permissions.) Just be extra careful as clients connect to the service can be actively hostile. (Just like any internet facing services.)

If the process inside AC is the host, then create it inside the private space I briefly mentioned above.

[^1]: Services that thinks it's safe from an attacker when binds a socket to localhost only. [^2]: Actually, you can recv/send stuff from/to localhost if it's within same AC. Read the blog post from Google if you're interested. [^3]: Any process live outside of the container, really.

KazWolfe commented 1 year ago

Thank you, this all makes sense. I'll just tack the responses on to responses:

Kernel object equivalent of this "private directory" (e.g. for mutexes) is located at \Sessions\$session_id\AppContainerNamedObjects\$appcontainer_sid.

I would like to see this tested to see how it affects multiboxing, as it sounds like it would at present. If this is something we're concerned about allowing, then we may need to have something in place elsewhere to ensure that no more than two instances of the game can be running at any given time. I'll defer to Goat as to whether this is an issue or not, but I feel like we shouldn't be allowing that here.

Currently we don't do anything about it and follows the default rule. (Ban all localhost communications Later this might be changed but even then it will be "only if you turn this on this setting" kind of thing.

If the process inside AC is the host, then create it inside the private space I briefly mentioned above.

This is what I was afraid of, and will require either a complete rearchitecture of my plugin or the ability to read the status of whatever localhost comms checkbox we include (if we even decide to allow such a thing). As such, I'd like to request that if we move forward with this proposal, that ample time be given for developers to manually enable and verify compatibility with lockdown mode before we start shipping this default for everyone.

Are paths to the private space(s) in any way predictable? In other words, if I create a named pipe from the game process, how will anything outside the game even know where to look? The same would apply, I suppose, to SIDs as well. You'd be correct that I'd like to restrict named pipes as much as I can, but I need to know the parameters of what to restrict those to and how those would behave.

Obviously, the breakage of XIVDeck should not be a blocker, as the overall security of the plugin ecosystem is more important than a single highly-specialized plugin, but I'd like to be as ready for these changes as possible especially given I have virtually zero understanding of AppContainer. If XIVDeck dies as a result of this, then so be it.

KazWolfe commented 1 year ago

I also feel that the link system needs to be resolved somehow before we can move this to prod. The game itself calls ShellExecuteA for URLs (easy example: System Settings > Sound > Spatial Audio), which are broken in Lockdown Mode at present. I'm unsure how best to resolve this particular issue - would it make sense to hook ShellExecuteA or the calling method at +16DE30? Is it in general a good idea to (intentionally) break a game component and then subsequently patch it?

This was discussed with Goat in the Discord and we came to the conclusion that ShellExecuteA would be hooked and handle links... however we'd otherwise handle that case for plugins. I'm still not particularly a fan of this solution, but I suppose it's unavoidable if we want to go down this path.

Minoost commented 1 year ago

I would like to see this tested to see how it affects multiboxing, as it sounds like it would at present. If this is something we're concerned about allowing, then we may need to have something in place elsewhere to ensure that no more than two instances of the game can be running at any given time. I'll defer to Goat as to whether this is an issue or not, but I feel like we shouldn't be allowing that here.

Untitled2

Because FFXIV creates a mutex on Global namespace (which of course the sandboxed process don't have access to and thus failing multibox checking), I've patched out multibox checking code for now.

Redirecting this to Local namespace and thus allowing 4 instances to run at most (2 for with AC + 2 for without AC) seems to be acceptable trade off.

Are paths to the private space(s) in any way predictable? In other words, if I create a named pipe from the game process, how will anything outside the game even know where to look? The same would apply, I suppose, to SIDs as well. You'd be correct that I'd like to restrict named pipes as much as I can, but I need to know the parameters of what to restrict those to and how those would behave.

Let's say some plugin created a named pipe like this:

var pipeServer = new NamedPipeServerStream(@"Local\HelloDalamudContainer");
// ...
// pipeServer.WaitForConnectionAsync()
// pipeServer.ReadAsync(someBuffer)
// ...

If a process outside the game wants to connect to this pipe, the process outside the game need to do something like this:

// See https://aka.ms/new-console-template for more information

using static System.Console;
using System.ComponentModel;
using System.Diagnostics;
using Windows.Win32;
using Windows.Win32.Foundation;
using System.IO.Pipes;

Debug.Assert(OperatingSystem.IsOSPlatformVersionAtLeast("windows", 8));

// https://learn.microsoft.com/en-us/windows/win32/devnotes/getting-the-session-id-of-the-current-process
uint GetSessionId()
{
    var ok = PInvoke.ProcessIdToSessionId(PInvoke.GetCurrentProcessId(), out var sessionId);
    if (!ok)
        throw new Win32Exception();

    return sessionId;
}

string GetNamedObjectPath(string appContainerId)
{
    unsafe
    {
        PSID psid = default;
        Span<char> path = stackalloc char[260];

        fixed (char* pPath = path)
        {
            try
            {
                var hresult = PInvoke.DeriveAppContainerSidFromAppContainerName(appContainerId, out psid);
                hresult.ThrowOnFailure();

                var ok = PInvoke.GetAppContainerNamedObjectPath(null, psid, (uint) path.Length, pPath,
                    out var pathLength);
                if (!ok)
                    throw new Win32Exception();

                return new string(pPath);
            }
            finally
            {
                if ((void*) psid != null)
                    PInvoke.FreeSid(psid);
            }
        }
    }
}

string GetAppContainerSidString(string appContainerId)
{
    unsafe
    {
        PSID psid = default;
        PWSTR psidStr = default;

        try
        {
            var hresult = PInvoke.DeriveAppContainerSidFromAppContainerName(appContainerId, out psid);
            hresult.ThrowOnFailure();

            var ok = PInvoke.ConvertSidToStringSid(psid, out psidStr);
            if (!ok)
                throw new Win32Exception();

            return psidStr.AsSpan().ToString();
        }
        finally
        {
            if ((void*) psid != null)
                PInvoke.FreeSid(psid);

            if ((void*) psidStr != null)
                PInvoke.LocalFree((IntPtr) psidStr.Value);
        }
    }
}

// Main
var sessionId = GetSessionId();
var containerSid = GetAppContainerSidString("dalamud.container");
var objectPath = GetNamedObjectPath("dalamud.container");

WriteLine($"session id: {sessionId}");
WriteLine($"container sid (for debugging): {containerSid}");
WriteLine($"obj path: {objectPath}");

WriteLine($@"Inside of FFXIV:  Local\HelloDalamudContainer1234");
WriteLine($@"Outside of FFXIV: Sessions\{sessionId}\{objectPath}\HelloDalamudContainer1234");

var pipePath = $@"Sessions\{sessionId}\{objectPath}\HelloDalamudContainer1234";

using var pipe = new NamedPipeClientStream(pipePath);
pipe.Connect(/* timeout */);

// `NamedPipeClientStream` inherits `Stream`.
pipe.Write(new byte[] { 0x01, 0x02, 0x03, 0x04, 0x05 });

(maybe there's a better way to handle this...)

Untitled

I also feel that the link system needs to be resolved somehow before we can move this to prod. The game itself calls ShellExecuteA for URLs (easy example: System Settings > Sound > Spatial Audio), which are broken in Lockdown Mode at present. I'm unsure how best to resolve this particular issue - would it make sense to hook ShellExecuteA or the calling method at +16DE30? Is it in general a good idea to (intentionally) break a game component and then subsequently patch it

I wonder how Launcher.LaunchUriAsync handles this. Given there are restrictions on what can be passed to Uri, maybe it's handled by the broker?

KazWolfe commented 1 year ago

Thank you for the sample code - I think that resolves my personal issue (although it will still involve a pain of rearchitecting everything, but oh well). Moving on:

I wonder how Launcher.LaunchUriAsync handles this. Given there are restrictions on what can be passed to Uri, maybe it's handled by the broker?

This seems acceptable to me, provided we don't just actually use LaunchUriAsync (assuming we can?). We'd need to ensure that whatever we do remains compatible with Wine as well though. I'd also have questions about how the user experience of this will act (will users receive a pop-up asking to open a URL?), but again not particularly relevant immediately. For now I think the assumption that ShellExecuteA would only be used for URLs is valid, but we should probably have something at least in the back of our minds in case Square decides to do more/launch extra processes.

Redirecting this to Local namespace and thus allowing 4 instances to run at most (2 for with AC + 2 for without AC) seems to be acceptable trade off.

No objections to limiting this to four, just want this out in the open so others can discuss appropriately. We can also have other things handle this - e.g. the broker holds the (native) mutex open as long as the game is alive.


Overall, I support this proposition, though I have concerns about how much this breaks and for what purpose we're causing breakage. I'll always agree that more security is beneficial, but I do want to point out (what I see as) our risk model here: third party plugins and those that load in information from untrusted sources.

All first-party plugins are already reviewed and Lockdown Mode won't change that. This PR itself denotes this as a limitation: plugins can still be malicious in such a way that impacts or destroys a user's game. A plugin like that shouldn't ever really be in the official repository anyways, and we can't stop that so long as plugins have full access to FFXIV's process space. It's true that our reviewers may not necessarily be able to spot suspicious or well-hidden security flaws/malware that would cause harm to a machine, but anything like that would already probably look odd or out of place in the context of an FFXIV plugin.

That leaves two notable threat surfaces that Lockdown Mode would protect against:

This DIP, overall, carries a lot of potential to cause breaking changes. Some of these breaking changes are simple; e.g. changing file cache paths to more appropriate places or altering how certain behaviors work to better comply with a sandbox. Others may fundamentally require plugins rethink their approach: how/where to store and load files, how to perform inter-process communication, how to allow users to export data, and a whole host of other things.

None of these are bad, necessarily, but I think we as a community need to decide whether these security benefits are worth the cost we'd have to pay in terms of rewriting plugins/supporting users during the transition/making certain documentation very clear on the user's responsibility to comply with the sandbox. Doubly so if we're doing this while a user can just download some malware in the form of an ACT plugin that promises to make your parses better.

In short: if this becomes an enabled-by-default feature, are the security benefits material enough to outweigh the costs? We aren't a web browser where untrusted information exists in vastly higher quantities; virtually everything done in the XIV plugin ecosystem is done with a degree of intent and (hopefully) foreknowledge as to the risks.


I think before moving forward, we should have a (documented) idea on how to solve a lot of the common problems:

Again: all the underlying principles seem sound and these security improvements are not without merit. It just becomes a question of dragging kicking and screaming users/devs (like myself) into believing that the improvements are worth the support burden/change in how everything operates.

Minoost commented 1 year ago

This seems acceptable to me, provided we don't just actually use LaunchUriAsync (assuming we can?)

Oh wow, I haven't thought about that. LaunchUriAsync works just fine.

(Not sure about )

https://user-images.githubusercontent.com/1381835/211057031-666ed517-3b24-4d51-9e2a-8b5eab4002c2.mp4

This does bring the minimum OS requirement to Windows 10 but Windows 8.1 EoL is just 3 days away from now so it's probably fine.

(I have no idea how it will work on Linux however..)

Minoost commented 1 year ago

Looks like community's general consensus for this is "We don't need it" - ranging from legitimate concerns like ergonomic and maintainability issues to "we didn't get hacked before, so why bother?" and some downright aggressive and threaten to pull the plug if this feature were to be accepted. First round of such discussion apparently has happened on early January and in a private channel on Discord.

As such, I don't believe this needs be developed any further nor worth the time anymore.

As a side note, if anyone wants to keep working on this concept you're free to do so. Only task left to launch the game within the container is patching out/inserting shims for multi-box and PROCESS_VM_WRITE checks - that part should be easy enough.