USD running as different user on Windows?

bsilvaLS commented 5 years ago

Description of Issue

Hello! We're having issues with permissions on Windows when working on network drives. It appears that USD is running as a different user, probably a system user.

Steps to Reproduce

Using command prompt, change directory to a network drive location, where I have full-control permissions but Everyone does not.

Open a Python shell and create a new stage:

>>> from pxr import Usd, UsdGeom
>>> stage = Usd.Stage.CreateNew('HelloWorld.usda')

A traceback occurs:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pxr.Tf.ErrorException:
        Error in 'pxrInternal_v0_19__pxrReserved__::SdfTextFileFormat::WriteToFile' at line 305 in file <...>\pxr\usd\lib\sdf\textFileFormat.cpp : 'Insufficient permissions to write to destination directory '<...>''

This will also happen with safeOutputFile (HelloWorld.usdc).

If I change the directory permissions so Everyone has full permissions, it will write the file without complaint.

This occurs when running from within Maya as well, and trying to use the AL_USDMaya export command, since it uses stage->GetRootLayer()->Save()

Notably, it does NOT occur when using the built-in usdExport command in Maya -- that works fine and will write the file.

Any help appreciated!

System Information (OS, Hardware)

Windows 10.0.16299 Build 16299, x64 Intel(R) Core(TM) i9-7940X CPU

Package Versions

USD 19.05 AL_USDMaya 0.31.1

Build Flags

c64kernal commented 5 years ago

Hi @bsilvaLS, USD doesn't attempt to run as a different user. What our text file format does on write, though, is first create a temporary file, write to that temporary file completely, and, if that succeeds, move it to the file you asked to write to. We do this so that if the save fails for any reason, it doesn't corrupt the existing file.

My suspicion is that your user doesn't have permission to create and write to this temporary file.
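The write-then-rename pattern described above can be sketched in Python. This is only an illustration of the general technique, not USD's actual implementation: the function name atomic_write_text is hypothetical, and creating the temporary file next to the destination is an assumption.

```python
import os
import tempfile


def atomic_write_text(dest_path, text):
    """Write text to a temporary file, then rename it over dest_path."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Assumption: the temp file lives in the destination directory, so the
    # final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
        # os.replace overwrites an existing destination in a single step.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # On any failure, remove the temp file; the old destination is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern, a user who cannot create files in the destination directory fails at the mkstemp step, before the existing file is touched.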

Hope that helps!

bsilvaLS commented 5 years ago

Thanks @c64kernal ! Is that temporary file written in place in the current directory, or in a temp folder elsewhere? This happens when I'm running within a folder I have full control permissions (which is the same as the destination directory listed in the error message).

And this also happens when writing a binary USD, does it write to a temporary file in the same way?

jtran56 commented 5 years ago

Filed as internal issue #USD-5287.

c64kernal commented 5 years ago

Hmm... okay there may still be a bug for sure. I'd suspect it would be around this code: https://github.com/PixarAnimationStudios/USD/blob/dev/pxr/base/lib/tf/atomicRenameUtil.cpp#L125 If you were able to debug around this area of the code, we may get an answer for why you're getting these errors.

(For the binary version, the answer isn't quite so simple -- but yes, there are cases where we do need access to a temporary file -- do you get the same problem with binary files too?)

bsilvaLS commented 5 years ago

Yeah, that's a likely location -- I'll see what I can glean if I'm able to get in there.

I do get the same issue using a binary file -- i.e. if I use 'HelloWorld.usdc' as the new stage.

swamiforlife commented 4 years ago

Hello @bsilvaLS, I am having the same issue where I cannot save USD files to network drives. Have you found a solution for this?

swamiforlife commented 3 years ago

I am still not able to write USD files to network drives on Windows. Are there any solutions for this problem?

meshula commented 3 years ago

Out of curiosity, @swamiforlife, is your network drive with the failures an SMB mount, with poorly emulated permissions? Git as an example, doesn't work well with SMB mounts on any of the Western Digital NAS I own.

swamiforlife commented 3 years ago

I have a Windows Server 2016 file server. The permissions seem to be fine, because everything else works; only saving USD files over the network is a problem. Over in the Houdini forums there are people having the same issues writing USD over the network. In those forums they think that there are bugs in the USD code, but they can't find the problem. https://www.sidefx.com/forum/topic/71754/ https://www.sidefx.com/forum/topic/70960/

meshula commented 3 years ago

Thanks for the links to the Houdini forums, here's the corresponding usd-interest thread. https://groups.google.com/g/usd-interest/c/Kf4nV2QdoLg/m/jtGSsfCeAQAJ - unfortunately it seems that no one with a reproduction case has made progress with a debugger attached to identify where the issue originates.

brycegbrazen commented 3 years ago

Hi everyone,

I have found a fix for this issue that allows you to maintain 775 permissions (does not allow everyone to write to your folders.)

As far as our network configuration, we have a Windows Server running Active Directory, a Linux Centos file server that hosts our files to the network via Samba, and Windows clients that access the Samba share in their various DCC applications.

For us, the issue was that the permissions were not being passed cleanly between the different OSes in our Windows AD->Linux->Windows chain. USD uses the Win32 API under the hood to determine if a Windows user has permission to write to any given directory. This API requires that the permissions across these different OSes are clearly communicated. All we had to change were settings on our CentOS file server to map all of the permissions correctly.

[Images: CorrectPermissions, IncorrectPermissions]

You shouldn't have to restart your File Server for any of these fixes to work, but you may need to leave and join the domain, so expect for your file server to go down for a bit.

MAKE SURE TO REPLACE ANYTHING IN <> IN THE BELOW CODE WITH THE VALUES SPECIFIC TO YOUR NETWORK. ALSO IF NOT RUNNING CENTOS 7, YOUR COMMANDS MAY DIFFER SLIGHTLY.

STEPS TO FIX:

You will start by ssh-ing as root (if not connected to the domain) into the Linux machine that you are trying to host a Samba share on. If you have already connected the machine to the AD skip to step 5. If you already have Samba running on the Linux machine but are having issues with permissions, skip to step 9.

To connect to the network, we will use the preinstalled realm package. To check if you are already connected, use the command realm list. If you get anything returned here, you are connected to the domain already, otherwise you need to join the AD.

To join the AD, type realm join -U you will be prompted to enter your password. This should take a few seconds, and afterwards, a realm list should show you successfully connected to the AD.

You should at this point be able to ssh into the Linux machine as a domain user. Verify that you can do this before continuing.

Once connected to the AD, run a yum install samba samba-common. Install all dependencies as needed. This should include packages such as sssd and samba and will set up the basic samba config directory (/etc/samba/*).

At this point, you will want to ensure everything installed correctly and nothing is corrupted by doing a:

systemctl start smb nmb winbind

(There should be a warning about the system not being able to find winbind yet, we will install that later).

If you got any errors here, check the logs for the respective service and debug. Otherwise, check the status of the services with:

systemctl status smb nmb

If everything is working, you should be able to see your user's Linux home directory from Windows by using Windows explorer to navigate to the hostname like so:

\\<hostname>

Now we will update the /etc/samba/smb.conf file to fix the permissions issues we are having. Open this file with your preferred text editor. I use vi because it's preinstalled and simple to use, but you can use whatever you like. Update the file to look like this:

[global]

        workgroup = <DOMAIN NAME>
        security = ADS
        realm = <DNS name of the Kerberos Server>

        passdb backend = tdbsam
        kerberos method = secrets and keytab

        idmap config * : backend = tdb
        idmap config * : range = 3000-7999

        idmap config <DOMAIN NAME>:backend = rid
        idmap config <DOMAIN NAME>:range = 10000-999999

        template shell = /bin/sh
        template homedir = /home/%U

        winbind refresh tickets = yes
        vfs objects = acl_xattr
        map acl inherit = yes
        acl_xattr:ignore system acl = yes

        disable spoolss = yes
        printcap name = /dev/null
        load printers = no
        cups options = raw

# Here you will set the share name/comment/path and read only state. Don't set anything else here.
[testshare]

        comment = Test Share
        path = /test
        read only = False

Important things to know about the changes we made:

Set the kerberos method so auth is secure
Set the ID mapping to rid so that Winbind can translate the UIDs/GIDs back to Windows.
Set the template shell/homedir so that we retain individual user home directories. I believe template shell is required.
Set winbind to refresh tickets because otherwise they expire after a day or so
The three lines below winbind refresh tickets = yes may also be required for translating UIDs/GIDs back to Windows. This still needs to be tested.
At the bottom section of global we disable printing.

At this point you should be able to restart samba and everything should still work. You should be able to access the share (folder under the testshare section) from Windows. You will again use Windows Explorer to test this via \\. If this doesn't work you have messed something up. I would start with checking that sssd, smb, nmb services are all running.

At this point, if you run the command id in Linux, you will likely see your ID is very large (See Figure 2). This means that the IDs are not correctly mapping from Windows->Linux. This is expected behavior. We have a few more steps to do.

Next we need to install winbind. This is the service that handles mapping the Linux UID/GIDs back to Windows SIDs so that Windows apps using the Win32 API can correctly verify your user's permissions. To install it, use the command:

yum install samba-winbind

In order to use a helpful winbind debugging utility called wbinfo (I won't go into how to use this for debugging), you should also run:

yum install samba4-winbind-clients

Now, winbind is ready to use as a service, but is not plugged into anything. If we started the winbind service now, winbind would have no valid configuration and would actually block access to the share. So first we need to tell the Name Service Switch (NSS) to use winbind as a name resolver. To do this, open up the /etc/nsswitch.conf file and edit these two lines:

passwd: files sss
group: files sss

to look like this:

passwd: files winbind sss
group: files winbind sss

After making this change, you do not need to restart/reload any services, as nsswitch is just an API for C libraries.

There is one other change that we must make in order to make the ID mapping work properly. It is currently unconfirmed whether this actually affects anything, but AFAIK it is necessary. Open up the file /etc/sssd/sssd.conf and add the line:

ldap_idmap_autorid_compat = True

Supposedly this line makes it so SSSD and Winbind interact correctly and pass off IDs as intended.

After making this change, make sure to reload the sssd service via:

systemctl restart sssd

Lastly you should make sure smb, nmb, winbind, and sssd are all started up and running with no issues via:

systemctl restart smb nmb winbind
systemctl status smb nmb winbind sssd

After making all of the previous changes, you should be able to exit the Linux machine and ssh back into it (with your domain login) without any issues. If not you have messed something up.

Upon logging in, you should be given the sh shell as defined in the above samba config, and you should be able to run the id command and get both User IDs and Group IDs that are in the 10000-999999 range (from the samba config). You should also see that the user and group names returned from the id command include the actual domain name in them Ex. (\). If all of these things look right, you've successfully set up ID mapping for Samba!
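As a rough way to automate the range check described above, here is a small Python sketch meant to be run on the Linux file server as the domain user. The function names are made up for illustration, and the range is simply copied from the idmap config <DOMAIN NAME>:range line in the smb.conf shown earlier. Supplementary groups that are local to the machine may legitimately fall outside the range.

```python
import os

# Range copied from the `idmap config <DOMAIN NAME>:range` line in smb.conf.
RID_RANGE = (10000, 999999)


def in_rid_range(numeric_id, id_range=RID_RANGE):
    """Return True if numeric_id falls inside the inclusive rid range."""
    low, high = id_range
    return low <= numeric_id <= high


def report_current_ids():
    """Print whether this process's UID and GIDs are in the rid range (POSIX only)."""
    ids = {"uid": [os.getuid()], "gid": [os.getgid()], "groups": os.getgroups()}
    for label, values in ids.items():
        for value in values:
            status = "ok" if in_rid_range(value) else "outside rid range"
            print(f"{label} {value}: {status}")
```

Calling report_current_ids() after logging in gives the same information as reading the output of id by eye.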

Now that the mapping is working, you will need to use the new domain UIDs/GIDs for any existing folders on the share that you want to fix the permissions issues on. To do this you can run the chown command. For example, you can change the ownership of the root share folder (in this case /test) like this:

sudo chown <DOMAIN NAME>\\<DOMAIN USERNAME>:<DOMAIN NAME>\\<DOMAIN GROUP NAME> /test

You must use the above syntax \<DOMAIN USER/GROUP NAME> rather than the email @.com" when changing ownership or it may use the OLD UIDs/GIDs for the groups (the really high number ones) instead. The really high number ones will not work in Windows.

COMMON ISSUES

On the Linux side, the id's for groups are not mapping? For instance when ssh as a domain user into the Linux machine, I see this message:

/usr/bin/id: cannot find name for group ID 10513

To fix this, check your /etc/nsswitch.conf and ensure that you set only these lines to include winbind:

passwd: files winbind sss
group: files winbind sss

Please note that the shadow entry (which sits between passwd and group) does not include winbind.
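The nsswitch.conf rule above (winbind on passwd and group, but not on shadow) can be checked with a short Python sketch. The function names are hypothetical, and the parsing is deliberately simple: it only splits each line into a database name and its list of sources.

```python
def nss_sources(nsswitch_text):
    """Map each database name in nsswitch.conf text to its list of sources."""
    sources = {}
    for line in nsswitch_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        database, _, rest = line.partition(":")
        sources[database.strip()] = rest.split()
    return sources


def check_winbind(nsswitch_text):
    """Return a list of problems with winbind placement; empty means OK."""
    sources = nss_sources(nsswitch_text)
    problems = []
    for database in ("passwd", "group"):
        if "winbind" not in sources.get(database, []):
            problems.append(f"{database}: winbind is missing")
    if "winbind" in sources.get("shadow", []):
        problems.append("shadow: winbind should not be listed")
    return problems
```

On the file server this could be called as check_winbind(open("/etc/nsswitch.conf").read()).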

After turning on winbind, you may find that when you exit ssh and log back in, your ID still does not fall within the range. To fix this, try leaving the realm and rejoining it. This seemed to force winbind to recalculate IDs correctly.

LostInClams commented 2 years ago

Hello Everyone.

I have an update on this issue. After debugging in /base/arch/fileSystem.cpp, the error seems to arise from the AccessCheck call in the ArchFileAccess function. The AccessCheck call is wrapped in an if statement, but a failure (a return value of 0) is not handled, so that case is treated as if the file does not have write access, which is not necessarily true. On the SMB network drive where USD gives the 'Insufficient permissions to write to destination directory' error (the logged-on user has read/write/execute), AccessCheck failed; adding an else clause to the AccessCheck if statement and calling GetLastError() there tells us that ERROR_INVALID_SECURITY_DESCR is the problem. When the error is ERROR_INVALID_SECURITY_DESCR (1338), we need to treat it as a possible write and let ArchFileAccess return 0 (write is possible).

With my simple home Samba setup, connected as the owner of a directory, a chmod of 577 has AccessCheck return true with the accessStatus flag set to true (indicating that a write is possible), and the write later fails when trying to write to the temp file. Here the problem seems to be that, for Samba mounts, AccessCheck checks against the Linux permissions for other users and not those of the connected Samba user. The same setup with a chmod of 770 has AccessCheck set accessStatus to false, while a write would in fact succeed if tried.

This has led to the conclusion that, under Windows, it is not possible to verify that a file is writable without actually trying to write it.
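A minimal Python sketch of the "just try it" approach that this conclusion points to: instead of asking the OS whether a write would be allowed, create a scratch file in the target directory and see what happens. The function name and the probe file prefix are hypothetical, and this is not what USD does.

```python
import os
import tempfile


def can_write_by_trying(directory):
    """Return (True, None) if a scratch file can be created in directory,
    otherwise (False, the OSError that was raised)."""
    try:
        # Creating a real file exercises the server's actual permission logic
        # rather than a client-side interpretation of it.
        fd, path = tempfile.mkstemp(dir=directory, prefix=".write_probe_")
    except OSError as error:
        return False, error
    os.close(fd)
    os.remove(path)
    return True, None
```

Even this probe only describes the moment it runs; permissions can still change before the real write happens.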

*Edited to fix typo

marktucker commented 2 years ago

In the USD library that ships with Houdini we started chipping away at these pre-write checks, and eventually just eliminated them all (when running on Windows). The only negative consequence (that I'm aware of) is that in the case where the destination file really can't be written to, you still end up writing the temp file to disk, but the final rename fails. This isn't great, but it's the only way we have found to avoid these false failures that keep happening. I haven't done a PR for any of this because it doesn't feel like an actual "solution" to the original problem.

LostInClams commented 2 years ago

For our own use case we decided to just return 0 (success) from ArchFileAccess once we get past the special case for existence checking. Good to know that you haven't found any issue with doing this kind of thing either; it gives us more confidence in the solution.

While I also feel that just removing the pre-write checks is not a "solution", given the current state where both false negatives and false positives are easily produced for network drives, I don't think that keeping them is defensible either. We can of course claim that it's an issue with Windows for its lacking or buggy security APIs, or blame it on the providers of network drive solutions like Samba, but that will not solve the problem for the users of USD.

spiffmon commented 2 years ago

Thanks for sharing your findings, @marktucker and @Ompalompen . I'm curious if anyone has tried out @brycegbrazen 's purely-configuration remedy?

santosd commented 2 years ago

I set up a very simple SMB share on my Ubuntu 20.04 machine using the steps provided here: https://www.howtogeek.com/176471/how-to-share-files-between-windows-and-linux/

With that setup, I am able to reproduce the error when attempting to export USD from my Windows machine to the shared Linux folder. I tried @brycegbrazen 's configuration remedy, but I was not able to get it to work. However, I am not using Active Directory, so for my domain I just used 'localhost' and removed the Kerberos server line. My guess is that this configuration won't work with my simple setup. @brycegbrazen if you have any suggestions let me know.

Dariusz1989 commented 1 year ago

Is this ever going to be fixed or I have to get down and dirty and start compiling my own usd? I'm in contact with 7 studios that want to use USD none of them can due to this damn bug. Can Pixar people get on to it and disable these damn windows checks or at least let us ignore them with a pop up prompt/etc ?

dgovil commented 1 year ago

@spiffmon perhaps a good compromise could be disabling the checks based on an environment variable? Higher risk of failure, but in general Samba shares seem to be more a matter of trial and error than pure science, since things vary so much by host/client OS and their respective versions.

That would at least give people an out if their systems aren't playing well.

tanant commented 1 year ago

That would be the way I'd imagine things going as well- from @marktucker 's earlier comment (https://github.com/PixarAnimationStudios/USD/issues/849#issuecomment-989098984) I wonder if the easiest route would be to grab that set of changes and gate them behind an env var + os default for win32 which has it on? (I hadn't done enough thinking/speccing out what the failure conditions look like though)

dgovil commented 1 year ago

@tanant personally I'd argue to have it on all platforms, and keep the default as the current behaviour.

My rationale would be:

It's not necessarily unheard of that other platforms have similar issues depending on the client/server setups
I think the current behaviour is still "the correct choice" on windows (in that it has the lowest risk of data loss), and I think it's better to place the weight of the decision of the riskier behaviour on the facility/TDs/individuals. I think if it's well documented it should be fine, and maybe if the checks in the current system fail, the TfError could mention the env var (I'm a big fan of Rust style error messages that give possible action items)

spiffmon commented 1 year ago

@meshula and I discussed this today, and that was one possibility we came up with (using a TfEnvSetting so that the behavior is also switchable at any point during runtime), but based on @bsilvaLS 's original observation that the Maya usdExport command does not exhibit the problem, we're considering (requires further discussion as well as experimentation) whether to replace our access checks with actually attempting to open the file for append -- which, alone, is non-destructive for an existing file. It would be ideal if we can reproduce the problem ourselves, first, which we're attempting to do.
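To make the "open for append" idea concrete, here is a minimal Python sketch. It is not the USD implementation, and the function name probe_append is made up. It relies on append mode never truncating an existing file, and it cleans up after itself because append mode creates the file when it does not exist yet.

```python
import os


def probe_append(path):
    """Try opening path for append; return None on success, else the OSError."""
    existed = os.path.exists(path)
    try:
        # Append mode never truncates, so existing contents are left alone.
        with open(path, "ab"):
            pass
    except OSError as error:
        return error
    if not existed:
        # Append mode creates a missing file; remove the empty leftover.
        os.remove(path)
    return None
```

Because the probe performs a real open against the server, it sidesteps the question of how the client interprets the remote permissions.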

spitzak commented 1 year ago

IMHO all such errors should be deferred until it actually tries to write the file. Pre-testing to make sure it works is a bad idea, as the answer can change for any number of reasons between when the test is run and the attempt is made. I cannot blame the Windows developers for assuming software would not try to do this.

One that is causing trouble right now for everybody is that resolving apparently is expected to test if the file exists and is readable, likely as a side-effect of Pixar's original resolver that searched multiple directories for a file. This is resulting in cryptic error messages and requires excessive complexity for resolvers. Resolving should be allowed to just output a file name, whether it exists or not should produce a correct and legible error (containing the file name!!!) when an attempt is actually made to open that file.

spiffmon commented 1 year ago

@spitzak , are you objecting primarily to the "readable" check? I do not know how to implement any searchpath-based resolver without performing existence checks...

For sure, better error messages would be good. As others have mentioned, in the case of writing files, we do the checks at "exported layer creation time" rather than "exported layer save time" so that one doesn't spend tens of seconds, minutes or more generating and adding-to-layer GB of data, only to be told afterwards that the file can't be saved. You make a good point that conditions can change in-between the two events, though! And that does add more motivation to making those checks be optional/configurable.

One thing that stands in the way is an assumption in various parts of our software (I'm not on top of it all, and some may be Presto-specific) that SdfLayer::CreateNew(), for at least some file formats, causes a file to come into existence. Assuming we need to preserve that behavior, that wouldn't prevent us from removing/avoiding Access checks, but it would mean that if you were relying on conditions to change favorably between CreateNew() and Save(), you will still be prevented from exporting.

spitzak commented 1 year ago

I think my complaint is that if the resolver does not actually check for the file's existence and produce an appropriate error, the error that is produced later when it actually tries to read the file is cryptic, repeated several times, and does not clearly indicate the filename or the fact that the file is missing or unreadable. This has been a while since I worked on this, I just remember being required to check in the resolver and produce an error there because otherwise users could not figure out what happened from the error messages.

spiffmon commented 1 year ago

@spitzak , understood and agreed on the need for comprehensible and useful error messages, and that a Resolver should not be forced to access an asset just to serve that purpose. I think that's a separable issue, and it would be great if someone could file one on GitHub with an example of the badness. In the problem reported in this issue it looks like we should be providing the name of the destination directory in the error... though one could argue we should always provide the filename within the directory even if we've determined it is the directory perms that are locking us out.

spitzak commented 1 year ago

What I remember is that you got several errors about unrecognized file format and none of them actually gave the name of the file. If you use the default resolver the messages are trivially better though it still has extra messages complaining about the resolver not working. usdview seems to be running an initial test before calling usd just to get a legible error for a non-existent file.

Ideally a missing file should produce "filename: strerror(errno)" as the entire text of the error.
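The "filename: strerror(errno)" format suggested above looks like this in a small Python sketch. The function name is hypothetical; the point is only that the error is produced at the moment the open is attempted and that it carries the file name.

```python
import os


def describe_open_failure(path):
    """Return 'filename: strerror(errno)' if path cannot be opened for
    reading, or None if the open succeeds."""
    try:
        with open(path, "rb"):
            return None
    except OSError as error:
        # The message is built from the errno of the failed open itself,
        # not from an earlier existence or permission pre-check.
        return f"{path}: {os.strerror(error.errno)}"
```

The exact wording after the colon comes from the platform's strerror, so it differs between operating systems.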

The default resolver instead of returning blank should return the last name it searched for, so that the error can be reported when the attempt is made to read the file.

tasiotas commented 1 year ago

hi, any updates?

Windows users are suffering every time they want to overwrite their caches 😢

spiffmon commented 1 year ago

Hi all, this has languished in part because of our difficulty in testing, and in part because there's been so very much going on. But we understand this is a big deal, and we hope to very soon start looking into what we can do in the 23.08 timeframe.

kimuss commented 1 year ago

so... 23.08 released... no update for us? 😢

meshula commented 1 year ago

Hi, I regret to report that it wasn't possible to complete this in time for 23.08.

pmolodo commented 8 months ago

Should this be closed now that the TF_REQUIRE_FILESYSTEM_WRITE_PERMISSION env var above was added?

spiffmon commented 8 months ago

I believe it will close out when 24.03 gets cut...
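For anyone arriving at the end of this thread, here is a minimal Python sketch of using the TF_REQUIRE_FILESYSTEM_WRITE_PERMISSION variable mentioned above. It assumes, without confirmation from this thread, that the variable is a boolean setting, that a value of 0 turns the pre-write permission checks off, and that it should be set before the pxr modules are loaded. The helper name create_and_save is made up for illustration.

```python
import os

# Assumption: "0" disables the pre-write permission checks. Set it before
# importing pxr so the library sees the value when it loads.
os.environ["TF_REQUIRE_FILESYSTEM_WRITE_PERMISSION"] = "0"


def create_and_save(path):
    """Create a new stage at path and save it (hypothetical helper)."""
    # Imported lazily so this file can be read or imported without USD installed.
    from pxr import Usd

    stage = Usd.Stage.CreateNew(path)
    stage.GetRootLayer().Save()
    return stage
```

With the checks off, a destination that really is not writable would presumably only be reported when the save itself fails, which is the trade-off discussed earlier in the thread.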

PixarAnimationStudios / OpenUSD