Open IanMayo opened 4 years ago
What exactly do you mean by a dedicated username/account. Do you mean the archived import files will be put onto a network drive which is only accessible with a particular username and password?
If so, we might be able to access it by just setting the environment variable for the archive location to a path containing the network details plus the username and password:
\\username:password@SERVER\share\folder\path\to\archive
I'm not sure how much Python supports other user-based operations - eg. just writing a file as another user - but if we could manage to work that out then we could still set the username and password as an environment variable (potentially encrypted or hashed in some way - although obviously the code to unhash it would be available in the pepys repository).
Yes. They have created a new user account. While all (most) users have read access to the shared drive, only the pepysadmin (or something like that) account has write access.
The user-details-in-URL is a clever solution, but if that URL is shown to users then the password will be compromised. Aah, if it's only used in a Python copy files command, then maybe it won't be visible.
Their IT team will create an environment variable that points to a network file location. This file will contain the username/password. Initially the credentials would be plain text, but eventually they should be encrypted.
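A minimal sketch of how Pepys might read those credentials, under assumptions the thread does not confirm: the environment variable name PEPYS_ARCHIVE_CREDENTIALS is hypothetical, and the file layout (username on the first line, password on the second, plain text) is a guess at the initial plain-text format.

```python
import os
from pathlib import Path

# Hypothetical name: the thread does not say what the variable is called.
CREDENTIALS_ENV_VAR = "PEPYS_ARCHIVE_CREDENTIALS"


def read_archive_credentials(env=None):
    """Return (username, password) from the file named by the environment variable."""
    env = os.environ if env is None else env
    file_location = env.get(CREDENTIALS_ENV_VAR)
    if not file_location:
        raise RuntimeError(f"{CREDENTIALS_ENV_VAR} is not set")
    # Assumed layout: username on the first line, password on the second.
    lines = Path(file_location).read_text(encoding="utf-8").splitlines()
    if len(lines) < 2:
        raise ValueError("credentials file needs a username line and a password line")
    return lines[0].strip(), lines[1].strip()
```

Passing the environment in as a parameter keeps the function easy to exercise without touching the real process environment.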
LET'S NOT DISCUSS THIS ENCRYPTION STRATEGY FURTHER IN THIS ISSUE 🤣🤣🤣
I did some googling on Python copy-as-other-user commands, but couldn't find any. But, I did find a chown. So, maybe Pepys could chown these files to the admin user, then copy them to the target drive. Maybe that would work.
Is this actually completed? We store the username and password, but as far as I can tell we don't actually use that username and password to do anything. Am I correct?
No, it's not completed. As you said - we're reading in the data, but we're not using it yet.
Right, I've done some investigation into various approaches to doing this, and the issues/limitations with them. The key problem we're trying to solve is how to copy files as a different user in Python.
Note: I wrote this while investigating these methods - I've left the explanations here for reference, but I've only found one way that works - which is the last one I tried - so I suggest we use that. Skip to the Use a pure Python smb client heading to see the information on that method.
I have set up a test virtual machine called WINVM with a share called SecretFolder. There are two users: pepystest has read-only access to the share (like a normal user would in the client's setup) and pepysadmin has read-write access to the share. We want to connect as pepysadmin while running the Python script when we're logged in as pepystest.
My first approach to this was to try using the standard shutil.copy function, and copying to \\pepysadmin:pepysadmin@WINVM\SecretFolder. I assumed this would work, as specifying the user/password in the URL for the share is valid on Linux/OS X (when using Windows SMB shares) - but it isn't valid on Windows.
Next option was to use the net use command from the terminal to map a share to a drive letter using the pepysadmin user/password, and then copy to that drive letter. This works fine, as long as there isn't already another drive letter mapped to this share. If there is already a drive mapped to that share, then you get the error:
System error 1219 has occurred.
Multiple connections to a server or shared resource by the same user, using more than one user name, are not allowed. Disconnect all previous connections to the server or shared resource and try again.
From what you've said, it sounds like the clients will already have a drive mapped to that share - so that approach won't work.
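For reference, the net use approach described above could be driven from Python roughly as follows. This is only a sketch: the helper names and the drive letter Z are illustrative, it only runs on Windows, and it fails with the 1219 error above whenever the share is already mapped under another username.

```python
import subprocess


def build_map_command(drive_letter, share, username, password):
    """Build the `net use` command that maps `share` to a drive letter as `username`."""
    return ["net", "use", f"{drive_letter}:", share, password, f"/user:{username}"]


def build_unmap_command(drive_letter):
    """Build the `net use` command that removes the drive mapping again."""
    return ["net", "use", f"{drive_letter}:", "/delete"]


def map_share(drive_letter, share, username, password):
    # Windows only. check=True raises CalledProcessError if `net use` fails,
    # for example with system error 1219.
    subprocess.run(build_map_command(drive_letter, share, username, password), check=True)


def unmap_share(drive_letter):
    subprocess.run(build_unmap_command(drive_letter), check=True)


# Example (Windows only), using the test setup described above:
# map_share("Z", r"\\WINVM\SecretFolder", "pepysadmin", "pepysadmin")
# ... copy files to Z:\ ...
# unmap_share("Z")
```

Note that the password is passed on the command line here, so it may be visible to other processes while the command runs.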
A potential workaround for that is that it is possible to connect to a network share a second time, if you use its IP address rather than its name. For example, you can have two connections with two different usernames if you connect to \\WINVM\SecretFolder and \\192.168.0.15\SecretFolder. Therefore, it might be possible to ask the user to configure the archive path using the \\SERVERNAME\Share\path\to\folder approach (rather than using the mapped drive letter) and then extract the SERVERNAME, look up its IP address, and then temporarily map that to a drive letter while we do the copy, and unmap it afterwards. Not a clean way of doing it, but it might work.
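The "extract the SERVERNAME and look up its IP address" step could look something like the sketch below. The function names are made up for illustration, and the lookup uses the standard library's socket.gethostbyname, which assumes the server name resolves through normal name resolution on the client's network.

```python
import socket


def split_unc_path(unc_path):
    r"""Split \\SERVER\share\rest into (server, everything after the server)."""
    if not unc_path.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {unc_path!r}")
    server, _, rest = unc_path[2:].partition("\\")
    if not server or not rest:
        raise ValueError(f"UNC path needs a server and a share: {unc_path!r}")
    return server, rest


def unc_path_by_ip(unc_path, resolve=socket.gethostbyname):
    """Return the same UNC path with the server name replaced by its IP address."""
    server, rest = split_unc_path(unc_path)
    return "\\\\" + resolve(server) + "\\" + rest
```

The resolver is a parameter so that the path handling can be checked without a real network lookup.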
There is a Windows API that allows you to 'impersonate' another user. That API is available through the pywin32 Python package and an example of its use is available here.
Unfortunately, lots of things don't work through impersonation (see this blog post). From testing, it seems that mapped network drives aren't available when impersonating - so we can't just use the mapped share the user already has. Even more unfortunately, if we try to map a new network drive when we're impersonating a new user then we get the same error as above - impersonating another user doesn't seem to remove the restriction on only allowing a particular share to be connected to with one username at a time.
Use a pure Python smb client
SMB is the protocol used to connect to Windows shared folders. There are two pure Python implementations of this protocol: pysmb and smbprotocol. I've tested both, and both seem to work when connecting to a share under a different username from the one you're logged in under.
Both packages have a fairly similar sort of API - although pysmb provides a slightly easier method for copying a file (it will 'store a file' from a file_object, which can be used to easily copy a file), but smbprotocol provides an API that more closely matches Python's API for dealing with files in the os module (smbprotocol also supports later protocol versions which may be faster).
This is the only method I've come up with so far that actually works...! So I suggest we use it. It would require the user to specify the archive folder location in the config file as a UNC path \\SERVER\share\path\to\folder rather than a path using a mapped drive letter (S:\path\to\folder) - but I imagine that won't be too difficult (IT support may need to tell them what server and share it is - but I can't imagine that's secret information - as anyone could right-click on the mapped network drive and find out).
However, this may require a change to the way we deal with writing things to the archive directory. At the moment we just use standard Python commands to create directories, write files etc. If we want to support both normal local folders (including on OS X and Linux) and Windows shares, then we'll need to replace these with wrapper functions that look at the archive path and work out whether to treat it as a Windows share (if it starts with \\) or a normal folder (if it doesn't), and then use either the standard Python functions, or the functions from one of the smb libraries. This will make the code quite a bit messier - but hopefully by using wrapper functions sensibly we'll be able to make it not too messy.
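A rough sketch of what such wrapper functions might look like. The function names are invented for illustration, and it picks smbprotocol's smbclient module as the SMB library purely as an example. It also assumes an SMB session with the right credentials has already been registered elsewhere.

```python
import os


def is_windows_share(archive_path):
    r"""True if the archive path is a UNC path such as \\SERVER\share\folder."""
    return str(archive_path).startswith("\\\\")


def make_archive_dirs(archive_path):
    """Create a directory (and any parents) on a share or locally, as appropriate."""
    if is_windows_share(archive_path):
        import smbclient  # part of the smbprotocol package; imported only when needed

        smbclient.makedirs(archive_path, exist_ok=True)
    else:
        os.makedirs(archive_path, exist_ok=True)


def open_archive_file(archive_path, mode="r"):
    """Open a file on a share or locally, returning a file-like object."""
    if is_windows_share(archive_path):
        import smbclient

        return smbclient.open_file(archive_path, mode=mode)
    return open(archive_path, mode)
```

Importing the SMB library lazily means local-folder users on OS X and Linux would not need it installed for these wrappers to work.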
Alternatively, we could write as usual to a local folder and then copy everything across to the share if necessary - but that would mean having extra copies on the local machine, which probably wouldn't be great.
Just one other thing to check: is the entire archive folder meant to be stored on this share, accessed with this special username? That will be all of the output logs, all of the highlighted files, plus all of the original files. I'm concerned that this will slow things down significantly - as I'm sure the pure Python implementations of the smb protocol are slower than the standard Windows implementations, and we'll be writing everything across the network.
(Wow, that was a long comment! Any thoughts welcome @IanMayo)
Thanks for that @robintw - I did imagine we could do it using the python chown command, but I've since learned that's only possible on unix, and the user probably has to be logged in as root.
Aah, @robintw - in our Slack discussion I explained that the archive destination would be mapped to a drive with a letter.
I do believe it's also possible to access their high volume data archive using a \\share URL. Your plan to access the drive via username/pwd in the URL seems fairly achievable. We're only aiming to provide a light level of protection, so that may be sufficient.
Hi @IanMayo - yeah, I don't think the chown option would work even if we had access to the chown command, as if you're not connected with write permissions then you can't put a file there, even if the file is owned by admin.
Yes, my explanation of options above was based on them using a drive with a letter - but it seems to be completely impossible to connect to a drive with a letter with a different username to the one it is already connected using - hence me looking at options involving a UNC path (\\server\share).
When you say "Your plan to access the drive via username/pwd in the URL seems fairly achievable." - do you mean my original plan just to connect with \\username:password@server\share? The first section in my big comment above shows that isn't possible on Windows, as Windows doesn't recognise a username/password in the URL, and won't let you connect anyway if that share is already mapped to a network drive with a different username.
The only option I can see that is possible is the one described in the Use a pure Python smb client section above. Any Windows share that is mapped to a network drive should also be able to be accessed with a UNC path (unless their shared drives aren't using Windows networked folder sharing - in which case, none of this will work).
After receiving approval from Ian to go ahead with the pure-Python SMB client approach, I've investigated the two pure-Python SMB libraries that are available: pysmb and smbprotocol.
I thought there might be speed differences between the libraries, so I have done some benchmarking to compare the speed of these two libraries against a standard Windows mapped network drive. The details are on the wiki. From both the benchmarking results, and my experience writing code with these libraries, I came to the following conclusions:
- pysmb seems to be more unstable than smbprotocol. A number of times my tests failed saying 'Connection Error', for no apparent reason, then worked again 30 seconds later. This never happened with smbprotocol.
- smbprotocol has a far nicer API, which mirrors much of the Python standard library, such as os and shutil. This means that copying a file is a single function call, as is finding if one exists - and these operations are much harder with pysmb.
- pysmb is normally a bit quicker than smbprotocol (sometimes by up to 20%).
Therefore, I'm going to go ahead and implement our code with smbprotocol.
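To give a feel for the smbprotocol route, here is a hedged sketch of copying a local file onto the share as a different user. It is not the code that went into Pepys. The function names are made up, and it uses only smbclient.register_session and smbclient.open_file from the smbprotocol package. The open_remote parameter is an illustrative hook so the copy logic can be tried without a real share.

```python
import shutil


def connect_as(server, username, password):
    """Register an SMB session so later smbclient calls to `server` use these credentials."""
    import smbclient  # provided by the smbprotocol package

    smbclient.register_session(server, username=username, password=password)


def copy_file_to_share(local_path, remote_path, open_remote=None):
    """Copy a local file to a UNC path on an SMB share."""
    if open_remote is None:
        import smbclient

        open_remote = smbclient.open_file
    # Stream the bytes across rather than reading the whole file into memory.
    with open(local_path, "rb") as source, open_remote(remote_path, mode="wb") as target:
        shutil.copyfileobj(source, target)


# Example against the test setup described earlier (file name is illustrative):
# connect_as("WINVM", "pepysadmin", "pepysadmin")
# copy_file_to_share("import_file.txt", r"\\WINVM\SecretFolder\import_file.txt")
```

Because the session is registered per server inside the Python process, this does not touch the user's existing mapped drive, which is what avoids the 1219 error seen with net use.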
Yes - that's all fine, thanks @robintw
(tried to post this last night - it wouldn't go through)
[CF could be of value as importance of access control increases]
Just FYI, Ian, as far as I remember this is mostly implemented already in #340.
Since the archive folder contains the master copy of the import files, the users would like it to be protected from modification/delete.
They have created a dedicated username/account.
But we need to give Pepys access to a config file containing that username/password.
But, beyond that, the IT team would rather they weren't stored as plain text.
I welcome suggestions for how we implement this. We could have an encryption key in our app, but it's Open Source.
Hmm, maybe some obfuscation.