commercialhaskell / stack

The Haskell Tool Stack
http://haskellstack.org
BSD 3-Clause "New" or "Revised" License

Stack should not store any data in %AppData% #3623

Open nponeccop opened 6 years ago

nponeccop commented 6 years ago

On Windows, Stack stores its data in the wrong location, which causes problems for people using the roaming profiles feature of Windows (mostly in enterprises). The main symptom is slower logons, because this folder is synchronized over the LAN.

The problem persists throughout the Haskell infrastructure because of a deficiency in the directory library: it doesn't have separate local and roaming locations, since roaming per-user config is irrelevant outside Windows, AFAIK. See https://github.com/haskell/cabal/issues/4597 for a related issue with Cabal (not created by me).

The biggest consumers of my precious roaming profile space (it's normally limited by a disk quota thanks to sysadmin bureaucracy) are the stack, local and cabal folders, all Haskell-related.

Although these are commonly fixed paths relative to the user's home dir, and are also stored in the registry, we shouldn't rely on that. Instead we should use the SHGetKnownFolderPath C call with the FOLDERID_LocalAppData argument to get the right place. Note that it's slightly more complex if we want to support Server 2003 and Windows XP. Of course, this is better done in directory than in stack.
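For illustration, here is a minimal sketch of the difference between the two locations as seen from Haskell. It does not call SHGetKnownFolderPath directly. It assumes a recent version of the directory library, whose documentation says that getXdgDirectory XdgCache defaults to %LOCALAPPDATA% on Windows, while getAppUserDataDirectory resolves under %APPDATA% (the roaming folder). The helper names roamingDirFor and localDirFor are hypothetical, and XdgCache is used only because it is the call that lands in the local folder, not because Stack's data is a cache.

```haskell
module Main (main) where

import System.Directory
  ( XdgDirectory (XdgCache)
  , getAppUserDataDirectory
  , getXdgDirectory
  )

-- | Hypothetical helper: the per-app directory that the thread says Stack
-- uses today. On Windows this resolves under the roaming profile (%APPDATA%).
roamingDirFor :: String -> IO FilePath
roamingDirFor = getAppUserDataDirectory

-- | Hypothetical helper: a per-app directory outside the roaming profile.
-- Assumption: on Windows, XdgCache defaults to %LOCALAPPDATA%.
localDirFor :: String -> IO FilePath
localDirFor = getXdgDirectory XdgCache

main :: IO ()
main = do
  roaming <- roamingDirFor "stack"
  local <- localDirFor "stack"
  putStrLn ("roaming-style location: " ++ roaming)
  putStrLn ("local-style location:   " ++ local)
```

On non-Windows systems both calls return paths under the home directory, so the distinction only matters on Windows.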

Besides the roaming profile problems, there is an inconsistency: indices, snapshots etc. are stored in %APPDATA%/stack, while ghc and mingw go in %LOCALAPPDATA%/Programs/stack (which is FOLDERID_UserProgramFiles/stack).

IMO the best solution is to install all unix-style apps (including Haskell compiled binaries) to FOLDERID_UserProgramFiles/local/bin. This way many apps can coexist: e.g. git and neovim can safely be put there and use the bin, share etc. subfolders in local, so it becomes an analog of the ~/.local hierarchy.

The least invasive solution is probably just to use FOLDERID_LocalAppData/cabal and FOLDERID_LocalAppData/local instead of FOLDERID_RoamingAppData/*, as is done now.

There are guides for Windows analogous to man hier, so I can provide references if needed. But the topic of where to put unix-style per-user software is very controversial, so I don't expect a quick and easy solution.

mgsloan commented 6 years ago

Hmm, as I am not a windows user, not sure I follow the suggested new default. May be worth changing the default, but preferring the current paths if they exist, so that people's existing installs still work. Perhaps open a PR and we can ask other windows users if the change seems good? Dunno if there will be consensus if this is a controversial topic.

Good news is that this is all configurable:

nponeccop commented 6 years ago

Things break in weird ways when MAX_PATH is encountered.

It's typical for ported software. There are newer file APIs (20+ years old by now) with a path length limit of roughly 32k chars:

https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx#maxpath

The Windows API has many functions that also have Unicode versions to permit an extended-length path for a maximum total path length of 32,767 characters.

So it's fixable, but we "just" need to find out where in the codebase we use outdated file APIs.
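To make the quoted limit concrete: the extended-length form is requested by prefixing an absolute path with `\\?\` (or `\\?\UNC\` for network shares) before handing it to the Unicode file APIs. The following is only a sketch of that rewriting step. toExtendedLength is a hypothetical helper, not a function from base or Stack, and it assumes the input is already absolute, normalized and backslash-separated. Device paths such as `\\.\` are not handled.

```haskell
module Main (main) where

import Data.List (isPrefixOf)

-- | Hypothetical helper: rewrite an absolute, backslash-separated Windows
-- path into its extended-length form. Relative paths are not supported.
toExtendedLength :: FilePath -> FilePath
toExtendedLength path
  -- Already in extended-length form: leave untouched.
  | "\\\\?\\" `isPrefixOf` path = path
  -- UNC share: \\server\share\x becomes \\?\UNC\server\share\x
  | "\\\\" `isPrefixOf` path = "\\\\?\\UNC\\" ++ drop 2 path
  -- Drive path: C:\dir becomes \\?\C:\dir
  | otherwise = "\\\\?\\" ++ path

main :: IO ()
main =
  mapM_
    (putStrLn . toExtendedLength)
    [ "C:\\Users\\me\\project"
    , "\\\\server\\share\\project"
    ]
```

The hard part raised in the thread is not this rewriting but making sure every file operation in the toolchain goes through an API that accepts such paths.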

borsboom commented 6 years ago

The MAX_PATH issue, in particular, needs to be fixed in GHC's base libraries (and there's already a GHC issue open about it). While Stack could, in theory, make its own file I/O library that uses the newer Windows API, we have Cabal as a dependency, which does its own I/O and can only use the base libraries. Not to mention that GHC itself does file I/O and would also need to be fixed (and the best way to do that is also to fix the base libraries).

One could argue that the proper location for app data should also be fixed in System.Directory.getAppUserDataDirectory (which is what Stack uses), although in that case it would at least be practical to fix it in Stack itself. You'd need to make sure it uses the old location if it already exists, though, to avoid breaking existing users' working setups.
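A minimal sketch of that "keep the old location if it already exists" rule, assuming the directory library is available. The names chooseRoot and resolveRoot are hypothetical and this is not Stack's actual code. The new location is obtained with getXdgDirectory XdgCache purely as a stand-in for a local (non-roaming) folder, on the assumption that it defaults to %LOCALAPPDATA% on Windows.

```haskell
module Main (main) where

import System.Directory
  ( XdgDirectory (XdgCache)
  , doesDirectoryExist
  , getAppUserDataDirectory
  , getXdgDirectory
  )

-- | Pure decision rule: keep the legacy directory when it already exists,
-- otherwise use the new one.
chooseRoot :: Bool -> FilePath -> FilePath -> FilePath
chooseRoot legacyExists legacy new
  | legacyExists = legacy
  | otherwise = new

-- | Hypothetical resolver: check the legacy directory on disk and apply
-- the rule above.
resolveRoot :: FilePath -> FilePath -> IO FilePath
resolveRoot legacy new = do
  legacyExists <- doesDirectoryExist legacy
  pure (chooseRoot legacyExists legacy new)

main :: IO ()
main = do
  legacy <- getAppUserDataDirectory "stack" -- current (roaming) location
  new <- getXdgDirectory XdgCache "stack" -- assumed local location
  root <- resolveRoot legacy new
  putStrLn ("using: " ++ root)
```

With this rule, existing installs keep working unchanged and only fresh installs pick up the new default.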

nponeccop commented 6 years ago

@borsboom Yep that's why I put "just" in quotes. If for example ld uses the older API (or the newer API is incomplete so certain things are still limited by MAX_PATH) it's going to be a hard issue.

@mgsloan The idea of not putting big things in the roaming profile folder is pretty uncontroversial. Long story short, there are locations that cause problems, but there is more than one unproblematic location.

The controversy is that nobody knows exactly where we should put it, if not in the roaming profile. E.g. appdata is considered to be a folder for an application's internal settings and data. Ideally a user should never open it, just like nobody opens files in /var/db and /var/cache in vim except for diagnostics and recovery. Microsoft itself considers projects and build products to be documents rather than application data, and puts them in %userprofile%/Documents by default. And then there is the common practice of putting things in the root (but the root is not writeable in locked-down environments, and writing there makes your sysadmin mad even if possible), and so on. And we can consider snapshots to be internal data of stack, etc.

YellowOnion commented 4 years ago

I believe the correct folder for user-agnostic data is meant to be ProgramData.

The only problem is that ghc is an exe and it opens up a huge amount of cross user attack vectors if stack & admin etc don't fully validate the files are correct.

nponeccop commented 4 years ago

ProgramData is something similar to /var/db/stack. Stack on Linux doesn't put anything in /var/db/, so Stack on Windows should behave consistently. ProgramData really is for user-agnostic data, but Stack never manipulates user-agnostic data. Maybe it should, but that's another issue.

On Linux, stack only creates the current user's data. It doesn't touch global data at all. So ProgramData isn't the correct folder for Stack to use.

(Also, just like /var/db, ProgramData requires privileges to write, so it isn't an option for many people installing Stack on locked-down machines where they don't have root/admin rights.)