ValveSoftware / Proton

Compatibility tool for Steam Play based on Wine and additional components

Leverage deduplication for prefixes to reduce wasted storage space #4093

Open DanMan opened 4 years ago

DanMan commented 4 years ago

Feature Request

Suggestion

Instead of creating a prefix from scratch for each game, create a basic master prefix for each Proton version which you then only (sym-, hard-, ref-) link to for each game.

Justification [optional]

I've run rmlint -g on my currently ~30GB sized compatdata folder (68 prefixes) and it turned out that ~20GB of space are taken up by duplicate files. It would probably also speed up the first launch game setup.
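For anyone who wants to reproduce a figure like that without rmlint, here is a minimal Python sketch. It is not how rmlint works internally: `duplicate_bytes` and `file_digest` are hypothetical helpers that group files by size, then by SHA-256, and sum the bytes taken by every copy beyond the first.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_bytes(root):
    """Estimate bytes occupied by redundant copies of regular files under root."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks: they do not hold file data themselves.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    wasted = 0
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(int)
        for path in paths:
            by_hash[file_digest(path)] += 1
        # Every copy beyond the first one is redundant.
        for count in by_hash.values():
            wasted += (count - 1) * size
    return wasted
```

Calling `duplicate_bytes` on a compatdata folder returns the estimate in bytes. Files that are already hard-linked or reflinked to each other are still counted as duplicates, so on a deduplicated file system the number overstates the real waste.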

Risks [optional]

Depending on which type of link you use to copy the files, certain risks present themselves. You might not want changes to the files in the master prefix to automatically propagate to all the linked game prefixes, as they would with symlinks. So the most fitting link type would probably be reflinks, which only a few file systems support (e.g. BTRFS). But you can just do cp --reflink=auto and it will fall back on a plain copy if the file system doesn't support reflinks.
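The same "reflink if possible, plain copy otherwise" behaviour can be sketched in Python, assuming Linux. `reflink_or_copy` is a hypothetical helper, not something Proton ships; it uses the `FICLONE` ioctl and treats any `OSError` as "reflinks not available here".

```python
import fcntl
import shutil

# Linux ioctl request that asks the filesystem to share extents between files.
FICLONE = 0x40049409


def reflink_or_copy(src, dst):
    """Clone src to dst via reflink if possible, else do a plain copy.

    Returns "reflink" or "copy" so the caller can see which path was taken.
    """
    try:
        with open(src, "rb") as s, open(dst, "wb") as d:
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
        return "reflink"
    except OSError:
        # Filesystem without reflink support, or src and dst on different
        # filesystems: fall back to copying the bytes.
        shutil.copyfile(src, dst)
        return "copy"
```

Either way the destination is an independent file, so later changes to the master copy do not show up in the game prefix.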

NikoBellicRU commented 4 years ago

Also, a lot of stuff that many games don't even need (some still do) gets installed, like directx/.net/vcrun, which increases the size from about 100 MB to 300-400 MB or more, depending on the game's additional redistributables (like GTA V with the Rockstar launcher). Most games I tried worked without them on a clean prefix, and the ones that didn't just needed d3dcompiler.dll (I don't remember which one) and then worked fine.

It's not just about saving space; this would also speed up the first startup/setup time.

aeikum commented 4 years ago

We are working on this. Should be in an upcoming major release. Stay tuned.

Currently the implementation uses symlinks, so it should work on all filesystems which support symlinks.

msmafra commented 4 years ago

Wow! I was looking for this yesterday. I was going to create an issue for it. It would be awesome! I recently started deduplicating my Games drive from time to time, because I use BTRFS, but it doesn't always seem to work correctly, since games already installed and played fetch some packages again. I was going to test symbolic links and some rsync in the near future.

DanMan commented 4 years ago

Nice. But the thing with symlinks is that they just point to another file's directory entry (not inode¹), which might itself be another symlink. So you add at least 1 more inode lookup whereas a reflink points to an inode already holding the data. Not great but probably ok, if you keep the symlink chain length =1.
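To check how long a chain actually is, a small Python sketch like this counts the hops. `symlink_chain_length` is a hypothetical helper; it only follows links in the final path component, not in parent directories.

```python
import os


def symlink_chain_length(path, max_hops=40):
    """Count how many symlinks are followed before reaching a non-link."""
    hops = 0
    current = path
    while os.path.islink(current):
        if hops >= max_hops:
            # Guard against link loops.
            raise OSError("too many levels of symbolic links: " + path)
        target = os.readlink(current)
        # A relative target is resolved against the link's own directory;
        # an absolute target replaces the path entirely.
        current = os.path.join(os.path.dirname(current), target)
        hops += 1
    return hops
```

A regular file gives 0, a symlink pointing straight at a file gives 1, and anything above 1 means the link points at another link.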

So I understand that without symlinks it may be too specific to BTRFS but it'd be nice, if reflinks were used if the FS is indeed BTRFS. Just my 2 cents.

¹ which is why they break, if you move the original file

OvermindDL1 commented 4 years ago

@DanMan I've not used BTRFS as of yet, is a reflink just a hardlink everywhere else, or is it something different/unique? A hardlink would fix the issue on all other FS's if symlink lookup time is an issue, which I honestly do not think it is in any meaningful way, and symlinks are clearer.

aeikum commented 4 years ago

Whatever we use needs to work across filesystem boundaries (think multiple Steam libraries). That excludes hardlinks, and probably reflinks, although I'm not sure.
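The filesystem-boundary constraint can be illustrated with a short Python sketch. Both helpers are hypothetical and not part of Proton: one compares device IDs, the other tries a hard link and falls back to a symlink when the kernel refuses (for example with `EXDEV` for a cross-device link).

```python
import errno
import os


def same_filesystem(path_a, path_b):
    """True if both existing paths report the same device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def link_shared_file(shared, dest):
    """Hard-link dest to shared if possible, else fall back to a symlink."""
    try:
        os.link(shared, dest)
        return "hardlink"
    except OSError as exc:
        # EXDEV: the two paths are on different filesystems.
        # EPERM / EOPNOTSUPP: the filesystem refuses hard links.
        # EMLINK: the file already has too many links.
        if exc.errno not in (errno.EXDEV, errno.EPERM,
                             errno.EOPNOTSUPP, errno.EMLINK):
            raise
        os.symlink(os.path.abspath(shared), dest)
        return "symlink"
```

Comparing `st_dev` is only an approximation: BTRFS subvolumes, for instance, report different device IDs even though they share one file system.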

Patola commented 4 years ago

Please make the feature optional. As much as it uses a lot of space, it also allows for finely tuning of each game's proton instance in a kind of 'sandboxed' way. Many games have requirements that conflict with others.

Plagman commented 4 years ago

De-duplication is just that, de-duplication. It wouldn't kick in in the cases where prefixes are actually different.

hmlendea commented 4 years ago

Yeah... in many cases (where you have expensive SSDs) this waste of space makes Linux more expensive than Windows. There are also many games with a save import feature that currently can't use it without manually copying the save file, which is another black mark for Linux and PC gaming.

Hopefully for Proton-whitelisted games with save import features this will be enabled by default so that "it just works" without having to tinker with it. I'd also think it would be good to have it on by default for all whitelisted games that can have it on. But for unsupported titles (non-whitelisted) it should be off - those ones should have their own prefix and only share it if the user wants it and finds it stable enough.

aeikum commented 4 years ago

> Please make the feature optional. As much as it uses a lot of space, it also allows for finely tuning of each game's proton instance in a kind of 'sandboxed' way. Many games have requirements that conflict with others.

We're not putting all games into one prefix. We're symlinking for example each prefix's C:\windows\system32\user32.dll to a shared, read-only Proton dist/lib64/wine/user32.dll file. However, we do want to make sure we're not breaking stuff that users are already doing (e.g. protontricks).
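As a rough picture of what that means, here is a minimal Python sketch. It is not the actual Proton code: `link_prefix_files` is a hypothetical helper and the directory layout is only an example. It symlinks each shared file into a prefix directory and skips anything that already exists there, so a file the user put in by hand is left alone.

```python
import os


def link_prefix_files(dist_dir, prefix_dir):
    """Symlink every regular file in dist_dir into prefix_dir.

    Existing entries in prefix_dir are left alone so that a file the user
    dropped in by hand (for example via protontricks) is not clobbered.
    Returns the names that were newly linked.
    """
    os.makedirs(prefix_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(dist_dir)):
        src = os.path.join(dist_dir, name)
        dst = os.path.join(prefix_dir, name)
        if not os.path.isfile(src) or os.path.lexists(dst):
            continue
        # Absolute target, so the link stays valid wherever the prefix lives.
        os.symlink(os.path.abspath(src), dst)
        linked.append(name)
    return linked
```

Each prefix stays its own directory tree; only the unmodified files point back at the shared copy.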

hmlendea commented 4 years ago

> > Please make the feature optional. As much as it uses a lot of space, it also allows for finely tuning of each game's proton instance in a kind of 'sandboxed' way. Many games have requirements that conflict with others.

> We're not putting all games into one prefix. We're symlinking for example each prefix's C:\windows\system32\user32.dll to a shared, read-only Proton dist/lib64/wine/user32.dll file. However, we do want to make sure we're not breaking stuff that users are already doing (e.g. protontricks).

And what about the "My Documents" folder and other places that store saved games? There are a fair number of games out there that can import saves from other games, which is currently broken with Proton and requires the user to manually copy them from one prefix to another before converting them in-game.

aeikum commented 4 years ago

That's not related to what's being discussed in this issue. Can you name some games that are affected?

hmlendea commented 4 years ago

> That's not related to what's being discussed in this issue. Can you name some games that are affected?

I am currently playing The Walking Dead from Telltale, which is a series of multiple games (seasons). Each season can "import" the save from the previous one so that the story will continue with the choices you made in the previous one(s). To do this, the game looks in e.g. My Documents/The Walking Dead Season 2 (I'm not exactly sure this is the correct name) folder, converts them and creates and saves a new save slot for the current game (in its dedicated folder). On a normal Windows machine this goes smoothly since there is only one My Documents directory, but with Proton, each prefix has its own separate one with no way to "see" the one from other prefixes. I didn't play other Telltale games but seeing how they are all story games, if they have other multi-season games I expect them to work the same way.

Other examples would be many of Paradox Interactive's strategy games (but at least they have native Linux versions), where you can continue playing in the same "world" in another game (allowing you to play a very vast and long timeline covering most of history). And another one that I know of is Borderlands remaster which can import the save from the original, and Skyrim Special Edition which can also do the same thing.

Initially I was under the impression that two (or more) games would use the same prefix. If it's indeed not the case I can open a separate request for my problem.

aeikum commented 4 years ago

@hmlendea Thanks. It's definitely something that should be fixed. There are already a couple issues discussing this: #231, #604.

hmlendea commented 4 years ago

> @hmlendea Thanks. It's definitely something that should be fixed. There are already a couple issues discussing this: #231, #604.

This sounds great! Thanks! Just a few days ago I went through a great struggle of copying TWD Season 2 saves to a remote computer via SSH just so I could continue playing Season 3 with Remote Play. It's really great news if you will fix this!

MayeulC commented 4 years ago

Regarding this de-duplication issue, but also versioning of the shared libraries, you might be interested in looking at how Nix and Guix (See also: #Features) handle this. The gist of it is like you describe it, though a bit more extreme as you have a single "store" containing the libraries (in folders named after a hash of the input data: compilation parameters, source code, etc).

Each application can be executed in its own environment, and accesses the required libraries through symlinks.

This comes with a few goodies, like atomic upgrades and system-level deduplication, while allowing multiple users to install software (thanks to the store being managed by a separate daemon).

Also worth exploring, though I have no doubt you are fully aware of its existence, is OSTree. It's basically the same thing, but more like a git repository, and is content-addressed, not build recipe-addressed. They have a nice page comparing multiple different projects. Flatpak famously uses it. You can also overlay multiple filesystem images to provide a vanilla base, while allowing the user to perform their modifications.

Casync gets a mention on that page as well, it's another content-addressable tool, with emphasis on using rolling hashes to allow intra-file deduplication (see the announcement by Lennart Poettering).

Overall, if you are already looking at this and it seems like a relatively big project, I would really like to see you embrace existing solutions, unless your use-case isn't already covered, of course.

It would also be worth exploring deduplication at the steam library level. Multi user computers are getting more rare nowadays, but this could still be invaluable, as well as sharing data among multiple titles. A lot of these tools also provide facilities for binary diffs and distribution, as well as content validation. I understand you already have your own architecture in place, though. A GuixSD or NixOS-based distribution would make a lot of sense for SteamOS as well, while allowing to manage every aspect of the Steam Library in the same store.

aeikum commented 4 years ago

One big catch with all these ideas is that we are simply a binary shipped to and run by a user account - we can't assume very much of the host system and can't ship setuid binaries or system config changes. We briefly considered using FUSE overlayfs[1], however this required a setuid binary, so nope.

Another big catch is end-user ease of use. Symlinks are pretty well understood and easy to use. If I want to modify something in the prefix, I can just delete the symlink and pop in my own file. With overlay filesystems, you usually have to run a program or daemon in order to operate on the files. Worse, you can unknowingly put it into an invalid state or destroy data just by doing normal file operations.
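The "delete the symlink and pop in my own file" step can also be scripted. A minimal Python sketch, assuming the prefix entry is a symlink to a shared file; `materialize` is a hypothetical helper name, not something Proton provides.

```python
import os
import shutil


def materialize(path):
    """Replace a symlink with a private copy of the file it points to."""
    if not os.path.islink(path):
        return False  # already a real file, nothing to do
    target = os.path.realpath(path)
    tmp = path + ".tmp"
    shutil.copy2(target, tmp)  # copy data and metadata next to the link
    os.replace(tmp, path)      # rename over the link itself, not its target
    return True
```

After this the prefix holds an ordinary file that can be edited or swapped out without touching the shared copy. If the shared file is read-only, the copy inherits that mode and needs a chmod before editing.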

Finally, performance is always a concern. Some games have or open a lot of files, repeatedly. Apparently OSTree uses an HTTP server(?!), and casync mentions using a secure hashing function, which is slow. So just with a brief reading I'm not sure they'd be appropriate. The symlink approach we're using is less than 150 lines of Python added to the proton script.

Anyway it's definitely an interesting topic and there's possibly a better option than symlinks. I'd be happy to entertain a specific proposal, but you'd need to consider the above topics.

[1] https://www.winehq.org/pipermail/wine-devel/2020-March/163308.html

Saroumane commented 3 years ago

Hello, thanks to this thread I discovered rmlint. It tells me I could spare 34 GB on my SSD. Sounds tempting... As I understand that proper deduplication (handled by steam itself) is not on the horizon, is it safe to use rmlint script to save space ? What parameters could I use ?

pchome commented 3 years ago

@aeikum

> We briefly considered using FUSE overlayfs[1], however this required a setuid binary, so nope.

Linux kernel 5.11 now supports unprivileged mounting of overlayfs. I have yet to check how useful this is for Wine/Proton, but it looks like it can solve the problem with the setuid binary.

https://kernelnewbies.org/Linux_5.11#File_systems

  • OVERLAYFS
    • Unprivileged mounts (commit)
    • Introduce new uuid=off option for inodes index feature. It can be used to replace UUID of the underlying filesystem in file handles with null, and effectively disable UUID checks. This can be useful in case the underlying disk is copied and the UUID of this copy is changed. This is only applicable if all lower/upper/work directories are on the same filesystem, otherwise it will fall back to normal behaviour (commit)
    • Add the -o userxattr mount option, which forces overlayfs to use the "user.overlay." xattr namespace instead of "trusted.overlay.". This is useful for unprivileged mounting of overlayfs (commit)

So, $ mount -t overlay overlay -ouserxattr,... should work for regular user (I guess).

ovl-update-5.11 overlayfs.rst

pchome commented 3 years ago

Well, I'm no expert, but currently the only way I can get overlayfs mounted as a regular user is this:

$ unshare --mount --map-root-user
$ mkdir -p ovl/{lower,lowest,work,upper,mnt}
$ mount -t overlay overlay -olowerdir=ovl/lower:ovl/lowest,upperdir=ovl/upper,workdir=ovl/work ovl/mnt
$ touch ovl/mnt/file
$ ls ovl/upper
file
$ whoami
root

Such mounts are not visible from different sessions.

different55 commented 2 years ago

Just wanted to bump this. Even with lots of external storage, space gets kinda tight on a 64GB Steam Deck with all these duplicate files hanging out on internal storage. Maybe as a stopgap compatdata can be stashed in the library folder the game itself is actually being installed to?

mercuriete commented 2 years ago

I think using a btrfs or another COW (copy on write) based filesystem will fix the problem. But the main problem using btrfs right now is the support of the "case insensitive" filesystem.

I think the only thing we can do is wait until btrfs has a "case insensitive" feature, or wait until ext4 gains a COW feature and run a deduplication script from time to time.

In the end, this doesn't only affect the Steam Deck; the lack of COW on ext4 affects me on desktop as well.

Another solution is the one Docker uses: the overlay2 filesystem and a layer system that allows the same files to be reused by sharing layers. Steam could use a generic Proton layer shared across games, and each game could overwrite files in its own layer.
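The lookup rule behind such a layer system is simple enough to sketch in Python. This only models which copy of a file wins; it does not cover whiteouts, copy-up on write, or the kernel's overlayfs itself, and `resolve_in_layers` is a hypothetical name.

```python
import os


def resolve_in_layers(relative_path, layers):
    """Return the first existing copy of relative_path, top layer first.

    layers is ordered from the game's own (upper) layer down to the shared
    base layer, so a file in an upper layer shadows the same path below it.
    """
    for layer in layers:
        candidate = os.path.join(layer, relative_path)
        if os.path.lexists(candidate):
            return candidate
    return None
```

With the layers ordered as `[game_layer, proton_layer]`, a game that ships its own d3dcompiler DLL would get its own file, and every other path would resolve to the shared Proton layer.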

I hope the very talented people working at Valve can come up with a solution that fits the gaming use case perfectly, but I think this problem is already solved on the "server side" of things.

WhyNotHugo commented 2 years ago

@mercuriete Why is a case insensitive filesystem relevant here? ext4 is not case insensitive either.

different55 commented 2 years ago

> @mercuriete Why is a case insensitive filesystem relevant here? ext4 is not case insensitive either.

ext4 supports optional casefolding, important for windows compatibility. WINE supports a workaround but it's faster if the filesystem does the heavy lifting.
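To show why doing this in userspace costs more, here is a minimal Python sketch of a case-insensitive lookup on a case-sensitive file system. It is not Wine's actual implementation; `find_case_insensitive` is a hypothetical helper.

```python
import os


def find_case_insensitive(directory, name):
    """Return the on-disk entry in directory matching name regardless of case."""
    exact = os.path.join(directory, name)
    if os.path.lexists(exact):
        return exact  # fast path: the name matches as written
    wanted = name.casefold()
    # Slow path: every entry has to be listed and compared for each lookup.
    for entry in os.listdir(directory):
        if entry.casefold() == wanted:
            return os.path.join(directory, entry)
    return None
```

A miss on the fast path means scanning the whole directory, and the same applies to every component of a longer path. With casefolding enabled in the file system, the kernel resolves the name directly.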

andrewbaxter commented 5 months ago

FWIW regarding non-overlay dedup/reflink:

Saroumane commented 5 months ago

Meanwhile, manual dedupe (and btrfs zstd 1 compression) gives spectacular results on compatdata/ :

$ sudo compsize compatdata/
Processed 786837 files, 589952 regular extents (2582241 refs), 183526 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       33%       16G          50G         282G  

It reads like this: 16G of data used on my SSD corresponds to 50G of uncompressed data, which is referenced (thanks to duperemove) as 282G of data ⇒ deduplication saved me 282 - 50 = 232G! (for 1369 prefixes)
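The same arithmetic as a small Python sketch, using the three numbers from the compsize output above. `savings` is a hypothetical helper; it splits the total into the part that comes from deduplication and the part that comes from compression.

```python
def savings(disk_gb, uncompressed_gb, referenced_gb):
    """Split total savings into the dedup part and the compression part."""
    dedup = referenced_gb - uncompressed_gb    # extents shared between files
    compression = uncompressed_gb - disk_gb    # zstd shrinking the unique data
    return {"dedup": dedup, "compression": compression,
            "total": dedup + compression}


result = savings(disk_gb=16, uncompressed_gb=50, referenced_gb=282)
print(result)  # {'dedup': 232, 'compression': 34, 'total': 266}
```

So of the 282G the files claim to occupy, 232G is saved by shared extents and a further 34G by compression, leaving 16G on disk.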