geigerzaehler / beets-alternatives

Beets plugin to manage external files
MIT License

Hard link untranscoded files when copied within filesystem #61

Open nichobi opened 3 years ago

nichobi commented 3 years ago

When copying files from the library to a new directory, within the same filesystem, the files should be hard linked rather than copied. This would take less time than copying and use less space, without any obvious downsides.

Use case: I keep a lossy version of my library next to my regular library, for easy copying to other devices. Any songs that are lossless are transcoded but the lossy ones are copied over as is. Currently all lossy files end up taking up space in both directories.

geigerzaehler commented 3 years ago

Seems like a good idea. os.link() is what we would need to use.

If you feel like it, please open a PR, @nichobi. I’m happy to help with any questions.

nichobi commented 3 years ago

I had a look around and found a hardlink() function in beets util that simplifies implementation. What I'm unsure of is how to determine when to hardlink vs copy. Some options I've considered:

Do you have any opinion on which to go with? I'm happy to open a PR, just not sure in what manner to start working.

geigerzaehler commented 3 years ago

I wasn’t aware of beets.util.hardlink(). Makes sense to use it.

Thanks for the analysis on the different approaches. I think the last one makes the most sense. We could also add a config option that disables this behavior, for example if the user knows that the collection is on a different filesystem. But this is something we can always add later.
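One way to detect the "different filesystem" case up front is to compare device IDs. The following is only a sketch of that idea: the helper name `same_filesystem` is hypothetical and is not part of beets or the plugin, and equal `st_dev` values are a heuristic rather than a guarantee (bind mounts, for example, can still make `os.link()` fail).

```python
import os


def same_filesystem(src, dest_dir):
    """Heuristic check: do src and dest_dir report the same device ID?

    Hypothetical helper for illustration. dest_dir must already exist.
    Hard links cannot cross filesystem boundaries, so a mismatch means
    linking will fail; a match does not strictly guarantee success.
    """
    return os.stat(src).st_dev == os.stat(dest_dir).st_dev
```

A pre-check like this avoids attempting the link at all, at the cost of two extra `stat` calls per file.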

wisp3rwind commented 3 years ago

There's also reflinking of files (for some filesystems, such as btrfs). I think this is one more reason why no hard-/ref-linking should happen by default (because it's not clear what should be preferred), but rather only if a per-alternative config option is set. In addition, hardlinking by default changes the behaviour when the files in the alternative collection are modified (e.g. by a player writing rating tags): Currently, this will not affect files in the main beets library.
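The behaviour change described here can be reproduced with a small stand-alone sketch. It assumes a filesystem that supports hard links and a "player" that rewrites the file in place; file names and contents are made up for illustration.

```python
import os
import tempfile


def demo_shared_data():
    """Show that writing through one hard link changes the other name too.

    Returns (same_inode, content_seen_via_library_path).
    """
    with tempfile.TemporaryDirectory() as tmp:
        library = os.path.join(tmp, "library.mp3")
        alternative = os.path.join(tmp, "alternative.mp3")
        with open(library, "w") as f:
            f.write("rating=0")

        # Both names now refer to the same inode.
        os.link(library, alternative)

        # Simulate a player rewriting tags in place in the alternative.
        with open(alternative, "w") as f:
            f.write("rating=5")

        same_inode = os.stat(library).st_ino == os.stat(alternative).st_ino
        with open(library) as f:
            return same_inode, f.read()
```

A program that instead writes a temporary file and renames it over the old one would break the link rather than modify the library copy, so the effect depends on how the modifying program saves files.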

The cp command has a --reflink=[never|auto|always] flag, maybe the hardlink option could also take these values, with the same meaning.

> Try to hardlink whenever a file is copied and fall back to copying on errors. This may be inefficient when copying many files.

I doubt that this would really be the bottleneck when updating alternatives (of course, I haven't measured it). I suppose that the system call to hardlink fails rather quickly if it is not supported by the filesystem.
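For reference, the "try to hardlink, fall back to copying" approach could look roughly like the sketch below. It uses only `os.link()` and `shutil.copy2()` from the standard library; the function name `link_or_copy` is hypothetical, and the plugin's actual copy path may differ.

```python
import os
import shutil


def link_or_copy(src, dest):
    """Try to hard link src to dest; copy instead if linking fails.

    Returns "hardlink" or "copy" so the caller can log what happened.
    """
    try:
        os.link(src, dest)
        return "hardlink"
    except FileExistsError:
        # Do not silently overwrite an existing destination.
        raise
    except OSError:
        # For example EXDEV (cross-device link) or an unsupported filesystem.
        shutil.copy2(src, dest)
        return "copy"
```

The failed `os.link()` call costs one system call per file before the copy starts, which is the overhead being discussed.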

geigerzaehler commented 3 years ago

> In addition, hardlinking by default changes the behaviour when the files in the alternative collection are modified (e.g. by a player writing rating tags): Currently, this will not affect files in the main beets library.

Excellent point! This is indeed a good reason not to use hardlinking by default. In general it makes me wonder whether hardlinking is a good idea. Reflinking is a lot better but also less widely supported. (It’s only available on some file systems and not in the Python stdlib yet although there is a package for it.)

An alternative solution for your use case, @nichobi, would be to enable symlinks alongside transcoding. Basically we would add a flag that would symlink files instead of copying them if they don’t need to be transcoded. This is different from format: link where all files are symlinked by default. Would this be acceptable @nichobi?

nichobi commented 3 years ago

Symlinks could be a solution, but might be unstable if the main library is modified. Symlinks would break if a file is moved to a different path or deleted, requiring an alt update to repair. Hardlinks would still point to the same data, even if the main library is changed. For my use case, hardlinks would work better, but I can see why it might be troublesome. What seems best to me would be to make it an option, so the user could pick from copy, link/symlink, hardlink or reflink. Keeping copy as the default seems the most sane, while still letting users pick whatever option works best for them.
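The difference can be shown with a short sketch, assuming a POSIX-like system where creating symlinks needs no special privileges. File names are made up for illustration.

```python
import os
import tempfile


def demo_move():
    """Move the original file and check which kind of link still resolves.

    Returns (hardlink_still_readable, symlink_still_resolves).
    """
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "song.mp3")
        hard = os.path.join(tmp, "hard.mp3")
        sym = os.path.join(tmp, "sym.mp3")
        with open(original, "w") as f:
            f.write("audio")

        os.link(original, hard)    # second name for the same data
        os.symlink(original, sym)  # stores the path of the original

        # Simulate the main library moving the file.
        os.rename(original, os.path.join(tmp, "renamed.mp3"))

        # os.path.exists() follows symlinks, so it is False for a dangling one.
        return os.path.exists(hard), os.path.exists(sym)
```

After the rename the hard link still reaches the data, while the symlink dangles until it is recreated.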

wisp3rwind commented 3 years ago

I'd say, all of symlink, hardlink, reflink could be implemented (or one for now, adding others as requested), everyone could then choose his or her preferred method. For example, we could add the options

alt:
    phone:
        # ...
        link: [never|auto|always]
        linktype: [hardlink|symlink|reflink]

If I understand @nichobi correctly, symlinks might be somewhat inconvenient, because they'd require special care when copying to the other devices, e.g. cp --dereference or rsync --copy-links since a simple cp or rsync would copy the link.

> Symlinks would break if a file is moved to a different path or deleted, requiring an alt update to repair.

On the other hand, the old, hardlinked/copied files might contain stale (meta)data. I don't think the validity of an alternative collection after changes to the beets database and before the next alt update is something we should care too much about.

geigerzaehler commented 3 years ago

> I'd say, all of symlink, hardlink, reflink could be implemented (or one for now, adding others as requested), everyone could then choose his or her preferred method.

This seems to be the right approach given that every option has its own benefits and drawbacks and the user probably knows best what they want.

For configuring this I’d condense your approach @wisp3rwind: We would just provide one link option per alternative with values false, hardlink, symlink, and reflink. I don’t see why we need an auto option.
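A sketch of how such a single `link` value might be validated after the YAML config has been loaded. The function name `parse_link_option` and the choice to treat a missing value like `false` are assumptions made for illustration, not the plugin's behaviour.

```python
VALID_LINK_TYPES = ("hardlink", "symlink", "reflink")


def parse_link_option(value):
    """Normalize the proposed per-alternative `link` option.

    Returns None when files should be copied, otherwise the link type.
    YAML parses a bare `false` as the boolean False, so that is what
    is checked here. Treating None (option absent) as "copy" is an
    assumption of this sketch.
    """
    if value is False or value is None:
        return None
    if isinstance(value, str) and value in VALID_LINK_TYPES:
        return value
    raise ValueError(
        f"link must be false or one of {VALID_LINK_TYPES}, got {value!r}"
    )
```

Rejecting unknown values early, including `auto` and a bare `true`, means a typo in the config does not silently turn into copying.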

wisp3rwind commented 3 years ago

> For configuring this I’d condense your approach @wisp3rwind: We would just provide one link option per alternative with values false, hardlink, symlink, and reflink. I don’t see why we need an auto option.

In that case, I think alt update should abort if a hardlink/symlink/reflink fails. Otherwise, with a silent fallback to copying, it's somewhat hard to verify that you've configured the alternative in a way that is supported on your filesystem.
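The "abort instead of falling back" behaviour might be sketched as follows. `LinkError` and `place_file` are hypothetical names, and reflinking is left out because it is not available in the standard library functions used here.

```python
import os
import shutil


class LinkError(Exception):
    """Hypothetical error that would make the update abort."""


def place_file(src, dest, link=None):
    """Copy src to dest, or link it with no fallback to copying.

    link is None (copy), "hardlink" or "symlink". A failing link
    raises LinkError instead of silently copying the file.
    """
    if link is None:
        shutil.copy2(src, dest)
        return

    operations = {"hardlink": os.link, "symlink": os.symlink}
    if link not in operations:
        # "reflink" would need a third-party package; not covered here.
        raise LinkError(f"link type {link!r} is not handled in this sketch")

    try:
        operations[link](src, dest)
    except OSError as exc:
        raise LinkError(f"could not {link} {src!r} to {dest!r}: {exc}") from exc
```

With this shape, a misconfigured alternative fails on the first file, which makes an unsupported link type visible immediately. Note that `os.symlink()` succeeds even when the target does not exist, so a symlink mode would need a separate check for dangling links.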