nichobi opened 3 years ago
Seems like a good idea. os.link() is what we would need to use.
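The following is a minimal standard-library sketch of what os.link() does. The file names are made up for illustration, and `hardlink` here is a hypothetical wrapper, not beets' own helper.

```python
import os
import tempfile


def hardlink(src: str, dest: str) -> None:
    """Create a hard link at dest that refers to the same inode as src."""
    # os.link raises FileExistsError if dest already exists, and an OSError
    # (errno EXDEV) if src and dest are on different filesystems.
    os.link(src, dest)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "track.mp3")
    dest = os.path.join(tmp, "track-alt.mp3")
    with open(src, "wb") as f:
        f.write(b"audio data")
    hardlink(src, dest)
    # Both names now point at the same inode, so the data is stored only once.
    same_inode = os.stat(src).st_ino == os.stat(dest).st_ino
    link_count = os.stat(src).st_nlink
```

After the call, the inode has two names, which is why a hardlinked alternative does not use extra disk space.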
If you feel like, please open a PR, @nichobi. I’m happy to help with any questions.
I had a look around and found a hardlink() function in beets util that simplifies implementation. What I'm unsure of is how to determine when to hardlink vs copy. Some options I've considered:
Do you have any opinion on which to go with? I'm happy to open a PR, just not sure in what manner to start working.
I wasn’t aware of beets.util.hardlink(). Makes sense to use it.
Thanks for the analysis on the different approaches. I think the last one makes the most sense. We could also add a config option that disables this behavior, for example if the user knows that the collection is on a different filesystem. But this is something we can always add later.
There's also reflinking of files (for some filesystems, such as btrfs). I think this is one more reason why no hard-/ref-linking should happen by default (because it's not clear what should be preferred), but rather only if a per-alternative config option is set. In addition, hardlinking by default changes the behaviour when the files in the alternative collection are modified (e.g. by a player writing rating tags): Currently, this will not affect files in the main beets library.
The cp command has a --reflink=[never|auto|always] flag; maybe the hardlink option could also take these values, with the same meaning.
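Here is a rough sketch of how cp's never/auto/always semantics could map onto hard links. `place_file` is a hypothetical helper, and the fallback behaviour for "auto" is an assumption modelled on cp's reflink flag.

```python
import os
import shutil
import tempfile


def place_file(src: str, dest: str, mode: str = "never") -> str:
    """Hypothetical helper mirroring cp --reflink=[never|auto|always], applied
    to hard links. Returns "link" or "copy" to report what actually happened."""
    if mode not in ("never", "auto", "always"):
        raise ValueError(f"invalid mode: {mode!r}")
    if mode == "never":
        shutil.copy2(src, dest)  # current behaviour: always copy
        return "copy"
    try:
        os.link(src, dest)
        return "link"
    except OSError:
        if mode == "always":
            raise  # like cp --reflink=always: fail instead of copying
        shutil.copy2(src, dest)  # "auto": fall back to a plain copy
        return "copy"


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "song.flac")
    with open(src, "wb") as f:
        f.write(b"audio")
    copied = place_file(src, os.path.join(tmp, "copy.flac"), "never")
    linked = place_file(src, os.path.join(tmp, "link.flac"), "auto")
```

"always" surfaces filesystem problems immediately, while "auto" trades that visibility for never failing.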
Try to hardlink whenever a file is copied and fall back to copying on errors. This may be inefficient when copying many files.
I doubt that this would really be the bottleneck when updating alternatives (of course, I haven't measured it). I suppose that the system call to hardlink fails rather quickly if it is not supported by the filesystem.
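If repeated failing link calls ever did matter, one cheap pre-check is to compare device IDs, since hard links cannot cross filesystems. This is a sketch with a hypothetical `same_filesystem` helper. A matching device only rules out the cross-filesystem case and does not guarantee hard-link support (FAT, for example, has no hard links).

```python
import os
import tempfile


def same_filesystem(src: str, dest_dir: str) -> bool:
    """Return True if src and dest_dir live on the same device.

    A differing st_dev means os.link would fail with EXDEV, so linking can
    be skipped up front. The converse does not hold: some filesystems
    do not support hard links at all.
    """
    return os.stat(src).st_dev == os.stat(dest_dir).st_dev


with tempfile.TemporaryDirectory() as tmp:
    track = os.path.join(tmp, "track.mp3")
    open(track, "wb").close()
    on_same_fs = same_filesystem(track, tmp)
```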
In addition, hardlinking by default changes the behaviour when the files in the alternative collection are modified (e.g. by a player writing rating tags): Currently, this will not affect files in the main beets library.
Excellent point! This is indeed a good reason not to use hardlinking by default. In general it makes me wonder whether hardlinking is a good idea. Reflinking is a lot better but also less widely supported. (It’s only available on some file systems and not in the Python stdlib yet although there is a package for it.)
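For reference, here is a Linux-only sketch of reflinking without a third-party package, using the FICLONE ioctl with a fallback to a regular copy. The hard-coded request number and the `reflink_or_copy` name are assumptions. Newer Python versions may expose the constant as fcntl.FICLONE.

```python
import fcntl
import os
import shutil
import tempfile

# Linux ioctl request number for FICLONE, i.e. _IOW(0x94, 9, int).
# Assumption: this value is only meaningful on Linux.
FICLONE = 0x40049409


def reflink_or_copy(src: str, dest: str) -> bool:
    """Try a copy-on-write clone (e.g. on btrfs or XFS); otherwise copy.

    Returns True if the clone succeeded, False if the bytes were copied.
    """
    with open(src, "rb") as s, open(dest, "wb") as d:
        try:
            # Ask the kernel to share src's extents with dest.
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return True
        except OSError:
            pass  # unsupported filesystem, cross-device, etc.
    shutil.copyfile(src, dest)
    return False


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "a.mp3")
    dest = os.path.join(tmp, "b.mp3")
    with open(src, "wb") as f:
        f.write(b"audio" * 1000)
    cloned = reflink_or_copy(src, dest)
    with open(dest, "rb") as f:
        dest_data = f.read()
```

Unlike a hard link, a reflinked file is a separate inode, so a player writing tags to it would not touch the library file.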
An alternative solution for your use case, @nichobi, would be to enable symlinks alongside transcoding. Basically we would add a flag that would symlink files instead of copying them if they don’t need to be transcoded. This is different from format: link, where all files are symlinked by default. Would this be acceptable @nichobi?
Symlinks could be a solution, but might be unstable if the main library is modified. Symlinks would break if a file is moved to a different path or deleted, requiring an alt update to repair. Hardlinks would still point to the same data, even if the main library is changed. For my use case, hardlinks would work better, but I can see why it might be troublesome.
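The difference described here can be checked with a small standard-library experiment, using made-up paths. It simulates beets moving a library file and then tests both kinds of link.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "library", "song.mp3")
    os.makedirs(os.path.dirname(original))
    with open(original, "wb") as f:
        f.write(b"audio")

    hard = os.path.join(tmp, "hard.mp3")
    soft = os.path.join(tmp, "soft.mp3")
    os.link(original, hard)
    os.symlink(original, soft)

    # Simulate the library file being moved to a new path.
    os.rename(original, os.path.join(tmp, "library", "renamed.mp3"))

    # exists() follows the symlink; its target path no longer exists.
    symlink_ok = os.path.exists(soft)
    # The hard link still names the same inode, so the data is reachable.
    with open(hard, "rb") as f:
        hardlink_data = f.read()
```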
What seems best to me would be to make it an option, so the user could pick from copy, link/symlink, hardlink or reflink. Keeping copy as the default seems the most sane, while still giving the user the ability to pick whatever option works best for them.
I'd say, all of symlink, hardlink, reflink could be implemented (or one for now, adding others as requested); everyone could then choose his or her preferred method. For example, we could add the options:
```yaml
alt:
  phone:
    # ...
    link: [never|auto|always]
    linktype: [hardlink|symlink|reflink]
```
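Here is a sketch of validating these two proposed keys once they have been read from the config. `read_link_options` is hypothetical, and the defaults ("never", "hardlink") are assumptions chosen so that the current copy behaviour is kept.

```python
VALID_LINK = {"never", "auto", "always"}
VALID_LINKTYPE = {"hardlink", "symlink", "reflink"}


def read_link_options(alt_config: dict) -> tuple[str, str]:
    """Validate the proposed per-alternative 'link' and 'linktype' keys.

    Assumed defaults keep today's behaviour: never link, i.e. always copy.
    """
    link = alt_config.get("link", "never")
    linktype = alt_config.get("linktype", "hardlink")
    if link not in VALID_LINK:
        raise ValueError(f"link must be one of {sorted(VALID_LINK)}, got {link!r}")
    if linktype not in VALID_LINKTYPE:
        raise ValueError(
            f"linktype must be one of {sorted(VALID_LINKTYPE)}, got {linktype!r}"
        )
    return link, linktype
```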
If I understand @nichobi correctly, symlinks might be somewhat inconvenient, because they'd require special care when copying to the other devices, e.g. cp --dereference or rsync --copy-links, since a simple cp or rsync would copy the link.
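From Python, the equivalent of copying with dereferenced links is shutil.copytree with symlinks=False, which copies the linked files' contents rather than recreating the links. `export_to_device` and the directory layout below are hypothetical.

```python
import os
import shutil
import tempfile


def export_to_device(alt_dir: str, device_dir: str) -> None:
    """Copy an alternative collection to a device directory, replacing
    symlinks with the files they point to (similar to cp --dereference)."""
    # symlinks=False (the default) copies link targets' contents and metadata.
    shutil.copytree(alt_dir, device_dir, symlinks=False)


with tempfile.TemporaryDirectory() as tmp:
    library_file = os.path.join(tmp, "library", "song.mp3")
    os.makedirs(os.path.dirname(library_file))
    with open(library_file, "wb") as f:
        f.write(b"audio")

    alt_dir = os.path.join(tmp, "phone")
    os.makedirs(alt_dir)
    os.symlink(library_file, os.path.join(alt_dir, "song.mp3"))

    device_dir = os.path.join(tmp, "device")  # must not exist yet
    export_to_device(alt_dir, device_dir)
    exported = os.path.join(device_dir, "song.mp3")
    exported_is_link = os.path.islink(exported)
    with open(exported, "rb") as f:
        exported_data = f.read()
```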
Symlinks would break if a file is moved to a different path or deleted, requiring an alt update to repair.
On the other hand, the old, hardlinked/copied files might contain stale (meta)data. I don't think the validity of an alternative collection after changes to the beets database and before the next alt update is something we should care too much about.
I'd say, all of symlink, hardlink, reflink could be implemented (or one for now, adding others as requested); everyone could then choose his or her preferred method.
This seems to be the right approach given that every option has its own benefits and drawbacks and the user probably knows best what they want.
For configuring this I’d condense your approach, @wisp3rwind: We would just provide one link option per alternative with values false, hardlink, symlink, and reflink. I don’t see why we need an auto option.
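The single-option variant could map each value directly to a file operation. This is a sketch with hypothetical names (`LINKERS`, `place`). Reflinking is left as a stub because the standard library has no portable call for it.

```python
import os
import shutil
import tempfile


def _symlink(src: str, dest: str) -> None:
    # Absolute target so the link resolves no matter where dest lives.
    os.symlink(os.path.abspath(src), dest)


def _reflink(src: str, dest: str) -> None:
    # Placeholder: no stdlib API; a Linux ioctl or third-party package would go here.
    raise NotImplementedError("reflink is not implemented in this sketch")


# YAML `link: false` loads as Python False, which selects a plain copy.
LINKERS = {
    False: shutil.copy2,
    "hardlink": os.link,
    "symlink": _symlink,
    "reflink": _reflink,
}


def place(src: str, dest: str, link=False) -> None:
    """Put src at dest using the method selected by the 'link' option."""
    try:
        action = LINKERS[link]
    except KeyError:
        raise ValueError(f"unsupported link value: {link!r}") from None
    action(src, dest)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "song.mp3")
    with open(src, "wb") as f:
        f.write(b"audio")
    place(src, os.path.join(tmp, "copy.mp3"), False)
    place(src, os.path.join(tmp, "hard.mp3"), "hardlink")
    place(src, os.path.join(tmp, "soft.mp3"), "symlink")
    src_ino = os.stat(src).st_ino
    copy_shares_inode = os.stat(os.path.join(tmp, "copy.mp3")).st_ino == src_ino
    hard_shares_inode = os.stat(os.path.join(tmp, "hard.mp3")).st_ino == src_ino
    soft_is_link = os.path.islink(os.path.join(tmp, "soft.mp3"))
```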
For configuring this I’d condense your approach, @wisp3rwind: We would just provide one link option per alternative with values false, hardlink, symlink, and reflink. I don’t see why we need an auto option.
In that case, I think alt update should abort if a hardlink/symlink/reflink fails. Otherwise, with a silent fallback to copying, it's somewhat hard to verify that you've configured the alternative in a way that is supported on your filesystem.
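Aborting instead of silently copying could look roughly like the following. `LinkError` and `link_or_abort` are hypothetical names. The point is to turn the low-level OSError into a message that tells the user which alternative setting to revisit.

```python
import os
import tempfile


class LinkError(Exception):
    """Hypothetical error used to abort 'alt update' when linking fails."""


def link_or_abort(src: str, dest: str) -> None:
    try:
        os.link(src, dest)
    except OSError as exc:
        # No fallback to copying: report the failure so the config can be fixed.
        raise LinkError(
            f"cannot hardlink {src!r} -> {dest!r} ({exc}); check that the "
            "alternative is on the same filesystem as the library, or set link: false"
        ) from exc


with tempfile.TemporaryDirectory() as tmp:
    try:
        link_or_abort(os.path.join(tmp, "missing.mp3"), os.path.join(tmp, "out.mp3"))
        aborted = False
    except LinkError:
        aborted = True
```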
When copying files from the library to a new directory, within the same filesystem, the files should be hard linked rather than copied. This would take less time than copying and use less space, without any obvious downsides.
Use case: I keep a lossy version of my library next to my regular library, for easy copying to other devices. Any songs that are lossless are transcoded but the lossy ones are copied over as is. Currently all lossy files end up taking up space in both directories.
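Here is a sketch of the per-file decision in this use case: transcode lossless files and link everything else. The extension set and `plan` are assumptions for illustration. A real implementation would more likely look at the format beets reports than at the file name.

```python
import os

# Assumed set of lossless extensions, for illustration only.
LOSSLESS_EXTENSIONS = {".flac", ".wav", ".aiff", ".ape"}


def plan(path: str) -> str:
    """Return "transcode" for lossless files, "hardlink" for files kept as is."""
    ext = os.path.splitext(path)[1].lower()
    return "transcode" if ext in LOSSLESS_EXTENSIONS else "hardlink"
```

Only the "hardlink" branch would change compared to today. Lossless files still need a separate transcoded copy either way.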