dbr / tvnamer

Automatic TV episode file renamer, uses data from thetvdb.com via tvdb_api
https://pypi.python.org/pypi/tvnamer/
The Unlicense
919 stars 116 forks source link

Added hardlink and copy options #172

Open Zorvalt opened 4 years ago

Zorvalt commented 4 years ago

This PR adds the ability to leave original files untouched and copy or create a hard link to them. I think it answers issues partially #46, I think #40 and #10

I added the command line options:

I added the config options:

I am quite new to this tool (and to contributing) so I may have not seen some corner cases. The biggest change I had to do is renaming file after moving/copying them. This was to avoid making too many changes to the code base.

dbr commented 4 years ago

master branch should be good to rebase with if you have the time - I've also fixed up the integration tests which were failing, so the Travis status should be valid again \o/

Zorvalt commented 4 years ago

Thanks for this! Seems like a useful addition

You're welcome. Thank you to you, this tools is very useful and represent a fair amount of work!

  1. I'll merge #161 just now - would you mind rebasing your changes on to this?

    Hmm, there is another pull request pending merging, #161, which will conflict with this (the same_partition stuff is removed)

This is indeed a problem. I could do without the same_partition function by using a try/catch but I think the code would loose some clarity over the verbosity of calling same_partition.

  1. Would it be possible to merge the options into one "how to move the file" option? As in instead of separate always_copy and always_hardlink booleans (which are mutually exclusive), it would be neater if there was something like a "move mode" type enum option (which could be set to "always copy", or "hardlink or copy" etc) Would need a bit of thought as to what options are needed, particularly to cover what happens when a certain operation isn't possible (e.g hardlink to a different drive isn't possible). This would deprecate the always_move option - I think that can be handled quite easily - if always_move is True, set the move_mode option to the appropriate setting and output a deprecation warning.

I thought of doing that at the beginning but did not want to propose to big of a change. I also think it would be better but will also require some refactoring. I will look into it. I need to get more into the different config options before doing anything clean.

Just to be clear, if I understand correctly the code and your proposition, your are suggesting to use move_files_enable to indicate any type of move/copy/etc. and then have a complementary move_mode as you described. I would therefor remove all the always_* options but the always_rename.

Also, sometimes functions use Config[...], sometimes they take those as parameter and some other time a mix of both. Is there a guideline of some sort ?

Zorvalt commented 4 years ago

master branch should be good to rebase with if you have the time - I've also fixed up the integration tests which were failing, so the Travis status should be valid again \o/

Great! I'll rebase the code ASAP

dbr commented 4 years ago

Just to be clear, if I understand correctly the code and your proposition, your are suggesting to use move_files_enable to indicate any type of move/copy/etc. and then have a complementary movemode as you described. I would therefor remove all the always* options but the always_rename.

Yep I think this is correct:

Also, sometimes functions use Config[...], sometimes they take those as parameter and some other time a mix of both. Is there a guideline of some sort ?

A lot of that is legacy of the config being introduced later because lots of functions took too many arguments

When things should be an simple argument (e.g bool/str) versus using the config is slightly tricky, as with hindsight I don't think the config should have been a global object (ideally the config would be passed to the relevant "core" parts of the code, and then when lower level functions just use one or two options, it would be clearer when the config items can be neatly passed in as arguments)

I guess it should mostly boil down to making things easy to unit-test - it is easier to write a test for a function which takes a bool than uses the config. However with methods like EpisodeInfo.generateFilename which inherently use many config options in one place, it is easier to test by varying the config

dbr commented 4 years ago

So as a first-pass, I think the modes required are:

Zorvalt commented 4 years ago
  1. I'll merge #161 just now - would you mind rebasing your changes on to this?

Hmm, there is another pull request pending merging, #161, which will conflict with this (the same_partition stuff is removed)

This is indeed a problem. I could do without the same_partition function by using a try/catch but I think the code would loose some clarity over the verbosity of calling same_partition.

I started rebasing. Do I revert the same_partition function or not ? I understand the cleanup of the code and it was needed but the program lost the ability to move between partitions, too. Was this intended ?

Zorvalt commented 4 years ago
  1. I'll merge #161 just now - would you mind rebasing your changes on to this?

Hmm, there is another pull request pending merging, #161, which will conflict with this (the same_partition stuff is removed)

This is indeed a problem. I could do without the same_partition function by using a try/catch but I think the code would loose some clarity over the verbosity of calling same_partition.

I started rebasing. Do I revert the same_partition function or not ? I understand the cleanup of the code and it was needed but the program lost the ability to move between partitions, too. Was this intended ?

Ok, digging more into rename_file and then shutil.move(old, new), I understand better! I will do without same_partition :-)

Zorvalt commented 4 years ago

hardlink - like copy but using hardlinks at new location (reverting to copy if hardlinks not available)

I am not convinced by the idea that it should always fallback. In my use case, I would prefer it to not copy some huge files and fail or skip it with an error message.

I suggest to use two options hardlink and hardlink_or_copy. It adds little bit of complexity but if you want to use hardlink, it is probably to avoid copying.

Zorvalt commented 4 years ago

I implemented the new move_mode as a python enum with the following default configuration entry:

    # The method used to move files at configured destination
    #
    # * move (default)   - to move files
    # * copy             - to copy files
    # * hardlink         - to link files with a hard link
    # * hardlink_or_copy - to hardlink files when possible otherwise, falls back to COPY
    # * symlink          - to link files with a symbolic link
    # * leave_symlink    - to move files and replace the old ones with a symbolic link to the new location
    'move_mode': 'move',

Should I keep the always_move and leave_symlink options in the config file for retro-compatibility ? They would then simply set the right mode except that I would need to handle the case where the mode is also set at the same time. What should be the behavior then ? I suppose, an error since a user of the previous config won't see anything break. Maybe marking them deprecated ? If so, should a warning be shown at runtime ?

dbr commented 4 years ago

Good point - that set of options seem good to me!

For maintaining backwards compatibility, I think what you suggest is pretty much the way to do it. Specifically I would suggest:

Zorvalt commented 4 years ago

I pass almost all the tests but two because, as stated previously, I had to inverse to order in which we "rename then move" to "move then rename". This is necessary for the copy and linking type of moves as the idea would be to keep the original files unchanged. The tests expect that, in case of failure to move a file (e. g. file already exists at destination without overwrite flag), the original file should have been renamed but not moved.

As you discussed in #40, the behavior should change and I agree. I think that a small step in that direction is worth taking in this PR as it is the "logical thing to do" to implement copy and linking.

Now, to issue arise: What about backwards compatibility and how to do it ?

  1. For the compatibility, as shown by the two failing tests, it would break the current behavior but I think, the current behavior is not what user expects. If this is to be treated as a bug, it's fine. Otherwise, maybe it's time for a 3.0 ? But that seems to be a big move for a small change.
  2. For the "how to do it", I think that the Renamer would be a perfect fit for the job. It would be relatively easy the transform it in a kind of "builder" on which we can call move and/or rename then, finally, an execute method would actually try to do the operation. This would allow to, later, break the multiple operations in smaller cumulative ones if needed.

The failing tests are: test_forcefully_moving_disabled with result

Expected files: ['Scrubs - [01x01] - My First Day.avi', 'tv/Scrubs/season 1/Scrubs - [01x01] - My First Day.avi']
Got files:      ['tv/Scrubs/season 1/scrubs.s01e01.avi', 'tv/Scrubs/season 1/Scrubs - [01x01] - My First Day.avi']

test_forcefully_moving_default with result

Expected files: ['Scrubs - [01x01] - My First Day.avi', 'tv/Scrubs/season 1/Scrubs - [01x01] - My First Day.avi']
Got files:      ['tv/Scrubs/season 1/scrubs - [01x01].avi', 'tv/Scrubs/season 1/Scrubs - [01x01] - My First Day.avi']

Also, I do not understand why the two input files (scrubs.s01e01.avi and scrubs - [01x01].avi) were processed in one order in test_forcefully_moving_disabled and the inverse in test_forcefully_moving_default.

dbr commented 4 years ago

The tests expect that, in case of failure to move a file (e. g. file already exists at destination without overwrite flag), the original file should have been renamed but not moved

This isn't intend behaviour, just a side-effect of of retrofitting the move feature - I'm definitely happy if that can be fixed!

For the implementation, the builder approach sounds reasonable - sounds similar to the approach I started taking with an experimental version of tvnamer written in Rust, https://github.com/dbr/rstvnamer/blob/3f832e551a070cf31fda8e27d3c8268d9712232c/src/actions.rs#L28 I never got around to fleshing out the idea - allowing multiple chained actions (e.g "move here, then copy into other locaiton with the final name"), or adding more like "run shell command with new file as argument" - would need to be done carefully to avoid ballooning the scope of tvnamer too much, but could be quite useful

Not sure how familiar you are with Rust, but it is essentially how I intended (or intend) tvnamer's code to be structured - fairly well summarised in this method: https://github.com/dbr/rstvnamer/blob/3f832e551a070cf31fda8e27d3c8268d9712232c/src/main.rs#L13-L25 Basically "parse info out of the file", then "populate it with more data from TVDB", then "format the data into filenames or paths etc", before doing some action with all of this data

..allowing more complex action chaining is definitely a future thing - but if the Renamer class could be refactored to something vaguely in line with this, that'd be good