dbr / tvnamer

Automatic TV episode file renamer, uses data from thetvdb.com via tvdb_api
https://pypi.python.org/pypi/tvnamer/
The Unlicense
912 stars 115 forks source link

Code reorganization - mainly management of file moving/copying/renaming #70

Open lahwaacz opened 11 years ago

lahwaacz commented 11 years ago

I began to work on the clean-up, so far I managed to maintain (more or less) the same functionality, almost all tests pass - except for test_forcefully_moving_default and test_forcefully_moving_disabled, which sometimes pass and sometimes fail, I think the order of input files isn't preserved - maybe some change to test helpers required? (Oh, the tests fail because now first the filename is constructed and then moved, originally the file was first renamed and then moved.)

One other test that always fails on my platform is test_unicode_in_inputname - tvnamer generates filename as 'The Big Bang Theory - [02x07] - The Panty Pi\xf1ata Polarization.avi'.

There's still a lot of work to be done, merge when ready... Until then, any comments appreciated.

dbr commented 11 years ago

Oh sorry, I lost track of this pull request!

Generally this looks great. I want to make a few more dramatic changes for tvnamer3, but this is a good first step

A few things:

lahwaacz commented 11 years ago

I've done the rebase, had to solve only three conflicts... I've slightly changed my previous code, see line 350 in utils.py. I also added --batch flag in test_series_id_with_nameless_series. Since #79 the test_name_generation_multiple_episodes_10 fails, someone should fix it.

lahwaacz commented 11 years ago

Removed the "I doubt anyone will ever fix this test" commit.

dbr commented 11 years ago

I've fixed the two failing tests in master, was just an episode title changed in thetvdb data.

BTW, did you do anything specific to make that unicode test stop failing in Python 2.5/2.6?

lahwaacz commented 11 years ago

Yes, I used unicodedata.normalize to compare the unicode strings. Just in case - you can see the changes here.

Btw, I split some tests into separate files, I think more tests from large files should be split into separate files based on which feature they test, it should be more transparent.

lahwaacz commented 11 years ago

Just wanted to ask - is it necessary to support python 2.5? It's pretty obsolete IMO, preferably I'd like to see more transition to python 3.x.

lahwaacz commented 11 years ago

I've got an idea to improve seriesname matching from database: results could be sorted by similarity ratio (for example difflib.SequenceMatcher.ratio() or difflib.SequenceMatcher.quick_ratio(), see this) instead of current autoselecting of first found item. As a bonus, lahwaacz/tvnamer@5eb6939 would be useless.

dbr commented 11 years ago

Yes, I used unicodedata.normalize to compare the unicode strings

Oh, I think this is a bad idea. It's likely to hide legitimate unicode problems...

I'm pretty sure the tests were only failing because of a Unicode bug in nosetests, which was only happened with Python 2.5 and 2.6 (e.g see the traceback here), it should have been fixed in nose in this pull request

Btw, I split some tests into separate files Yep, this is good change - I shoved far too many tests into the test_functional.py and such

Just wanted to ask - is it necessary to support python 2.5?

If it starts causing problems, I'm fine with dropping 2.5 support.

dbr commented 11 years ago

Oh, I've been using tvnamer with your changes for a while, and it's working nicely for the most part. Thanks hugely for all your effort!

I've pulled your current work into a ver3 branch, https://github.com/dbr/tvnamer/tree/ver3

A few things I noticed:

  1. The mkValidFilename must be run after any custom replacements and so on.

Currently it's run on chunks of the data, like epdata[key] = makeValidFilename(epdata[key]) (in EpisodeInfo.getepdata) - this is too early, and means the output_file_replacements is unable to do things like Series: The Subtitle with Series - The Subtitle (where : is an invalid character on OS X)

I just realised that the test is only valid on Darwin (and Windows, both of which have : in the default character blacklist),

Adding "custom_filename_character_blacklist": [":"] to that test's config will make the test proper on Linux too

  1. "Old path/New path" is always displayed, even when unnecessary (because old and new are the same)

Also the new path should be run through os.path.abspath or something, as it's a bit misleading:

$ pwd
/Users/dbr/code/tvnamer
$ touch /tmp/lost.s01e01.avi
$ tvnamer /tmp/lost.s01e01.avi
####################
# Processing file: lost.s01e01.avi
[...]
Enter choice (first number, return for default, 'all', ? for help):
1
Old path: /tmp
New path: .
Final filename: Lost - [01x01] - Pilot (1).avi
####################
Move file?
([y]/n/a/q) y
Renaming
Creating directory /tmp
Renaming /tmp/lost.s01e01.avi to /tmp/Lost - [01x01] - Pilot (1).avi

Note the "New path: ." which implies the file will be moved into the current working directory, which is incorrect. The code functions correctly, just the UI is wrong

  1. Should show the original and new filename next to each other.

Something like this:

Old filename: lost.s01e01.avi
New filename: Lost - [01x01] - Pilot (1).avi

Old directory: ~/downloads/
New directory: /media/tv/lost/

Move file? ([y]/n/a) 

I think it's better to show the filename separate from the path, so if the file is only being renamed then "old/new directory" is hidden (otherwise it implies maybe the file may be moved)

  1. Utterly trivial, but the "INFO - tvnamer started" and "INFO - tvnamer exited" log messages can be removed or dropped to debug-level
dbr commented 11 years ago

results could be sorted by similarity ratio (for example difflib.SequenceMatcher.ratio() or difflib.SequenceMatcher.quick_ratio(), see this) instead of current autoselecting of first found item

The API returns the series in what it thinks is the best order, so I'm not convinced tvnamer should try and re-sort these again

Also, there may be some subtitles that might be difficult to handle, like sorting the multiple language versions of the same show (where the similarity on all would be 100%)

lahwaacz commented 11 years ago
  1. I tried to run makeValidFilename only once on newFullPath, but it screws the / as separator in path. Running it only on newName is not an option as data for move_files_destination should also be "made valid". So I run it on chunks of data, which somehow works. It's obviously not ideal solution, better ideas welcome.
  2. Yeah, I didn't touch the UI since last month, so there are many improvements waiting...
  3. After some thought, I agree that re-sorting series names is good idea - to make it "work" I had to manually check the language etc., and even with these hacks I never made it to give preference to "Total Access 24/7" over "NFL Total Access".
lahwaacz commented 11 years ago

I've created new branch devel-argparse in my repo. I've changed the argument list parsing code to use argparse instead of optparse. Hopefully the code is cleaner and config loading simpler (see last commits), the problem is that argparse is in standard library only since python 2.7 (though it can be installed as separate package for older versions).

I'd appreciate any feedback, testing and so on, it surely is in it's early stages.