bdunogier / subber

Web application that subtitles downloaded episodes in the background
MIT License

Cannot find subtitles when several TV shows have the same name #23

Status: Open. lolautruche opened this issue 9 years ago

lolautruche commented 9 years ago

The problem occurs with the BetaSeries scraper. When several TV shows share the same name (e.g. Doctor Who / Doctor Who 2005, Once Upon a Time / Once Upon a Time 2011...), the filename may not be enough to identify the TV show and episode.

Example

The release Once.Upon.a.Time.S04E21.720p.HDTV.X264-DIMENSION actually belongs to Once Upon a Time (2011) (TVDB ID: 248835), but the BetaSeries scraper matches it to Once Upon a Time (TVDB ID: 83882).
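
To make the collision concrete, here is a minimal Python sketch. It assumes a simple SxxEyy regex and a two-entry stand-in catalogue built from the TVDB IDs above; CATALOGUE, parse_release and candidates are hypothetical names, not BetaSeries or subber code. Once the year is ignored, both shows normalize to the same name, so a name-only lookup has no way to choose between them.

```python
import re
from dataclasses import dataclass

# Hypothetical parser: assumes the show name is everything before the first SxxEyy token.
RELEASE_RE = re.compile(
    r"^(?P<show>.+?)[. ]S(?P<season>\d{1,2})E(?P<episode>\d{1,3})", re.IGNORECASE
)

# Stand-in for the scraper's show search, built from the two TVDB IDs in this issue.
CATALOGUE = {83882: "Once Upon a Time", 248835: "Once Upon a Time (2011)"}


@dataclass(frozen=True)
class ParsedRelease:
    show: str
    season: int
    episode: int


def parse_release(name: str) -> ParsedRelease:
    match = RELEASE_RE.match(name)
    if match is None:
        raise ValueError(f"no SxxEyy token in {name!r}")
    show = match.group("show").replace(".", " ").strip()
    return ParsedRelease(show, int(match.group("season")), int(match.group("episode")))


def normalize(title: str) -> str:
    # Drop a trailing "(YYYY)" and ignore case, to mimic a lookup that ignores the year.
    return re.sub(r"\s*\(\d{4}\)$", "", title).lower()


def candidates(release: ParsedRelease) -> list[int]:
    """Return every TVDB ID whose normalized title matches the release's show name."""
    wanted = normalize(release.show)
    return [tvdb_id for tvdb_id, title in CATALOGUE.items() if normalize(title) == wanted]
```

For the release above, candidates(parse_release(...)) returns both 83882 and 248835; only an external hint such as a TVDB ID can break the tie.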

Thoughts

The underlying problem is inconsistent release naming: the year doesn't appear in the file name. A pragmatic solution would be to accept a TVDB ID in the subber:watchlist:add-item command (and in the matching REST implementation). Note that the TVDB ID is present in the Sick Beard script output:

Opening URL: http://localhost:5051/home/postprocess/processEpisode?nzbName=Once.Upon.a.Time.S04E21.720p.HDTV.X264-DIMENSION.nzb&quiet=1&dir=%2Fmedia%2Ficy4to%2Fincoming%2FOnce.Upon.a.Time.S04E21.720p.HDTV.X264-DIMENSION
Processing folder: /media/icy4to/incoming/Once.Upon.a.Time.S04E21.720p.HDTV.X264-DIMENSION

Processing /media/icy4to/incoming/Once.Upon.a.Time.S04E21.720p.HDTV.X264-DIMENSION/Once.Upon.a.Time.S04E21.720p.HDTV.X264-DIMENSION.mkv (Once.Upon.a.Time.S04E21.720p.HDTV.X264-DIMENSION.nzb)
Found result in history: (248835, 4, [], 4)
Parsed Once.Upon.a.Time.S04E21.720p.HDTV.X264-DIMENSION into Once Upon a Time - S4E21 - 720p.HDTV.X264 (DIMENSION) [ABD: False]
Looking up Once Upon a Time in the DB
Lookup successful, using tvdb id 248835
Loading show object for tvdb_id 248835
Retrieving episode object for 4x21
Snatch history had a quality in it, using that: HD TV
Sick Beard snatched this episode, marking it safe to replace
This download is marked as safe to replace existing file
Found release name Once.Upon.a.Time.S04E21.720p.HDTV.X264-DIMENSION
Destination folder for this episode: /media/TV/Once Upon a Time (2011)
Moving file from /media/icy4to/incoming/Once.Upon.a.Time.S04E21.720p.HDTV.X264-DIMENSION/Once.Upon.a.Time.S04E21.720p.HDTV.X264-DIMENSION.mkv to /media/TV/Once Upon a Time (2011)/Once.Upon.a.Time.2011.S04E21.Mother.mkv
Deleted folder: /media/icy4to/incoming/Once.Upon.a.Time.S04E21.720p.HDTV.X264-DIMENSION
Processing succeeded for /media/icy4to/incoming/Once.Upon.a.Time.S04E21.720p.HDTV.X264-DIMENSION/Once.Upon.a.Time.S04E21.720p.HDTV.X264-DIMENSION.mkv

bdunogier commented 9 years ago

I think Sick Beard is doing something smart here. Look at this line:

Found result in history: (248835, 4, [], 4)

I think it used the release/file name to find the matching episode in its history, since it started this download itself. I wonder what the output would be for a download that was started manually... care to try? :-)
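
If the wrapper wants to reuse that line, Python's ast.literal_eval can read the printed tuple safely. The field meanings below (TVDB ID, season, episode list, quality code) are an assumption inferred from the logs in this issue, not from Sick Beard documentation. Note that the episode list is empty in both logs, so the episode number would still have to come from somewhere else.

```python
import ast
from typing import NamedTuple, Optional

PREFIX = "Found result in history: "


class HistoryResult(NamedTuple):
    # Assumed meaning of each position, inferred from the logs in this issue.
    tvdb_id: int
    season: int
    episodes: list
    quality: int


def parse_history_line(line: str) -> Optional[HistoryResult]:
    """Parse Sick Beard's history line, or return None for any other line."""
    if not line.startswith(PREFIX):
        return None
    # literal_eval only accepts Python literals, so it cannot execute arbitrary code.
    value = ast.literal_eval(line[len(PREFIX):].strip())
    if not isinstance(value, tuple) or len(value) != 4:
        return None
    return HistoryResult(*value)
```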

I'm going to set up automated testing of the wrapper, probably by faking the executable. The post-processing script clearly produces different outputs, and none of them are tested.
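
One way to fake the executable, sketched in Python: write a throwaway script that prints canned output, run it the way the wrapper would run the real post-processing script, and check what the wrapper extracts. The canned outputs are assembled from lines in this issue's logs, and CASES, fake_executable, post_process_output and found_in_history are hypothetical names.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Canned outputs assembled from lines in this issue's logs; add one entry per variant seen.
CASES = {
    "history_line_present": "Found result in history: (248835, 4, [], 4)\n"
                            "Lookup successful, using tvdb id 248835\n",
    "history_line_absent": "Looking up Once Upon a Time in the DB\n"
                           "Lookup successful, using tvdb id 248835\n",
}


def fake_executable(directory: Path, name: str, canned_output: str) -> list[str]:
    """Write a throwaway script that only prints canned output, and return its command."""
    script = directory / f"{name}.py"
    script.write_text(f"import sys\nsys.stdout.write({canned_output!r})\n")
    return [sys.executable, str(script)]


def post_process_output(command: list[str]) -> str:
    """Hypothetical wrapper step: run the post-processing command, return its stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


def found_in_history(output: str) -> bool:
    return any(line.startswith("Found result in history:") for line in output.splitlines())


def run_cases() -> dict[str, bool]:
    """Run the wrapper against each fake and record whether it saw a history hit."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, canned in CASES.items():
            output = post_process_output(fake_executable(Path(tmp), name, canned))
            results[name] = found_in_history(output)
    return results
```

Each new output variant then only needs a new CASES entry rather than a real Sick Beard run.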

That aside, what bothers me a bit is that we'd have to change the way we store queued items. We currently send ONLY the release name, which is insufficient for this use case. The catch is that the Scrapper interface expects a release name: the Betaseries Scrapper uses it both to identify the episode and to get the subtitles, all in one.

Let's imagine that we queue the show name, season number and episode number in addition to the release name (we still need the release name for parsing and compatibility matching). We could then use the TheTVDB ID (from SB) with GET /shows/episodes, which accepts a TVDB ID plus season and episode numbers, and can also return subtitles.
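
A sketch of the richer queued item, assuming the release name stays mandatory and the other fields are optional. The base URL https://api.betaseries.com and the query parameter names (thetvdb_id, season, episode, subtitles) are assumptions to check against the BetaSeries API documentation; the code only builds the request URL and does not call the API.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint URL; verify against the BetaSeries API documentation.
EPISODES_URL = "https://api.betaseries.com/shows/episodes"


@dataclass
class QueuedItem:
    release_name: str  # still needed for parsing and compatibility matching
    show_name: Optional[str] = None
    season: Optional[int] = None
    episode: Optional[int] = None
    tvdb_id: Optional[int] = None  # e.g. taken from Sick Beard's output

    def episodes_request_url(self) -> str:
        """Build the GET /shows/episodes URL; needs TVDB ID, season and episode."""
        if self.tvdb_id is None or self.season is None or self.episode is None:
            raise ValueError("tvdb_id, season and episode are all required")
        # Parameter names are assumptions, not taken from this thread.
        params = {
            "thetvdb_id": self.tvdb_id,
            "season": self.season,
            "episode": self.episode,
            "subtitles": 1,
        }
        return f"{EPISODES_URL}?{urlencode(params)}"
```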

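Maybe we should split the Scrapper into two interfaces: a release scraper, which identifies the show and episode from a release name, and a subtitles scraper, which fetches subtitles for an already identified episode.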
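
A minimal sketch of such a split using typing.Protocol; Episode, ReleaseScraper, SubtitlesScraper and find_subtitles are hypothetical names. Episode data supplied with the queue request wins, and the release scraper is only a fallback.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Episode:
    show: str
    season: int
    episode: int
    tvdb_id: Optional[int] = None


class ReleaseScraper(Protocol):
    """Identifies which show and episode a release name belongs to."""

    def identify(self, release_name: str) -> Optional[Episode]: ...


class SubtitlesScraper(Protocol):
    """Fetches subtitle URLs for an episode that is already identified."""

    def subtitles_for(self, episode: Episode) -> list[str]: ...


def find_subtitles(
    release_name: str,
    known: Optional[Episode],
    releases: ReleaseScraper,
    subtitles: SubtitlesScraper,
) -> list[str]:
    # Episode data sent with the queue request (e.g. Sick Beard's TVDB ID) takes
    # precedence; the release scraper is only used when it is missing.
    episode = known if known is not None else releases.identify(release_name)
    if episode is None:
        return []
    return subtitles.subtitles_for(episode)
```

With this shape, a BetaSeries-backed subtitles scraper never has to guess the show from a release name once a TVDB ID is known.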

We could then add a text field to the tasks table to store episode data as JSON, and accept that data in the queue request. We would also have the option of filling it in with the release scraper when an item is added to the watchlist.
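
A minimal sketch of the JSON-column option with sqlite3. The table layout and function names are hypothetical and only illustrate the idea, not subber's actual schema. The column stays NULL until the data arrives with the queue request or is filled in later.

```python
import json
import sqlite3
from typing import Optional

# Hypothetical tasks table with a nullable JSON text column for episode data.
TASKS_SCHEMA = """
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    release_name TEXT NOT NULL,
    episode_data TEXT
)
"""


def queue_task(conn: sqlite3.Connection, release_name: str,
               episode_data: Optional[dict] = None) -> int:
    """Insert a task, storing episode data as JSON when the queue request provides it."""
    payload = None if episode_data is None else json.dumps(episode_data)
    cur = conn.execute(
        "INSERT INTO tasks (release_name, episode_data) VALUES (?, ?)",
        (release_name, payload),
    )
    return cur.lastrowid


def fill_in_episode_data(conn: sqlite3.Connection, task_id: int, episode_data: dict) -> None:
    """Store data found later, e.g. by the release scraper when the item is added."""
    conn.execute(
        "UPDATE tasks SET episode_data = ? WHERE id = ?",
        (json.dumps(episode_data), task_id),
    )


def load_episode_data(conn: sqlite3.Connection, task_id: int) -> Optional[dict]:
    row = conn.execute("SELECT episode_data FROM tasks WHERE id = ?", (task_id,)).fetchone()
    if row is None or row[0] is None:
        return None
    return json.loads(row[0])
```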

What do you think ?

lolautruche commented 9 years ago

> What do you think?

I think it's a very good idea! Splitting the release and subtitles scraper interfaces only has advantages, and it opens the door to separate release scrapers.

+1 for storing episode data, but what about a separate entity instead? That would avoid mixing the task list with episode metadata.
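
For comparison, a sketch of the separate-entity option, again with sqlite3 and a hypothetical schema: episode metadata gets its own table, keyed on (TVDB ID, season, episode), and tasks only hold a reference to it.

```python
import sqlite3

# Hypothetical schema: episode metadata in its own table, tasks only reference it.
EPISODE_SCHEMA = """
CREATE TABLE episodes (
    id INTEGER PRIMARY KEY,
    tvdb_id INTEGER NOT NULL,
    season INTEGER NOT NULL,
    episode INTEGER NOT NULL,
    UNIQUE (tvdb_id, season, episode)
);
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    release_name TEXT NOT NULL,
    episode_id INTEGER REFERENCES episodes (id)
);
"""


def get_or_create_episode(conn: sqlite3.Connection, tvdb_id: int, season: int, episode: int) -> int:
    key = (tvdb_id, season, episode)
    # The UNIQUE constraint makes this a no-op when the episode already exists.
    conn.execute("INSERT OR IGNORE INTO episodes (tvdb_id, season, episode) VALUES (?, ?, ?)", key)
    row = conn.execute(
        "SELECT id FROM episodes WHERE tvdb_id = ? AND season = ? AND episode = ?", key
    ).fetchone()
    return row[0]


def queue_task_for_episode(conn: sqlite3.Connection, release_name: str,
                           tvdb_id: int, season: int, episode: int) -> int:
    episode_id = get_or_create_episode(conn, tvdb_id, season, episode)
    cur = conn.execute(
        "INSERT INTO tasks (release_name, episode_id) VALUES (?, ?)", (release_name, episode_id)
    )
    return cur.lastrowid
```

With this layout, two releases of the same episode (say a 720p and a 1080p one) share a single episodes row, and the task list itself carries no metadata.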

lolautruche commented 9 years ago

For the record, here is the output when Sick Beard doesn't have the episode in its history:

Processing /media/icy4to/incoming/Game Of Thrones S05E04.1080p HDTV x264-BATV/Game.of.Thrones.S05E04.1080p.HDTV.x264-BATV.mkv (Game Of Thrones S05E04.1080p HDTV x264-BATV.nzb)
Found result in history: (121361, 5, [], 16)
Parsed Game Of Thrones S05E04.1080p HDTV x264-BATV into Game Of Thrones - S5E4 - 1080p HDTV x264 (BATV) [ABD: False]
Looking up Game Of Thrones in the DB
Lookup successful, using tvdb id 121361
Loading show object for tvdb_id 121361
Retrieving episode object for 5x4
Snatch history had a quality in it, using that: 1080p HD TV
Sick Beard snatched this episode, marking it safe to replace
This download is marked as safe to replace existing file
Deleting file /media/TV/Game of Thrones/Game.of.Thrones.S05E04.The.Sons.of.the.Harpy.mkv
Deleting file /media/TV/Game of Thrones/Game.of.Thrones.S05E04.The.Sons.of.the.Harpy-thumb.jpg
Found release name Game Of Thrones S05E04.1080p HDTV x264-BATV
Destination folder for this episode: /media/TV/Game of Thrones
Moving file from /media/icy4to/incoming/Game Of Thrones S05E04.1080p HDTV x264-BATV/Game.of.Thrones.S05E04.1080p.HDTV.x264-BATV.mkv to /media/TV/Game of Thrones/Game.of.Thrones.S05E04.Sons.of.the.Harpy.mkv
Deleted folder: /media/icy4to/incoming/Game Of Thrones S05E04.1080p HDTV x264-BATV
Processing succeeded for /media/icy4to/incoming/Game Of Thrones S05E04.1080p HDTV x264-BATV/Game.of.Thrones.S05E04.1080p.HDTV.x264-BATV.mkv

The lines that interest us are:

Loading show object for tvdb_id 121361
Retrieving episode object for 5x4
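
Those two lines carry everything the TVDB ID + season + episode lookup needs. A minimal parsing sketch, assuming the exact line formats shown in these logs (other Sick Beard versions may print something different); EpisodeRef and episode_from_output are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Line formats copied from the logs above; other Sick Beard versions may differ.
SHOW_RE = re.compile(r"^Loading show object for tvdb_id (\d+)\s*$", re.MULTILINE)
EPISODE_RE = re.compile(r"^Retrieving episode object for (\d+)x(\d+)\s*$", re.MULTILINE)


class EpisodeRef(NamedTuple):
    tvdb_id: int
    season: int
    episode: int


def episode_from_output(output: str) -> Optional[EpisodeRef]:
    """Extract TVDB ID, season and episode from post-processing output, if present."""
    show = SHOW_RE.search(output)
    episode = EPISODE_RE.search(output)
    if show is None or episode is None:
        return None
    return EpisodeRef(int(show.group(1)), int(episode.group(1)), int(episode.group(2)))
```

Applied to the first log in this issue, this yields (248835, 4, 21), which points at Once Upon a Time (2011) rather than the older show.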