Diaoul / subliminal

Subtitles, faster than your thoughts
http://subliminal.readthedocs.org
MIT License

LTV: filename and subtitle are "file match" but no release group in filename and subtitle #652

Open fernandog opened 8 years ago

fernandog commented 8 years ago

As there is no 'release group' in either the filename or the subtitle, the release group doesn't match and the score gets too low.

Maybe in LTV, when the subtitle name == filename, we could return a score of 215 (hash)? Or is there another solution?

Release: Castle.S08E18.Backstabber.1080p.WEB-DL.DD5.1.H.264.mkv (no release group)

Downloading subtitles  [####################################]  100%  Castle.S08E18.Backstabber.1080p.WEB-DL.DD5.1.H.264.mkv
INFO:subliminal.core:Listing subtitles with provider 'legendastv' and languages set([<Language [pt-BR]>])
DEBUG:subliminal.providers.legendastv:Found subtitle <LegendasTVSubtitle u'5719b8d41ad54-castle.s08e18.backstabber.1080p.web-dl.dd5.1.h.264.srt' [pt-BR]>
INFO:subliminal.score:Computing score of <LegendasTVSubtitle u'5719b8d41ad54-castle.s08e18.backstabber.1080p.web-dl.dd5.1.h.264.srt' [pt-BR]> for video <Episode [u'Castle', 2003, 8x18]> with {'hearing_impaired': False}
INFO:subliminal.score:Computed score 204 with final matches set(['episode', 'format', 'series', 'year', 'season', 'video_codec', 'resolution'])

@ratoaq2


Diaoul commented 8 years ago

Too low for what? If the video doesn't carry the release group, it'll be the highest non-hash match for that video, no need to fake the hash match. If you're not happy with the score computation you can implement your own, it's as simple as overriding a function for library users.

fernandog commented 8 years ago

Too low for "perfect matches". The min_score we use is:

    if sickbeard.SUBTITLES_PERFECT_MATCH:
        return episode_scores['hash'] - (episode_scores['resolution'] +
                                         episode_scores['video_codec'] +
                                         episode_scores['audio_codec'])

It's just a suggestion. Maybe it can be improved in another way. It's not faking: the subtitle name is exactly the same as the filename.
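To make the gap concrete, here is a small worked calculation. The score table is an assumption: its values are chosen to be consistent with the hash score of 215 and the logged score of 204 quoted in this thread, and may not match every subliminal version.

```python
# Assumed episode score table (hypothetical values, consistent with the
# hash score 215 and the logged score 204 quoted in this thread).
episode_scores = {
    'hash': 215, 'series': 108, 'year': 54, 'season': 18, 'episode': 18,
    'release_group': 9, 'format': 4, 'audio_codec': 2,
    'resolution': 1, 'video_codec': 1,
}

# Matches reported in the log for the Castle S08E18 subtitle.
logged_matches = {'episode', 'format', 'series', 'year', 'season',
                  'video_codec', 'resolution'}
logged_score = sum(episode_scores[m] for m in logged_matches)

# "Perfect match" threshold, computed as in the snippet above.
min_score = episode_scores['hash'] - (episode_scores['resolution'] +
                                      episode_scores['video_codec'] +
                                      episode_scores['audio_codec'])

# Best score reachable with neither a hash nor a release_group match.
best_without_group = sum(value for name, value in episode_scores.items()
                         if name not in ('hash', 'release_group'))

print(logged_score, min_score, best_without_group)
```

Under these assumed values, a subtitle that cannot match on release_group stays below the threshold even when every other property matches.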

Diaoul commented 8 years ago

What you can do is scale down the hash to the maximum score possible (the sum of the scores of the filename tokens). However, in the case of a really poor filename, the hash should be the better match, but with this scaling it won't be.
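A minimal sketch of that scaling idea, assuming a score table whose values are consistent with the 215 and 204 quoted in this thread. The names `max_reachable_score` and `scaled_score` are hypothetical and not part of subliminal.

```python
# Assumed score table (hypothetical values, see the note above).
EPISODE_SCORES = {
    'hash': 215, 'series': 108, 'year': 54, 'season': 18, 'episode': 18,
    'release_group': 9, 'format': 4, 'audio_codec': 2,
    'resolution': 1, 'video_codec': 1,
}


def max_reachable_score(video_properties, scores=EPISODE_SCORES):
    """Sum the scores of the properties the video filename actually carries."""
    return sum(scores[name] for name, value in video_properties.items()
               if name in scores and value is not None)


def scaled_score(matches, video_properties, scores=EPISODE_SCORES):
    """Score a set of matches, capping a hash match at the reachable maximum."""
    if 'hash' in matches:
        return max_reachable_score(video_properties, scores)
    return sum(scores[m] for m in matches)


# Properties guessed from the Castle filename: no release group.
castle = {'series': 'Castle', 'year': 2003, 'season': 8, 'episode': 18,
          'release_group': None, 'format': 'WEB-DL', 'audio_codec': 'DD5.1',
          'resolution': '1080p', 'video_codec': 'h264'}

# A poorly named file that only yields series, season and episode.
poor = {'series': 'Castle', 'season': 8, 'episode': 18}

full_name_match = {name for name, value in castle.items() if value is not None}
print(scaled_score({'hash'}, castle), scaled_score(full_name_match, castle))
print(scaled_score({'hash'}, poor))
```

With the Castle filename, a full name match and a hash match both reach the same ceiling. With the poor filename, the hash match is scaled far down, which is the drawback mentioned above.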

ratoaq2 commented 8 years ago

I was taking a look at this question once more... and I realized one thing: if you decide to build a custom score computation, then you don't have direct access to the guess dict, which is used in the actual logic:

https://github.com/Diaoul/subliminal/blob/master/subliminal/subtitle.py#L184

def guess_matches(video, guess, partial=False):

You only have access to the subtitle and video: https://github.com/Diaoul/subliminal/blob/master/subliminal/score.py#L66

def compute_score(subtitle, video, hearing_impaired=None):

Of course we can wrap the subtitle and intercept the guess_matches function and then implement any logic with it, but it's an extra layer and an extra step.

Diaoul commented 8 years ago

guess_matches is a utility function; you can choose to reuse it or not in your logic. What compute_score does is extract a guess from the video.name and call various utility functions on that guess dict. You can choose to use guessit for that, build a guess dict yourself, or not use a guess at all. You're free to do whatever you want with the given subtitle for the given video.

ratoaq2 commented 8 years ago

The video file is not the issue; the subtitle is. Subtitles can be of several types: LegendasTvSubtitle, OpenSubtitles, AddictedSubtitles, etc. Each of them has different fields, and some of them use a filename and/or release name to create a guessit dict.

If I want to keep everything the same and only customise the release group scoring computation, then I'll need to create a new scoring function and handle every single subtitle type again. That's possible, as you said. I'm just wondering if it could be simplified and have more granular extension points for the score computation.

ratoaq2 commented 8 years ago

Just to have some concrete examples:

For the given video file without a release_group: Castle.S08E22.Crossfire.1080p.WEB-DL.DD5.1.H.264.mkv. This is the sorted (score DESC) list of subtitles from legendastv:

score = 344 for castle.s08e22.crossfire.1080p.web.dl.dd5.1.h-ddltv.srt
score = 344 for castle.s08e22.1080p.web-dl.dd5.1.h264-btn.srt
score = 344 for castle.s08e22.crossfire.1080p.web-dl.dd5.1.h.264.srt
score = 344 for castle.s08e22.crossfire.1080p.web-dl.dd5.1.h.264-hkd.srt
score = 344 for castle.s08e22.crossfire.1080p.web-dl.dd5.1.hevc.x265-rmteam.srt
score = 342 for castle.s08e22.720p.web-dl.dd5.1.h264-btn.srt
score = 342 for castle.s08e22.crossfire.720p.web-dl.dd5.1.h.264.srt
score = 339 for castle.s08e22.web-dl.x264-rarbg.srt
score = 339 for castle.s08e22.crossfire.480p.web-dl.x264-rmteam.srt
score = 339 for castle.s08e22.crossfire.720p.web-dl.hevc.x265-rmteam.srt
score = 332 for castle.2009.s08e22.hdtv.x264-lol.srt
score = 332 for castle.2009.s08e22.hdtv.xvid-afg.srt
score = 332 for castle.2009.s08e22.720p.hdtv.x264-dimension.srt

Since there's no release_group in the video file, there's no release group match, and 5 subtitles scored the same: 344. The chosen subtitle is castle.s08e22.crossfire.1080p.web.dl.dd5.1.h-ddltv.srt, which has the release group ddltv and the same score as 4 other subtitles.

The best subtitle match would be the 3rd in this list: castle.s08e22.crossfire.1080p.web-dl.dd5.1.h.264.srt, which has no release_group.

Diaoul commented 8 years ago

You're right, you cannot tweak just that part which is a get_matches from the legendastv provider. https://github.com/Diaoul/subliminal/blob/master/subliminal/subtitle.py#L147

You can still use get_matches and overwrite the specific part about release_group for legendastv subtitles in your compute_score.

I don't see how to expose that in any other clean way.

ratoaq2 commented 8 years ago

If the property matching could be refactored to accept a dictionary of functions, where the key is the property name and the value is a function that accepts two values and returns True/False, then subliminal could have a default set of functions to compute matches for each property, and through the API it could allow a different dict to be used:

Something like this: https://github.com/Diaoul/subliminal/blob/master/subliminal/subtitle.py#L199-L244

if match_functions['series'](video.series, guess.get('title')):
    matches.add('series')
if match_functions['title'](video.title, guess.get('episode_title')):
    matches.add('title')
if match_functions['season'](video.season, guess.get('season')):
    matches.add('season')
if match_functions['episode'](video.episode, guess.get('episode')):
    matches.add('episode')
if match_functions['year'](video.year, guess.get('year')):
    matches.add('year')
if match_functions['release_group'](video.release_group, guess.get('release_group')):
    matches.add('release_group')

That way it would be possible to tweak just part of the matching.
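A self-contained sketch of how such a dictionary could look. Everything here is hypothetical: `Video` is a stand-in for subliminal's Episode, and `guess_matches_sketch`, `equal_ignore_case` and `both_missing_or_equal` are illustrative names, not subliminal's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Video:
    """Stand-in for subliminal's Episode, holding only the fields used here."""
    series: str
    season: int
    episode: int
    release_group: Optional[str] = None


def equal_ignore_case(video_value, guess_value):
    """Default rule: both values must be present and equal, ignoring case."""
    if video_value is None or guess_value is None:
        return False
    return str(video_value).lower() == str(guess_value).lower()


def both_missing_or_equal(video_value, guess_value):
    """Custom rule: also count a match when neither side has a value."""
    if video_value is None and guess_value is None:
        return True
    return equal_ignore_case(video_value, guess_value)


# Property name -> (video attribute, guess key), as in the snippet above.
PROPERTY_SOURCES = {
    'series': ('series', 'title'),
    'season': ('season', 'season'),
    'episode': ('episode', 'episode'),
    'release_group': ('release_group', 'release_group'),
}

DEFAULT_MATCH_FUNCTIONS = {name: equal_ignore_case for name in PROPERTY_SOURCES}


def guess_matches_sketch(video, guess, match_functions=None):
    """Compute matches, letting callers override the rule for any property."""
    functions = dict(DEFAULT_MATCH_FUNCTIONS)
    functions.update(match_functions or {})
    matches = set()
    for name, (attribute, key) in PROPERTY_SOURCES.items():
        if functions[name](getattr(video, attribute), guess.get(key)):
            matches.add(name)
    return matches


video = Video(series='Castle', season=8, episode=22)
guess = {'title': 'castle', 'season': 8, 'episode': 22}

default_matches = guess_matches_sketch(video, guess)
custom_matches = guess_matches_sketch(
    video, guess, {'release_group': both_missing_or_equal})
print(sorted(default_matches), sorted(custom_matches))
```

Only the release_group rule is replaced in the second call; the other properties keep their default comparison.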

Another way to handle https://github.com/Diaoul/subliminal/issues/652#issuecomment-233577530 is to ensure that the subtitles are always sorted by:

For the case where there's no release_group and several subtitles score the same, the one with no release group will be first in the list:

score = 344 for castle.s08e22.crossfire.1080p.web-dl.dd5.1.h.264.srt (no release group)
score = 344 for castle.s08e22.1080p.web-dl.dd5.1.h264-btn.srt (btn)
score = 344 for castle.s08e22.crossfire.1080p.web.dl.dd5.1.h-ddltv.srt (ddltv)
score = 344 for castle.s08e22.crossfire.1080p.web-dl.dd5.1.h.264-hkd.srt (hkd)
score = 344 for castle.s08e22.crossfire.1080p.web-dl.dd5.1.hevc.x265-rmteam.srt (rmteam)

So, if we scale down the max score because of the absence of a property, we can ensure that the first subtitle in the list is most likely the one we're searching for.
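One possible tie-break, sketched under assumptions: the exact sort criteria are not spelled out above, so the key below (score descending, then subtitles whose release group agrees with the video's, then release group name) is only a guess at the intent. `sort_candidates` is a hypothetical helper, and the release groups are taken from the annotations in the list above.

```python
def sort_candidates(candidates, video_release_group=None):
    """Sort (score, name, release_group) tuples, best candidate first."""
    def key(candidate):
        score, name, release_group = candidate
        # 0 when the subtitle agrees with the video (same group, or both
        # have none), 1 otherwise, so agreeing subtitles win ties.
        mismatch = 0 if release_group == video_release_group else 1
        return (-score, mismatch, release_group or '', name)
    return sorted(candidates, key=key)


candidates = [
    (344, 'castle.s08e22.crossfire.1080p.web.dl.dd5.1.h-ddltv.srt', 'ddltv'),
    (344, 'castle.s08e22.1080p.web-dl.dd5.1.h264-btn.srt', 'btn'),
    (344, 'castle.s08e22.crossfire.1080p.web-dl.dd5.1.h.264.srt', None),
    (344, 'castle.s08e22.crossfire.1080p.web-dl.dd5.1.h.264-hkd.srt', 'hkd'),
    (342, 'castle.s08e22.720p.web-dl.dd5.1.h264-btn.srt', 'btn'),
]

# The video filename carries no release group.
ordered = sort_candidates(candidates, video_release_group=None)
for score, name, release_group in ordered:
    print(score, name, release_group)
```

For a video without a release group, the subtitle without a release group now comes first among the tied candidates.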

ratoaq2 commented 5 years ago

This can be fixed with a release match: #572