chadopp / mythepisode

TV Series/Shows

Episode Scheduled but Unmatched with obvious spelling differences #35

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
One upcoming recording is titled "Holy Cr**" but the episode is titled "Holy Crap".
Another upcoming recording is titled "Model Misbehaviour" but the episode is titled "Model Misbehavior".

What is the expected output? What do you see instead?
Both upcoming recordings should be matched. Instead, they were listed as upcoming but 
unmatched.

What version of the product are you using? On what operating system?
mythtv+web 0.23-fixes, mythepisode trunk

Please provide any additional information below.
Not sure how we can overcome the censoring of words - can we treat random 
characters (*!#) as wildcards?
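
A minimal sketch of the wildcard idea, assuming a hypothetical helper `censored_match()` that is not part of mythepisode. Each `*`, `!` or `#` in the recording title is treated as "any single character", and everything else must match literally (ignoring case):

```php
<?php
// Hypothetical helper, not mythepisode code: build a regular expression from
// the censored title, turning * ! # into single-character wildcards.
function censored_match(string $censored, string $title): bool
{
    $pattern = '';
    foreach (str_split($censored) as $ch) {
        // Wildcard characters become "."; all other characters are quoted
        // so they are matched literally.
        $pattern .= in_array($ch, ['*', '!', '#'], true) ? '.' : preg_quote($ch, '/');
    }
    return preg_match('/^' . $pattern . '$/i', $title) === 1;
}

var_dump(censored_match('Holy Cr**', 'Holy Crap')); // bool(true)
var_dump(censored_match('Holy Cr**', 'Holy Cow'));  // bool(false)
```

A title that legitimately contains `!` or `#` still matches itself, but it would also match titles that have any other character in that position, so this check is looser than an exact comparison.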

As for English/American differences in spelling, my first thought is just to 
incorporate a check for "o" or "ou" in the match. Again, I'm not sure how this 
would work.
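
One way the "o"/"ou" check could work is to normalise both titles before comparing them. The sketch below is only an illustration: `normalize_spelling()` is a hypothetical helper, and the rule it applies (rewrite a trailing "our" to "or" when at least three letters precede it) is an assumption that covers words like "misbehaviour" and "colours" but not every British/American difference.

```php
<?php
// Hypothetical helper, not mythepisode code: lower-case the title and rewrite
// "-our"/"-ours" word endings to "-or"/"-ors". The three-letter lookbehind
// keeps short words such as "four" and "hours" unchanged.
function normalize_spelling(string $title): string
{
    return preg_replace('/(?<=[a-z]{3})our(?=s?\b)/', 'or', strtolower($title));
}

var_dump(normalize_spelling('Model Misbehaviour') === normalize_spelling('Model Misbehavior')); // bool(true)
var_dump(normalize_spelling('Four Hours')); // string(10) "four hours"
```

Both titles have to go through the same function before they are compared, whether the comparison is exact or percentage based.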

Original issue reported on code.google.com by alec.chr...@gmail.com on 28 Oct 2010 at 7:56

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Having just looked at it again, the English/American spelling one ("Model 
Misbehavio(u)r") is matched both in the episodes list AND as an "Unmatched but 
scheduled" one at the bottom...?

I know there is the % match feature but, no offence, it's not my preference. We 
must be able to solve mismatches like this logically rather than on a %-match 
scale? I will take a good look later.

Original comment by alec.chr...@gmail.com on 28 Oct 2010 at 8:02

GoogleCodeExporter commented 9 years ago
Instead of using match %, what about soundex?  It would solve the problem of 
differences in spelling (e.g. "altar" and "alter", or "or" vs. "our" spellings).  In a 
previous issue you mentioned something about girlfriend... with soundex 
"girlfriend" and "girlfriends" are equal, but "the girlfriend" isn't.  
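
The examples above can be checked with PHP's built-in `soundex()`. This is a small sketch; `soundex_equal()` is a hypothetical wrapper added here for readability:

```php
<?php
// Hypothetical wrapper, not mythepisode code: two titles are "equal" when
// PHP's built-in soundex() returns the same key for both.
function soundex_equal(string $a, string $b): bool
{
    return soundex($a) === soundex($b);
}

var_dump(soundex_equal('girlfriend', 'girlfriends'));              // bool(true)
var_dump(soundex_equal('girlfriend', 'the girlfriend'));           // bool(false)
var_dump(soundex_equal('altar', 'alter'));                         // bool(true)
var_dump(soundex_equal('Model Misbehaviour', 'Model Misbehavior')); // bool(true)
```

PHP's `soundex()` returns a four-character key, so on a long title only the opening sounds are compared. Two different titles that start alike can therefore produce the same key.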

Original comment by chris.k...@gmail.com on 28 Oct 2010 at 12:05

GoogleCodeExporter commented 9 years ago
We may be able to improve on this, but you will never get 100% accurate results. 
The bottom line is that the data is coming from different sources, so we can 
only match with a certain level of accuracy.  One thing we might be able to do 
is, if the episode doesn't hit as a match in the first round of checks, put it 
through another round with some other matching technique like soundex.  I'm 
open to ideas and/or code improvements.
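
A rough sketch of what such a two-round check might look like. The function names are hypothetical and this is not the existing mythepisode matching code. The first round uses `similar_text()` against the percentage threshold. The second round compares soundex keys word by word, which is an assumption made here to keep the fallback stricter than a single key for the whole title:

```php
<?php
// Hypothetical helper: one soundex key per whitespace-separated word.
function soundex_words(string $title): array
{
    $words = preg_split('/\s+/', trim($title), -1, PREG_SPLIT_NO_EMPTY);
    return array_map('soundex', $words);
}

// Hypothetical two-round matcher, not mythepisode code.
function two_pass_match(string $title, array $candidates, float $matchPercent): bool
{
    // Round 1: percentage similarity, as the match-% setting does today.
    foreach ($candidates as $candidate) {
        similar_text($title, $candidate, $percent);
        if ($percent >= $matchPercent) {
            return true;
        }
    }
    // Round 2: only reached when round 1 found nothing.
    $codes = soundex_words($title);
    if ($codes === []) {
        return false;
    }
    foreach ($candidates as $candidate) {
        if ($codes === soundex_words($candidate)) {
            return true;
        }
    }
    return false;
}

// The threshold is set deliberately high here so that round 1 fails.
var_dump(two_pass_match('model misbehaviour', ['model misbehavior'], 99)); // bool(true)
var_dump(two_pass_match('model misbehaviour', ['model mayhem'], 99));      // bool(false)
```

Word-by-word soundex would not rescue a censored title such as "Holy Cr**", because the censored word has no letters to encode after "Cr".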

Original comment by chadopp@gmail.com on 29 Oct 2010 at 12:59

GoogleCodeExporter commented 9 years ago
Alec... I know you don't like the matchpercent, but shows like Model Misbehaviour 
should have matched if you had matchpercent set to less than 95%.  The other 
one would require 75% or less, which is too low.  Just curious what your setting 
is.
You can add something like this to episodes.php to see how things are matching:

            similar_text($match, $key, $p);
            print "$key - $match - $p<BR>";
            if ($p >= $matchPercent) return TRUE;
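
The same check can be tried outside mythepisode. In this standalone sketch, `match_report()` is a hypothetical helper and the titles are simply the two from this issue, lower-cased by hand. The percentages inside the application may differ if the titles are processed before comparison.

```php
<?php
// Hypothetical helper, not mythepisode code: report the similar_text()
// percentage between one listing title and each scheduled title.
function match_report(string $key, array $titles): array
{
    $rows = [];
    foreach ($titles as $match) {
        // similar_text() writes the similarity percentage into $p.
        similar_text($match, $key, $p);
        $rows[] = sprintf('%s - %s - %.2f', $key, $match, $p);
    }
    return $rows;
}

foreach (match_report('model misbehaviour', ['model misbehavior', 'holy cr**']) as $row) {
    echo $row, PHP_EOL;
}
```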

Original comment by chadopp@gmail.com on 29 Oct 2010 at 4:21

GoogleCodeExporter commented 9 years ago
Chris: Soundex like a good idea! (see what I did there? :D) I'm not sure what 
the processing difference is between soundex and similar_text, but I might do 
some tests. 

Chad: That's what I thought, and I noticed (see my comment above) that it HAS 
matched, but is ALSO in the unmatched bit at the bottom. This may be an error 
with Chris's new code? My current setting is 87% to overcome the "Girlfriend" 
issue I had previously, so not VERY high.

Original comment by alec.chr...@gmail.com on 29 Oct 2010 at 7:57

GoogleCodeExporter commented 9 years ago

Original comment by chadopp@gmail.com on 31 Oct 2010 at 4:45

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Chad and Alec,
Please ignore my previous posts.  I've done some research and things aren't 
working as expected.  My date match code is failing for some reason.  I'll let 
you know when I figure it out.
Thanks.
Chris.

Original comment by chris.k...@gmail.com on 31 Oct 2010 at 2:07

GoogleCodeExporter commented 9 years ago
Alec, 
Just curious, what revision are you running?  

Original comment by chris.k...@gmail.com on 31 Oct 2010 at 7:32

GoogleCodeExporter commented 9 years ago
Right, I think I was on 248 when I reported this. I've moved to 260 and still 
get it. It looks like most (8 out of 9) of the scheduled recordings 
are both matched and "unmatched", possibly due to the error you identified in comment 
10.

As for the censored episode, it's been recorded now, and matches as "Watched".

Original comment by alec.chr...@gmail.com on 1 Nov 2010 at 7:49

GoogleCodeExporter commented 9 years ago
OK, I take back that last comment. I just realised that the new config file was 
being used, and that had my match-% at 85%. With it set back to 87%, as it was 
before, the censored one doesn't match any more.

Original comment by alec.chr...@gmail.com on 1 Nov 2010 at 8:03

GoogleCodeExporter commented 9 years ago
Alec.. Of the ones that are showing as "unmatched, but scheduled", what is the 
status of the first entry/entries?

Also, could you provide a few examples of the shows?  I'm having trouble 
duplicating this.

Original comment by chris.k...@gmail.com on 1 Nov 2010 at 8:23

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
I've attached the HTML from the page. It's Family Guy. American Dad also has it 
(also attached).

Original comment by alec.chr...@gmail.com on 1 Nov 2010 at 8:40

Attachments:

GoogleCodeExporter commented 9 years ago
Alec,  I'm hoping you wouldn't mind performing a test.

In the file tmpl/default/episode.php at line 305, it currently reads:
$schedEpisodesDetails["$datalc"]["matched"] = true;

Can you change it to:
$schedEpisodesDetails[$datalc]["matched"] = true;

I simply removed the quotes.
Thanks.
Chris.
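
When the key is already a string, the quoted and unquoted forms address the same array element, so this change can only make a difference if `$datalc` holds a non-string value. A minimal sketch with a made-up title:

```php
<?php
// Illustrative value only; in mythepisode $datalc comes from the episode listing.
$datalc = 'model misbehaviour';

$details = [];
$details["$datalc"]['matched'] = true; // quoted form
$details[$datalc]['checked'] = true;   // unquoted form, same element

var_dump(count($details));       // int(1)
var_dump(array_keys($details));  // one key: "model misbehaviour"
```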

Original comment by chris.k...@gmail.com on 1 Nov 2010 at 9:00

GoogleCodeExporter commented 9 years ago
:( No difference

Original comment by alec.chr...@gmail.com on 1 Nov 2010 at 9:13

GoogleCodeExporter commented 9 years ago
Alec,
Thank you for trying.  I believe the code kept thinking it had a date match.  
I've had a few minutes to try and recreate your scenario.  Can you try the 
following? It starts at line 300 in tmpl/default/episode.php:

        }elseif (($schedMatch = ($schedMatchDate = in_array("$data[1]", $schedDate))) ||
            ($schedMatch = close_match("$datalc", $schedEpisodes, $matchPercent))) {
            if($schedMatchDate) {
                $schedEpisodesDetails[$schedEpisodes[array_search("$data[1]", $schedDate)]]["matched"] = true;
            } else {
                $schedEpisodesDetails["$datalc"]["matched"] = true;
            }
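
The idea behind the snippet above can be shown in isolation. This is a simplified sketch, not the template code: `mark_matched()` and `close_match_sketch()` are hypothetical, the real `close_match()` body is not shown in this thread, and the date and titles are made-up sample values. When the match comes from the air date, the flag is set on the scheduled title found at the same index as that date. Otherwise it is set under the listing title, as before.

```php
<?php
// Hypothetical stand-in for close_match(); the real implementation may differ.
function close_match_sketch(string $needle, array $titles, float $minPercent): bool
{
    foreach ($titles as $title) {
        similar_text($needle, $title, $percent);
        if ($percent >= $minPercent) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper mirroring the branch logic of the snippet above.
function mark_matched(string $airDate, string $titleLc, array $schedDate,
                      array $schedEpisodes, array $details, float $matchPercent): array
{
    $idx = array_search($airDate, $schedDate, true);
    if ($idx !== false) {
        // Date match: flag the scheduled entry that airs on this date.
        $details[$schedEpisodes[$idx]]['matched'] = true;
    } elseif (close_match_sketch($titleLc, $schedEpisodes, $matchPercent)) {
        // Title match: flag the entry keyed by the listing title.
        $details[$titleLc]['matched'] = true;
    }
    return $details;
}

// Made-up sample data: one scheduled recording with a censored title.
$details = mark_matched('2010-11-07', 'holy crap', ['2010-11-07'], ['holy cr**'],
                        ['holy cr**' => ['matched' => false]], 87);
var_dump($details['holy cr**']['matched']); // bool(true)
```

With the flag set on the scheduled entry itself, a recording matched by date no longer looks unmatched when the titles differ.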

Original comment by chris.k...@gmail.com on 1 Nov 2010 at 9:48

GoogleCodeExporter commented 9 years ago
Cool! That stops the items appearing at the bottom in the "Scheduled but 
unmatched" section, but keeps them matched. Nice one Chris. Patch attached.

Original comment by alec.chr...@gmail.com on 2 Nov 2010 at 7:59

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by chadopp@gmail.com on 3 Nov 2010 at 2:35