Multi-part movie support

GoogleCodeExporter commented 8 years ago

I have a directory structure like this:

..\The.Hurt.Locker.2008.DVDRiP.XViD
 |
 +-----The.Hurt.Locker.2008.DVDRiP.XViD.CD1.avi
 | 
 +-----The.Hurt.Locker.2008.DVDRiP.XViD.CD2.avi

It doesn't fetch the correct subtitle for multi-part video files. I can
help with this. Any idea as to how this could be resolved?

Original issue reported on code.google.com by mridanga...@gmail.com on 11 Mar 2010 at 11:38

GoogleCodeExporter commented 8 years ago

This seemed a little vague, so I'll clarify. When the video file has multiple
parts, it needs a multi-part subtitle file as well. Periscope simply gets the
best-matching subtitle and doesn't seem to take the multi-part criterion into
consideration. :)
Thanks Patrick. :)

Original comment by mridanga...@gmail.com on 11 Mar 2010 at 12:09

GoogleCodeExporter commented 8 years ago

Does it return any subtitles at all ?
If so, is it twice the same subtitle for both files or different ones ?

Could you also run the script with --debug and copy/paste the result in this
report (or by mail: admin _AT_ getmesubs.com)?

Thanks

Original comment by patrick....@gmail.com on 11 Mar 2010 at 12:24

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

Hi Patrick,

I believe I've found the problem. Periscope takes the language as a parameter,
which is an important criterion, but the number of subtitle parts is equally
important. In most plug-ins you seem to discard the result row if the language
is not supported. I've added another criterion that discards the row if the
number of parts doesn't match. I've modified two of your files. Please have a
look and do a diff against your head-revision copy; if it looks okay, you could
merge them into the repo.

There was another problem that I encountered. You're using .startswith(token)
in many places to accept or discard a result. This function is case-sensitive,
so I've converted both strings to upper case before comparing. This provides
more flexibility but doesn't hinder the cleanliness of the result set.
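
For illustration, the upper-casing approach described above could look like the sketch below. The helper name starts_with_ignore_case is hypothetical and is not taken from the attached files or from periscope itself.

```python
def starts_with_ignore_case(text, prefix):
    """Return True if `text` begins with `prefix`, ignoring letter case."""
    # Upper-case both sides so "The.Hurt" and "the.hurt" compare equal.
    return text.upper().startswith(prefix.upper())


# str.startswith alone is case-sensitive, so the plain check below fails
# even though the names only differ in capitalisation.
plain = "The.Hurt.Locker.2008".startswith("the.hurt.locker")
relaxed = starts_with_ignore_case("The.Hurt.Locker.2008", "the.hurt.locker")
```

Here plain is False and relaxed is True, which is the difference the comment describes.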

Should I modify the others too?

Thanks

Original comment by mridanga...@gmail.com on 12 Mar 2010 at 8:09

Attachments:

GoogleCodeExporter commented 8 years ago

Once again, I omitted some information accidentally. I'm using Periscope from
the Windows command line, but I've been through your code and that doesn't seem
to be a problem.

Anyway, I feel that the main program periscope.py should have an overloaded
function which, when given a file name, looks in the directory for other movie
files with a similar name, works out how many parts the movie is split into,
and passes this 'parts' value to the plug-ins as a parameter. This would
greatly increase the quality of the returned subtitle. A two-part subtitle for
a one-part movie, or vice versa, is quite useless, really. :)
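
A minimal sketch of that part-counting idea, assuming parts are marked with "CD<number>" right before the file extension, as in the directory listing at the top of this issue. The names PART_RE and count_parts are made up for the example; in practice the list of names would come from something like os.listdir() on the movie's directory.

```python
import os
import re

# Assumption: a part marker is "CD<number>" at the end of the name
# (before the extension), optionally preceded by a separator.
PART_RE = re.compile(r"^(?P<base>.*?)[. _-]?cd(?P<part>\d+)$", re.IGNORECASE)


def count_parts(filenames, target):
    """Count the distinct CD numbers among files sharing `target`'s base name.

    Returns 1 when `target` carries no CD marker at all.
    """
    match = PART_RE.match(os.path.splitext(target)[0])
    if match is None:
        return 1
    base = match.group("base").lower()
    parts = set()
    for name in filenames:
        other = PART_RE.match(os.path.splitext(name)[0])
        if other and other.group("base").lower() == base:
            parts.add(int(other.group("part")))
    return len(parts)
```

Collecting the part numbers in a set means that a subtitle file sitting next to the video with the same CD marker does not inflate the count.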

Original comment by mridanga...@gmail.com on 12 Mar 2010 at 8:15

GoogleCodeExporter commented 8 years ago

I'll take a look at your modification over the weekend. The case-sensitivity
argument is a valid one.

About your second suggestion: it cannot be implemented directly this way, as
periscope could also be used to download TV show subtitles. If you have:
/Heroes.Season6
+-----Heroes.S06E01.avi
+-----Heroes.S06E02.avi

You can't just search on "Heroes", the longest common string between the two.
But there is a regex in the code that splits TV shows and movies into bits
(name, season, episode, year, release team, ...). I will add a "part" bit for
CD1, CD2 and use that info in the searches.

I've also considered using the .nfo files that sometimes come with movies to
get extra info and better results. I still have to find a good nfo parser in
Python (as I don't really want to write one).

Original comment by patrick....@gmail.com on 12 Mar 2010 at 8:29

GoogleCodeExporter commented 8 years ago

Regarding the case-sensitivity: sounds good!

As for the multi-part stuff:

I'm writing an open-source command-line app in which I'm using your library.
What it does is take a directory as an argument and a flag indicating whether
the content in the directory is a movie or a TV show. I'll get back to why I
need this flag later. Anyway, my code (dubbed Janitor) checksums all the files,
uncompresses all the RAR and ZIP archives, and cleans the name, which means a
crappy name like District.9.REPACK.R5.LiNE.XviD-KAMERA would turn into
District 9 (2009).
How do I do this name cleaning?

Well, I query all three search engines (Yahoo, Bing, Google) with the directory
name, which is almost always the name of the movie. I concatenate the results
from the three searches and take the most frequently occurring IMDB id. (Most
torrent pages have the IMDB id somewhere on the page.) I have written a
procedure to clean the file name, which strips out a bunch of commonly found
keywords like XVid, DivX, R5, and then I use the partly cleaned file name to
search the three engines again and once more take the most frequently occurring
IMDB id. After these 6 searches (three before cleaning and three after) I
usually end up with the correct name unless the original name was really FUBAR.
I've thought of reading NFO files to look for IMDB ids, which might save me the
trouble of searching online. NFO files are text files and can be read in Python
like any other plain-text file. The problem is that the NFO file is often not
included. I haven't come across any program which can automatically figure out
whether it's a TV show or a movie. :( ...and now I see your problem with the
multi-part logic
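
The "most frequently occurring IMDB id" step can be sketched as follows. This is not Janitor's actual code: it assumes the search-result pages have already been downloaded as strings, that ids look like "tt" followed by at least seven digits, and the function name is invented for the example.

```python
import re
from collections import Counter

# Assumption: IMDB ids appear as "tt" followed by seven or more digits.
IMDB_ID_RE = re.compile(r"tt\d{7,}")


def most_common_imdb_id(pages):
    """Return the IMDB id seen most often across `pages`, or None if none is found."""
    counts = Counter()
    for text in pages:
        # Every occurrence counts as one vote for that id.
        counts.update(IMDB_ID_RE.findall(text))
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

The same function could be run once on the results for the raw directory name and once on the results for the cleaned name, or on the concatenation of all six result pages.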

..but.

Could you please have a look at my code and try to add it to the program? I've
written it in such a way that when a part argument is not passed to
plugin.query(), the method behaves normally and nothing else is affected. An
overloaded function in periscope.py would make Periscope more flexible and
easier to use in other projects such as mine. Since I haven't found a way to
check whether it's a movie or a TV show, I'm going to make the user specify the
mode explicitly.
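
To make the "optional part argument" idea concrete, here is a rough sketch of a filter that discards rows by language and, only when a part count is supplied, by number of parts. The row layout (dicts with "lang" and "parts" keys) and the function name are assumptions for the example; the attached files and periscope's real plugin code may look quite different.

```python
def filter_results(rows, langs, parts=None):
    """Keep rows in a wanted language; if `parts` is given, also require that part count."""
    kept = []
    for row in rows:
        if row["lang"] not in langs:
            continue  # language not requested: discard, as the plug-ins already do
        if parts is not None and row.get("parts") != parts:
            continue  # caller asked for N-part subtitles and this row is not N-part
        kept.append(row)
    return kept
```

Because parts defaults to None, callers that never pass it get the same rows as before, which is the backwards-compatible behaviour described above.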

Maybe some of the information above might help you.

Original comment by mridanga...@gmail.com on 12 Mar 2010 at 9:16

GoogleCodeExporter commented 8 years ago

Yes, I will take a look at your code and will probably use some of it. I may
just implement it differently.
Currently, I do have a few regexes that try to classify a file as a TV show or
a movie. This is done in the parent class SubtitleDatabase:
tvshowRegex = re.compile('(?P<show>.*)S(?P<season>[0-9]{2})E(?P<episode>[0-9]{2}).(?P<teams>.*)', re.IGNORECASE)
tvshowRegex2 = re.compile('(?P<show>.*).(?P<season>[0-9]{2})x(?P<episode>[0-9]{2}).(?P<teams>.*)', re.IGNORECASE)
movieRegex = re.compile('(?P<movie>.*)[\.|\[|\(| ]{1}(?P<year>(?:(?:19|20)[0-9]{2}))(?P<teams>.*)', re.IGNORECASE)

It is not perfect for movies, but it mainly exists to get better results for TV
shows. I will add the "CD" part there to get the extra information about movie
parts.
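
As a rough illustration of how the three patterns quoted above can be used to tell a TV show from a movie, the sketch below tries them in turn. The patterns are copied from this comment but written as raw strings; the function guess_file_data, the order in which the patterns are tried, and the "type" key are assumptions, not the actual SubtitleDatabase code.

```python
import re

# The three patterns quoted above, as raw strings so the backslashes are kept literally.
tvshowRegex = re.compile(r'(?P<show>.*)S(?P<season>[0-9]{2})E(?P<episode>[0-9]{2}).(?P<teams>.*)', re.IGNORECASE)
tvshowRegex2 = re.compile(r'(?P<show>.*).(?P<season>[0-9]{2})x(?P<episode>[0-9]{2}).(?P<teams>.*)', re.IGNORECASE)
movieRegex = re.compile(r'(?P<movie>.*)[\.|\[|\(| ]{1}(?P<year>(?:(?:19|20)[0-9]{2}))(?P<teams>.*)', re.IGNORECASE)


def guess_file_data(filename):
    """Return the named groups of the first matching pattern plus a 'type' entry."""
    candidates = (
        ("tvshow", tvshowRegex),
        ("tvshow", tvshowRegex2),
        ("movie", movieRegex),
    )
    for kind, regex in candidates:
        match = regex.match(filename)
        if match:
            data = match.groupdict()
            data["type"] = kind
            return data
    return {"type": "unknown"}
```

With these patterns, Heroes.S06E01.avi is classified as a TV show (season 06, episode 01) and The.Hurt.Locker.2008.DVDRiP.XViD.CD1.avi as a movie from 2008; the CD1 marker simply ends up inside the "teams" group until a dedicated part group is added.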

I will look around torrent websites for multipart release names to test the
regex and then the plugins.

One of the goals of periscope is to be included in other projects, so I will do
my best to make periscope useful for your project. And if you already have a
webpage somewhere, I would be more than happy to link to it from the periscope
page.

Original comment by patrick....@gmail.com on 12 Mar 2010 at 10:11

GoogleCodeExporter commented 8 years ago

Hi Patrick.

Sorry for the delay in replying; I was visiting Stockholm.

I'll be placing my project on Google Code, but it is a little held up due to
the issue I pointed out earlier. It'd be nice if you could add a link (which
I'll pass to you), and I'll do the same. I'm using Periscope as a library and
am trying to avoid touching your source at all.

The method you mentioned for finding out whether a file is a TV show or a movie
sounds pretty good. It's impossible to make it 100% fail-proof, but it sorta
does the trick.

Once you've modified the function to accept a parts parameter, I can continue
working on my project. Feel free to implement it in any way you deem fit. Any
idea when you're planning to get this up and running? I could help if you wish.

If you'd like to test with some file names, here's a massive dump of file names:
http://www.kickasstorrents.com/api/

Thank you.

:)

Original comment by mridanga...@gmail.com on 15 Mar 2010 at 2:27

GoogleCodeExporter commented 8 years ago

Hi Patrick,

Any progress on this?

Thanks.

Original comment by mridanga...@gmail.com on 23 Mar 2010 at 6:18

GoogleCodeExporter commented 8 years ago

No progress on this yet. I've been busy with work. But from a code point of
view, the query method will not be affected; the signature should stay the same.

You give the filename and language (the only info that cannot be guessed from
the filename) to the query method and, based on that filename, it will make the
query: extract the part info if needed and query the website based on whatever
can be extracted from the filename.

So if you give to the query method:
token=The.Hurt.Locker.2008.DVDRiP.XViD.CD1.avi
langs=['en','de']

It's going to query the website using the langs info and the meta info
extracted from the file name:
name=The.Hurt.Locker
year=2008
teams=DVDRiP.XViD
part=CD1
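
A sketch of how a filename could be split into the four fields listed above. This is not the code that went into SVN: MOVIE_RE is a simplified form of the movie regex quoted earlier in this issue, and PART_RE, parse_release and the returned dictionary layout are assumptions made for the example.

```python
import os
import re

# Simplified movie pattern: name, a separator, a 19xx/20xx year, then the rest.
MOVIE_RE = re.compile(
    r"(?P<name>.*)[. \[(](?P<year>(?:19|20)[0-9]{2})(?P<rest>.*)", re.IGNORECASE)
# Hypothetical part pattern: "cd<number>" as a whole dot/space/dash-separated token.
PART_RE = re.compile(r"(?:^|[. _-])(?P<part>cd[0-9]+)(?=$|[. _-])", re.IGNORECASE)


def parse_release(filename):
    """Split a movie filename into name, year, teams and part (None if single-part)."""
    stem, _ext = os.path.splitext(filename)
    match = MOVIE_RE.match(stem)
    if match is None:
        return None
    rest = match.group("rest")
    part = None
    part_match = PART_RE.search(rest)
    if part_match:
        part = part_match.group("part")
        # Cut the part token out so it is not reported as a release team.
        rest = rest[:part_match.start()] + rest[part_match.end():]
    return {
        "name": match.group("name"),
        "year": match.group("year"),
        "teams": rest.strip(". "),
        "part": part,
    }
```

For The.Hurt.Locker.2008.DVDRiP.XViD.CD1.avi this yields name The.Hurt.Locker, year 2008, teams DVDRiP.XViD and part CD1, matching the breakdown above.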

Original comment by patrick....@gmail.com on 24 Mar 2010 at 8:10

GoogleCodeExporter commented 8 years ago

I just added support for multi-part to the SVN trunk (currently only
SubtitleSource takes those parameters into consideration; I'll have to see
whether the other websites keep that info somewhere on their pages).

I also slightly improved the SubtitleSource code and deactivated Podnapisi
support as they made their page more complex. I'll have to fix the code for the
plugin.

Could you give it a test and tell me if you get better results on your
multipart files?

The next step will be some cleanup of the code, as I have too many similar
lines of code and the architecture of the plugins is starting to suffer from
that. And multi-part support for more websites.

Original comment by patrick....@gmail.com on 28 Mar 2010 at 10:59

Changed state: Started

GoogleCodeExporter commented 8 years ago

Hi Patrick,
I'll do some testing. I'm a little busy with work too. :(
Cheers.

Original comment by mridanga...@gmail.com on 30 Mar 2010 at 12:37

GoogleCodeExporter commented 8 years ago

Hi Patrick,
I've been looking at this and have been getting quite a few errors. You seem to
be using quite a bit of logging. Do you know how I can pass a logger instance
to your class so that I can log the output to a file?
Thanks.

Original comment by mridanga...@gmail.com on 8 Apr 2010 at 6:00

GoogleCodeExporter commented 8 years ago

I'm using the standard Python logging system. You can change the debug level
and send the output to a file:
http://docs.python.org/library/logging.html#simple-examples
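
For example, since the library logs through the standard logging module, a calling application can attach a file handler to the root logger and pick up those records without passing a logger instance around. The sketch below uses only the standard library; the helper name log_to_file and the format string are choices made for the example, and it assumes the library's loggers propagate to the root logger (the default).

```python
import logging


def log_to_file(path, level=logging.DEBUG):
    """Attach a file handler to the root logger so library log records end up in `path`."""
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    root = logging.getLogger()
    root.setLevel(level)  # let DEBUG records through
    root.addHandler(handler)
    return handler
```

Calling log_to_file("janitor.log") once at start-up (the file name is only an example) is enough; anything logged afterwards at or above the chosen level is appended to that file.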

Original comment by patrick....@gmail.com on 8 Apr 2010 at 7:46

GoogleCodeExporter commented 8 years ago

I figured out the logging issue.

Could you tell me why many of the plugins in __init__.py are commented out?
Only SubtitleSource seems to be enabled.

Original comment by mridanga...@gmail.com on 8 Apr 2010 at 9:12

GoogleCodeExporter commented 8 years ago

I got errors with SubtitleSource.

False
False
False
Traceback (most recent call last):
  File "C:\Documents and Settings\m.agarwalla\My Documents\My Dropbox\Janitor\src\test\..\libs\periscope\plugins\SubtitleSource.py", line 49, in process
    subs = self.query(fname, langs)
  File "C:\Documents and Settings\m.agarwalla\My Documents\My Dropbox\Janitor\src\test\..\libs\periscope\plugins\SubtitleSource.py", line 98, in query
    srtTeams = set(releaseMetaData['teams'])
KeyError: 'teams'
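
The traceback shows a plain dictionary lookup failing because the parsed metadata has no 'teams' entry. One defensive pattern for that situation is sketched below. It is only an illustration: the helper name is made up, it assumes 'teams' holds a list of tags when present, and it is not necessarily the fix that was applied to SubtitleSource.py.

```python
def release_teams(release_meta):
    """Return the release's team tags as a set, or an empty set if none were parsed."""
    # dict.get avoids the KeyError when the filename regex produced no 'teams' group.
    return set(release_meta.get("teams", []))
```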

Original comment by mridanga...@gmail.com on 8 Apr 2010 at 9:41

GoogleCodeExporter commented 8 years ago

Hi Patrick, Have you had any luck with this?

Original comment by mridanga...@gmail.com on 18 May 2010 at 11:33

GoogleCodeExporter commented 8 years ago

Your last error should be fixed in SVN.
I have support for multipart based on the filename (it looks for cd[number],
not case-sensitive).

For SubtitleSource, I use that info to query the website.

It worked for the few multipart filenames I tried, but I don't guarantee that
it's foolproof.

If you still have cases where it does not find any subtitles, I'll write a test
case for them. But I have exams until the middle of June, so I may not be able
to fix the code in the next few weeks.

Original comment by patrick....@gmail.com on 18 May 2010 at 12:08

GoogleCodeExporter commented 8 years ago

I'll close this. You can contact me again if you still find issues with
multipart support.

Original comment by patrick....@gmail.com on 7 Sep 2010 at 3:02

Changed state: Fixed

abbi031892 / periscope

Multi-part movie support #22