Closed: GoogleCodeExporter closed this issue 8 years ago.
This seemed a little vague, so I'll clarify. When the video file has multiple parts, it needs a multi-part subtitle file as well. Periscope simply gets the best-matching subtitle, and it seems that it doesn't take the multi-part criterion into consideration. :)
Thanks Patrick. :)
Original comment by mridanga...@gmail.com
on 11 Mar 2010 at 12:09
Does it return any subtitles at all?
If so, is it the same subtitle twice for both files, or different ones?
Could you also run the script with --debug and copy/paste the result into this report (or send it by mail: admin _AT_ getmesubs.com)?
Thanks
Original comment by patrick....@gmail.com
on 11 Mar 2010 at 12:24
Hi Patrick,
I believe I've found the problem. Periscope seems to take the language as a parameter, which is an important criterion, but the number of subtitle parts is just as important.
In most plug-ins you discard the result row if the language is not supported. I've extended this with another criterion which discards the row if the number of parts doesn't match. I've modified two of your files. Please have a look, do a diff against your head-revision copy and, if it looks okay, merge them into the repo.
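A minimal sketch of the extra discard rule described above. The row layout (a dict with hypothetical 'lang' and 'parts' keys) and the function names are assumptions for illustration, not Periscope's actual plugin code; when no part count is given, only the existing language check applies.

```python
# Hypothetical sketch: a scraped result row is assumed to be a dict with
# 'lang' and 'parts' keys. This is not Periscope's real data model.
def keep_row(row, langs, parts=None):
    """Return True if a scraped subtitle row should be kept."""
    if row.get('lang') not in langs:
        return False  # unsupported language: discard, as the plugins already do
    if parts is not None and row.get('parts') != parts:
        return False  # wrong number of CDs: discard
    return True


def filter_rows(rows, langs, parts=None):
    """Keep only rows that pass both the language and the part-count checks."""
    return [row for row in rows if keep_row(row, langs, parts)]
```

Calling `filter_rows(rows, ['en'])` without `parts` behaves like a language-only filter, so existing callers would not notice the change.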
There was another problem that I encountered. You're using .startswith(token) in many places to accept/discard a result. This function is case-sensitive, so I've converted both strings to upper case before comparing. This gives more flexibility without hurting the cleanliness of the result set.
Should I modify the others too?
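For reference, a case-insensitive prefix test along the lines described. `starts_with_ci` is a hypothetical helper name; it uses `str.casefold()` instead of `upper()`, which gives the same result for ASCII release names and also covers some non-ASCII case rules.

```python
def starts_with_ci(text, prefix):
    """Case-insensitive version of str.startswith."""
    # Normalise both sides before comparing, as described above.
    return text.casefold().startswith(prefix.casefold())
```

For example, `'District.9.R5'.startswith('district')` is False, while `starts_with_ci('District.9.R5', 'district')` is True.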
Thanks
Original comment by mridanga...@gmail.com
on 12 Mar 2010 at 8:09
Attachments:
Once again, I accidentally omitted some information. I'm using Periscope from the Windows command line, but I've been through your code and that doesn't seem to be a problem.
Anyway, I feel that the main program periscope.py should have an overloaded function which, when given a file name, looks in the directory for other movie files with a similar name, works out the number of parts the movie is split into, and passes this 'parts' value to the plug-ins as a parameter. This would greatly increase the quality of the returned subtitle. A two-part subtitle for a one-part movie, or vice versa, is quite useless, really. :)
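A sketch of the suggested directory scan. Python has no function overloading, so optional arguments play that role here. The `cdN` naming convention, the extension list and the names `part_key`/`count_parts` are all assumptions. Files only count as parts when their names are identical once the `cdN` marker is removed, so two episodes such as S06E01 and S06E02 are not mistaken for parts of one release.

```python
import os
import re

# Assumed conventions, for illustration only.
VIDEO_EXTS = {'.avi', '.mkv', '.mp4'}
PART_RE = re.compile(r'[. _-]?cd\d+', re.IGNORECASE)  # ".CD1", "_cd2", ...


def part_key(filename):
    """File name without extension and without any cdN marker, lower-cased."""
    stem, _ext = os.path.splitext(filename)
    return PART_RE.sub('', stem).lower()


def count_parts(path, siblings=None):
    """Count video files next to `path` whose names differ from it only by a cdN marker."""
    directory, name = os.path.split(path)
    if siblings is None:
        siblings = os.listdir(directory or '.')
    key = part_key(name)
    return sum(1 for f in siblings
               if os.path.splitext(f)[1].lower() in VIDEO_EXTS
               and part_key(f) == key)
```

With `District.9.CD1.avi` and `District.9.CD2.avi` in one folder, `count_parts` on either file gives 2; the `siblings` argument only exists to make the function easy to try without a real directory.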
Original comment by mridanga...@gmail.com
on 12 Mar 2010 at 8:15
I'll take a look at your modification over the weekend. The case-sensitivity argument is a valid one.
About your second suggestion, it cannot be implemented directly this way, as periscope can also be used to download TV show subtitles. If you have:
/Heroes.Season6
+-----Heroes.S06E01.avi
+-----Heroes.S06E02.avi
You can't just take the longest common string between the two ("Heroes") and search on that.
But there is a regex in the code that splits TV shows and movies into bits (name, season, episode, year, release team, ...). I will add a "part" bit for CD1, CD2 and use that info in the searches.
I've also considered using the .nfo files that sometimes come with movies in order to get extra info and better results. I still have to find a good nfo parser in Python (as I don't really want to write one).
Original comment by patrick....@gmail.com
on 12 Mar 2010 at 8:29
Regarding the case-sensitivity: sounds good!
As for the multi-part stuff:
I'm writing an open-source command-line app in which I'm using your library. What it does is take a directory as an argument, plus a flag indicating whether the content in the directory is a movie or a TV show. I'll get back to why I need this flag to be passed later. Anyway, my code (dubbed Janitor) does a checksum on all the files, uncompresses all the RAR and ZIP archives, and cleans the name, which means a crappy name like District.9.REPACK.R5.LiNE.XviD-KAMERA would turn into District 9 (2009).
How do I do this name cleaning?
Well, I query all three search engines (Yahoo, Bing, Google) with the directory name, which is almost always the name of the movie. I concatenate the results from the three searches and take the most frequently occurring IMDB id (most torrent pages have the IMDB id somewhere on the page). I have also written a procedure to clean the file name, which strips out a bunch of commonly found keywords like XviD, DivX and R5; I then use the partly cleaned file name to search the three engines again, and once more take the most frequently occurring IMDB id. After these 6 searches (three before cleaning and three after cleaning) I usually end up with the correct name, unless the original name was really FUBAR.
I've thought of reading NFO files to look for IMDB ids, which might save me the trouble of searching online. NFO files are text files and can be read in Python like any other plain-text file. The problem is that the NFO file is quite often not included. I haven't come across any program which can automatically figure out whether it's a TV show or a movie. :( ...and now I see your problem with the multi-part logic... but.
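The offline half of that pipeline, sketched with the standard library only. The keyword list, the trailing `-GROUP` rule and the `ttNNNNNNN` id pattern are assumptions, and the search-engine queries are not shown: `most_common_imdb_id` just tallies ids across whatever text it is given (concatenated result pages, or the contents of an NFO file read with `open()`).

```python
import re
from collections import Counter

# Hypothetical keyword list; a real one would be much longer.
JUNK = {'xvid', 'divx', 'r5', 'repack', 'line', 'dvdrip', 'proper'}
IMDB_ID_RE = re.compile(r'\btt\d{7,8}\b')


def strip_junk(name):
    """Drop a trailing '-GROUP' tag and common release keywords, then join with spaces."""
    name = re.sub(r'-[A-Za-z0-9]+$', '', name)  # e.g. "-KAMERA"
    words = re.split(r'[.\s_]+', name)
    return ' '.join(w for w in words if w and w.lower() not in JUNK)


def most_common_imdb_id(*texts):
    """Return the IMDB id that occurs most often across all texts, or None."""
    counts = Counter()
    for text in texts:
        counts.update(IMDB_ID_RE.findall(text))
    return counts.most_common(1)[0][0] if counts else None
```

`strip_junk('District.9.REPACK.R5.LiNE.XviD-KAMERA')` returns `'District 9'`; in the workflow above, the year in "District 9 (2009)" comes from the IMDB lookup, not from this step.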
Please could you have a look at my code and try to add it to the program? I've written it in such a way that when no part argument is passed to plugin.query(), the method behaves as before, so it doesn't affect anything. An overloaded function in periscope.py would make Periscope more flexible and easier to use in other projects such as mine. Since I haven't found a way to check whether it's a movie or a TV show, I'm going to make the user specify the mode explicitly.
Maybe some of the information above will help you.
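Since the mode has to come from the user, a command line along these lines would do. This is a hypothetical interface for a tool like Janitor, built on `argparse`; the `--mode` flag and its values are illustrative and not part of Periscope.

```python
import argparse


def parse_args(argv=None):
    """Parse the directory to process and the user-declared content type."""
    parser = argparse.ArgumentParser(description='Tidy a downloaded release directory.')
    parser.add_argument('directory', help='directory to process')
    # The user states the content type, because it cannot be detected reliably.
    parser.add_argument('--mode', choices=['movie', 'tvshow'], required=True,
                        help='whether the directory holds a movie or a TV show')
    return parser.parse_args(argv)
```

An invalid value such as `--mode film` makes `argparse` print a usage error and exit, so the rest of the tool can rely on one of the two values.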
Original comment by mridanga...@gmail.com
on 12 Mar 2010 at 9:16
Yes, I will take a look at your code and will probably use some of it, though I may implement it differently.
Currently, I do have a few regexes to try to distinguish a file as a TV show or a movie. This is done in the parent class SubtitleDatabase:
import re
tvshowRegex = re.compile(r'(?P<show>.*)S(?P<season>[0-9]{2})E(?P<episode>[0-9]{2}).(?P<teams>.*)', re.IGNORECASE)
tvshowRegex2 = re.compile(r'(?P<show>.*).(?P<season>[0-9]{2})x(?P<episode>[0-9]{2}).(?P<teams>.*)', re.IGNORECASE)
movieRegex = re.compile(r'(?P<movie>.*)[\.|\[|\(| ]{1}(?P<year>(?:(?:19|20)[0-9]{2}))(?P<teams>.*)', re.IGNORECASE)
It is not perfect for movies, but it mainly exists to get better results for TV shows.
I will add the "CD" part there to get the extra information about movie parts. I will look around torrent websites for multi-part release names to test the regex, and then the plugins.
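A sketch of how the three patterns above can be combined into a TV-show-or-movie guess. TV patterns are tried first, since a TV release name may also contain a year. `guess_kind` is a hypothetical name; the regexes are the ones quoted above, written as raw strings, and the CD part is not handled here.

```python
import re

# The patterns quoted above, written as raw strings.
TVSHOW_RES = [
    re.compile(r'(?P<show>.*)S(?P<season>[0-9]{2})E(?P<episode>[0-9]{2}).(?P<teams>.*)', re.IGNORECASE),
    re.compile(r'(?P<show>.*).(?P<season>[0-9]{2})x(?P<episode>[0-9]{2}).(?P<teams>.*)', re.IGNORECASE),
]
MOVIE_RE = re.compile(r'(?P<movie>.*)[\.|\[|\(| ]{1}(?P<year>(?:(?:19|20)[0-9]{2}))(?P<teams>.*)', re.IGNORECASE)


def guess_kind(filename):
    """Return ('tvshow' | 'movie' | None, named groups of the first pattern that matched)."""
    for regex in TVSHOW_RES:
        match = regex.match(filename)
        if match:
            return 'tvshow', match.groupdict()
    match = MOVIE_RE.match(filename)
    if match:
        return 'movie', match.groupdict()
    return None, {}
```

`guess_kind('Heroes.S06E01.avi')` yields `'tvshow'` with episode `'01'`, while `guess_kind('District.9.2009.R5.XviD.avi')` yields `'movie'` with movie `'District.9'` and year `'2009'`.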
One of the goals of periscope is to be included in other projects, so I will do my best to make periscope useful for your project. And if you already have a webpage somewhere, I would be more than happy to link to it from the periscope page.
Original comment by patrick....@gmail.com
on 12 Mar 2010 at 10:11
Hi Patrick.
Sorry for the delay in replying; I was visiting Stockholm.
I'll be putting my project on Google Code, but it's a little held up due to the issue I pointed out earlier. It'd be nice if you could add a link (which I'll pass on to you), and I'll do the same. I'm using Periscope as a library and am trying to avoid touching your source at all.
The method you mentioned for finding out whether a file is a TV show or a movie sounds pretty good. It's impossible to make it 100% foolproof, but it sorta does the trick.
Once you've modified the function to accept a parts parameter, I can continue working on my project. Feel free to implement it any way you deem fit. Any idea when you're planning to get this up and running? I could help if you wish.
If you'd like to test with some file names, here's a massive dump of file names:
http://www.kickasstorrents.com/api/
Thank you.
:)
Original comment by mridanga...@gmail.com
on 15 Mar 2010 at 2:27
Hi Patrick,
Any progress on this?
Thanks.
Original comment by mridanga...@gmail.com
on 23 Mar 2010 at 6:18
No progress on this yet; I've been busy with work. But from a code point of view, the query method will not be affected: the signature should stay the same.
You give the filename and language (the only info that cannot be guessed from the filename) to the query method and, based on that filename, it will build the query: extract the part info if needed and query the website using whatever can be extracted from the filename.
So if you give the query method:
token=The.Hurt.Locker.2008.DVDRiP.XViD.CD1.avi
langs=['en','de']
It's gonna query the website using the langs info and the meta info extracted from the file name:
name=The.Hurt.Locker
year=2008
teams=DVDRiP.XViD
part=CD1
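The mapping above can be reproduced with a small parser. This is only a sketch: `guess_metadata` and both patterns are assumptions (the part pattern matches `cdN` case-insensitively, the name/year pattern is a simplified relative of movieRegex), not Periscope's own code.

```python
import os
import re

# Hypothetical patterns, not the ones in Periscope.
PART_RE = re.compile(r'[. _-]?(?P<part>cd[0-9]+)', re.IGNORECASE)
NAME_YEAR_RE = re.compile(r'(?P<name>.*?)[. _(\[](?P<year>(?:19|20)[0-9]{2})[. _)\]]*(?P<teams>.*)')


def guess_metadata(token):
    """Split a release file name into name/year/teams/part (only the keys that are found)."""
    stem = os.path.splitext(os.path.basename(token))[0]
    meta = {}
    part = PART_RE.search(stem)
    if part:
        meta['part'] = part.group('part')
        # Remove the marker so it does not end up in 'teams'.
        stem = stem[:part.start()] + stem[part.end():]
    match = NAME_YEAR_RE.match(stem)
    if match:
        meta.update(match.groupdict())
    else:
        meta['name'] = stem
    return meta
```

`guess_metadata('The.Hurt.Locker.2008.DVDRiP.XViD.CD1.avi')` returns name `'The.Hurt.Locker'`, year `'2008'`, teams `'DVDRiP.XViD'` and part `'CD1'`; the caller still only supplies the file name and the languages.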
Original comment by patrick....@gmail.com
on 24 Mar 2010 at 8:10
I just added support for multi-part files to the SVN trunk (currently only SubtitleSource takes those parameters into consideration; I'll have to check whether the other websites keep that info somewhere on their pages).
I also slightly improved the SubtitleSource code and deactivated Podnapisi support, as they made their page more complex. I'll have to fix the code for that plugin.
Could you give it a test and tell me whether you get better results on your multi-part files?
The next step will be some clean-up of the code, as I have too many similar lines of code and the architecture of the plugins is starting to suffer from that. And multi-part support for more websites.
Original comment by patrick....@gmail.com
on 28 Mar 2010 at 10:59
Hi Patrick,
I'll do some testing. I'm a little busy with work too. :(
Cheers.
Original comment by mridanga...@gmail.com
on 30 Mar 2010 at 12:37
Hi Patrick,
I've been looking at this and have been getting quite a few errors. You seem to be using quite a bit of logging. Do you know how I can pass a logger instance to your class so that I can log the output to a file?
Thanks.
Original comment by mridanga...@gmail.com
on 8 Apr 2010 at 6:00
I'm using the standard Python logging system. You can change the debug level and send the output to a file:
http://docs.python.org/library/logging.html#simple-examples
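A minimal example of the standard-library approach: instead of passing a logger instance into the plugins, attach a `FileHandler` to the root logger. Records from other loggers propagate to the root by default, so this should also capture Periscope's messages, assuming it does not turn propagation off. `log_to_file` is a hypothetical helper name.

```python
import logging


def log_to_file(path, level=logging.DEBUG):
    """Send records from all propagating loggers to `path`; returns the handler."""
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter('%(asctime)s %(name)s %(levelname)s %(message)s'))
    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(handler)
    return handler
```

Call `log_to_file('janitor.log')` once at start-up, before using the library; the returned handler can later be removed with `logging.getLogger().removeHandler(handler)`.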
Original comment by patrick....@gmail.com
on 8 Apr 2010 at 7:46
I figured out the logging issue.
Could you tell me why many of the plugins in __init__.py are commented out?
Only SubtitleSource seems to be enabled.
Original comment by mridanga...@gmail.com
on 8 Apr 2010 at 9:12
I got errors with SubtitleSource.
False
False
False
Traceback (most recent call last):
  File "C:\Documents and Settings\m.agarwalla\My Documents\My Dropbox\Janitor\src\test\..\libs\periscope\plugins\SubtitleSource.py", line 49, in process
    subs = self.query(fname, langs)
  File "C:\Documents and Settings\m.agarwalla\My Documents\My Dropbox\Janitor\src\test\..\libs\periscope\plugins\SubtitleSource.py", line 98, in query
    srtTeams = set(releaseMetaData['teams'])
KeyError: 'teams'
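The traceback shows `releaseMetaData['teams']` being read from a dict with no 'teams' key, presumably because nothing in the file name matched the team part of the pattern. A defensive variant is sketched below. It assumes 'teams' (when present) is a dot-separated string such as `DVDRiP.XViD`, which may not match how SubtitleSource really stores it; `release_teams` is a hypothetical name.

```python
def release_teams(release_meta):
    """Release-team tokens as a lower-cased set; empty when the parser found none."""
    # Assumption: 'teams' is an optional dot-separated string like "DVDRiP.XViD".
    teams = release_meta.get('teams') or ''
    return {t.lower() for t in teams.split('.') if t}
```

A missing or empty 'teams' entry then just yields an empty set instead of aborting the whole query.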
Original comment by mridanga...@gmail.com
on 8 Apr 2010 at 9:41
Hi Patrick, have you had any luck with this?
Original comment by mridanga...@gmail.com
on 18 May 2010 at 11:33
Your last error should be fixed in SVN.
I have support for multi-part files based on the filename (it looks for cd[number], not case-sensitive).
For SubtitleSource, I use that info to query the website.
It worked for the few multi-part filenames I tried, but I don't guarantee that it's foolproof.
If you still have cases where it does not find any subtitles, I'll write a test case for them. But I have exams until the middle of June, so I may not be able to fix the code in the next few weeks.
Original comment by patrick....@gmail.com
on 18 May 2010 at 12:08
I'll close this; you can contact me again if you still find issues with multi-part support.
Original comment by patrick....@gmail.com
on 7 Sep 2010 at 3:02
Original issue reported on code.google.com by
mridanga...@gmail.com
on 11 Mar 2010 at 11:38