ameerkat / imdb-to-sql

Converts the plain text IMDB files available for download into a usable relational database format
Apache License 2.0
104 stars, 38 forks

Parsing the movies correctly #2

Open mbarrenag opened 11 years ago

mbarrenag commented 11 years ago

First of all, thank you very much for your imdb-to-sql code. I am trying to use it to prepare some exercises for teaching MySQL in my lectures. I have been able to build the database according to the proposed db-schema; however, I have seen that the movies are not always parsed correctly. Specifically, the regular expression for a movie doesn't capture every field.

To give an example, for this line in movies.list:

"El informal" (1998) {(1998-07-13)} 1998

the following debug code in the part that processes movies:

        m = re.match(ParseRegexes.movies, line)
        #DEBUG
        print "groups: ", m.groups()

gives this result:

groups: ('El informal', '1998', '1998', None, None, None, None, None, None, None, None)

when it should be something similar to:

groups: ('El informal', '1998', '1998', None, None, None, None, '1998-07-13', None, None, None)

That is why the movie is not inserted into the series table as it should be.
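To illustrate the kind of match I would expect, here is a minimal sketch. It assumes a simplified pattern of my own, not the project's actual `ParseRegexes.movies`. The names `EPISODE_LINE`, `parse_episode_line` and all the group names are hypothetical, and the pattern only covers lines that start with a quoted series title.

```python
import re

# Hypothetical, simplified pattern for series/episode lines in movies.list.
# This is NOT the project's ParseRegexes.movies; it only illustrates how the
# date inside the braces could be captured.
EPISODE_LINE = re.compile(
    r'^"(?P<series>[^"]+)"\s+'                     # quoted series title
    r'\((?P<year>\d{4}|\?{4})\)\s*'                # series year, e.g. (1998)
    r'(?:\{'                                       # optional episode block in braces
    r'(?P<ep_title>[^{}()]*?)\s*'                  # episode title, may be empty
    r'(?:\((?:#(?P<season>\d+)\.(?P<episode>\d+)'  # either (#season.episode)
    r'|(?P<air_date>\d{4}-\d{2}-\d{2}))\))?'       # or an air date (YYYY-MM-DD)
    r'\})?\s*'
    r'(?P<listed_year>\d{4}|\?{4})?\s*$'           # trailing year column
)


def parse_episode_line(line):
    """Return a dict of the named groups, or None if the line does not match."""
    m = EPISODE_LINE.match(line)
    return m.groupdict() if m else None


fields = parse_episode_line('"El informal" (1998) {(1998-07-13)} 1998')
print(fields)
```

With this sketch, the `air_date` group for the example line holds `'1998-07-13'` instead of `None`, which is the value that seems to get lost at the moment.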

I have been trying to fix the movies = re.compile(...) expression, but it is too complex for me to figure out how to do it correctly. I wonder if you could revise "movies = re.compile(...)" so that it works properly.

Thank you very much.

Manuel Barrena.