StegSchreck / RatS

Movie Ratings Synchronization with Python
GNU Affero General Public License v3.0

Add source www.filmtipset.se #97

Closed: Row closed this issue 3 years ago

Row commented 4 years ago

Add filmtipset.se as a source, later perhaps as destination.

Filmtipset.se has been one of Sweden's largest (and greatest) movie rating communities. But the site was recently relaunched by a new owner who did not know or care what Filmtipset was about.

Here is some information about the site: https://sv.wikipedia.org/wiki/Filmtipset

Quote from Wikipedia (rough Google translation):

In November 2009, Filmtipset had over 87,600 registered users and the database contained over 69,900 films and 18.7 million ratings. In July 2011, Filmtipset had 103,400 users, 81,600 films and 23 million ratings. [18] In January 2017, the database contained over 120,000 registered users, 112,775 films and over 29 million ratings. In September 2019, Filmtipset had 122,000 registered users, 123,500 films and 29.8 million ratings. [18]

Row commented 4 years ago

Some more information: Filmtipset.se has no API at the moment, but I think the data needed for export is public. The ratings and rating dates for each user can be found at the URL below, where p is the pagination offset. https://www.filmtipset.se/betyg/ExampleUserName?p=0
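
A minimal sketch of how that paginated ratings endpoint could be walked, assuming the `p` offset simply increments until an empty page comes back (the page size, stop condition, and CSS selector are assumptions, not confirmed behaviour of the relaunched site):

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.filmtipset.se/betyg/{username}?p={page}"


def fetch_rating_pages(username, max_pages=100):
    """Yield the rating rows of each ratings page until an empty page shows up."""
    for page in range(max_pages):
        response = requests.get(BASE_URL.format(username=username, page=page))
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        # Placeholder selector -- the real markup of the site would
        # have to be inspected first.
        rows = soup.select(".rating-row")
        if not rows:
            break
        yield rows
```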

The IMDb ID might be present on each movie page, e.g.: https://www.filmtipset.se/film/the-beach-bum
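
If the movie pages do link to IMDb, the ID could presumably be picked out of the raw HTML with a generic pattern; this is only a guess at the markup:

```python
import re
import requests


def find_imdb_id(movie_url):
    """Return the first IMDb title ID (e.g. 'tt1234567') found in the page, or None."""
    html = requests.get(movie_url).text
    match = re.search(r"imdb\.com/title/(tt\d+)", html)
    return match.group(1) if match else None


# Example (assuming the page actually contains an IMDb link):
# find_imdb_id("https://www.filmtipset.se/film/the-beach-bum")
```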

I might be able to help out with the request, but I need more information on how to contribute: how to set up the development environment (preferably via Docker), and whether there are any good commits or code to look at.

StegSchreck commented 4 years ago

The recommended dev environment would be a virtual env. There is a Dockerfile present in the project, though, which you can also use for local runs.

I would recommend checking out the other parsers and inserters to get a general idea. The most recently implemented ones were for RottenTomatoes (see PR #94).
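
For orientation, here is a rough skeleton of what a new parser might look like, modeled loosely on the existing site parsers. The base class, module paths, and method names below are assumptions for illustration only; the RottenTomatoes parser from PR #94 is the authoritative template.

```python
# Hypothetical skeleton -- class and module names are assumptions,
# not the project's actual API. Check the RottenTomatoes parser for the real pattern.
from RatS.base.base_ratings_parser import RatingsParser
from RatS.filmtipset.filmtipset_site import Filmtipset  # hypothetical site class


class FilmtipsetRatingsParser(RatingsParser):
    def __init__(self, args):
        super().__init__(Filmtipset(args), args)

    def _parse_ratings(self):
        # Walk the paginated ratings list and collect one dict per movie,
        # e.g. {'title': ..., 'year': ..., 'my_rating': ...}.
        pass
```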

If the site offers a file download of the ratings, that would be the preferred approach, as it speeds up parsing. Another option worth looking at is issuing JavaScript calls from the Selenium-controlled browser, which avoids loading more data than is actually needed. The remaining option would be web scraping with BeautifulSoup.
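
As an illustration of the latter two options, a small sketch combining the Selenium-controlled browser with BeautifulSoup. The fetched endpoint and the CSS selector are placeholders; the actual requests the site makes would need to be inspected in the browser's network tab first.

```python
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.get("https://www.filmtipset.se/betyg/ExampleUserName?p=0")

# Option A: issue a JavaScript fetch from the already-authenticated browser session,
# so only the fragment that is actually needed gets transferred.
# The endpoint is a placeholder, not a documented API.
fragment = driver.execute_async_script(
    "const done = arguments[arguments.length - 1];"
    "fetch('/betyg/ExampleUserName?p=1').then(r => r.text()).then(done);"
)

# Option B: scrape the fully rendered page with BeautifulSoup.
soup = BeautifulSoup(driver.page_source, "html.parser")
rows = soup.select(".rating-row")  # placeholder selector until the real markup is known

driver.quit()
```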