L-Dot / Letterboxd-list-scraper

A program that can scrape Letterboxd lists from an input URL. The output CSV or JSON contains information about the film title, release year, director, cast, personal rating, average rating and a lot more.
MIT License

Role scraping #13

Closed jonathanhouge closed 1 month ago

jonathanhouge commented 2 months ago

Hello once more!

For my school project, I wanted to scrape cast/crew roles as well and made it work... so I made it work here too! Let me know if there's anything you're unhappy about or want changed. Hope this is helpful. Thanks again, again, for providing this wonderful tool for all to use! :D

listscraper/

checkimport_functions.py:
- ROLES global variable (i think i found all of the valid ones?)
- new conditional on 'type' of list being scraped - "Cast/Crew"

list_class.py:
- call 'scrape_list' with another argument, 'list.type'

scrape_functions.py:
- have 'scrape_list' and 'scrape_page' take in 'list_type' --> string of what type of list is being scraped
- conditional on 'list_type' for 'table' generation (these pages have divs with class 'poster-grid')
- conditional on 'list_type' for roles that have less than four entries (these roles have placeholder posters that'll break the loop)

example_output/

csv/actor-anne-hathaway-films.csv & json/director-joel-coen-films.json:
- examples of role scraping, in both formats
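
For readers skimming the thread, here is a rough sketch of how the "Cast/Crew" check described for checkimport_functions.py might look. It is not the code from this PR: the function name `detect_list_type`, the fallback label "other", and the four-entry `ROLES` set are all assumptions made for illustration (the PR's real ROLES list is not shown here). Only the "Cast/Crew" label and the idea of a ROLES global come from the description above.

```python
from urllib.parse import urlparse

# Illustrative subset only (assumption); the PR's actual ROLES global is longer
# and is not reproduced in this thread.
ROLES = {"actor", "director", "producer", "writer"}


def detect_list_type(url: str) -> str:
    """Hypothetical helper: classify a Letterboxd URL by its first path segment.

    Returns "Cast/Crew" for URLs shaped like /<role>/<person>/, and the
    placeholder label "other" (an assumption) for everything else.
    """
    # Split the path into non-empty segments, e.g. ["actor", "anne-hathaway"].
    segments = [part for part in urlparse(url).path.split("/") if part]
    if len(segments) >= 2 and segments[0] in ROLES:
        return "Cast/Crew"
    return "other"
```

The returned string could then be passed along as the 'list_type' argument mentioned for 'scrape_list' and 'scrape_page', so that those functions can branch on it.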

L-Dot commented 1 month ago

Amazing stuff! I think this is a very useful feature, happy to have it added to the project. Sounds like a very cool school project you have haha

Thanks again also for the clean integration into the existing code. I found no issues, so I've merged it with the main branch 😄