L-Dot / Letterboxd-list-scraper

A program that can scrape Letterboxd lists from an input URL. The output CSV or JSON contains information about the film title, release year, director, cast, personal rating, average rating and a lot more.
MIT License

The CSV files generated from these lists are almost empty (more details inside) #12

Closed: champorado86 closed this issue 2 months ago

champorado86 commented 2 months ago

Hi, first off, hats off to you for creating this scraper. I'm trying to build a dataset from the Letterboxd users I follow and this has been a timesaver. I was just manually scraping LOL :)

I ran into an issue with this link and this link: both of them return almost empty CSV files. I've tried other links before with a similar option and they came out OK. The first one should have returned a CSV with 87 titles and the second one 145 titles. What got generated instead are 1 KB CSV files with only 1 title each. I also noticed there was no "Written to xxx-film.csv" notification.

I'd like to understand what's causing the issues for these 2 particular links. Again, thank you for creating this scraper tool and I hope you have other development plans for this in the future.

L-Dot commented 2 months ago

Hi! Glad the project is of use to you 😄

I had a quick look and this bug seems to be a very specific case where the first film in the list is unreleased (in your case, https://letterboxd.com/film/furiosa-a-mad-max-saga/ and https://letterboxd.com/film/wicked-2024/). Because the film is unreleased, there are no official ratings and the histogram stats page https://letterboxd.com/csi/film/{title}/rating-histogram/ is empty.

Scraping is done correctly, but the program crashes during writeout because no ½, ★, ★½, etc. columns were created for the unreleased film. Moreover, because the writeout function takes the CSV/JSON header from the first film entry, the crash only happens when the unreleased film is the first entry in the list. Congrats on finding this very specific bug!
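
For readers who want to see how this kind of failure can arise, here is a minimal sketch. It is not the scraper's actual code: it assumes the writeout uses Python's `csv.DictWriter`, and the field names, the sample entries, and the helpers `collect_fieldnames` and `write_csv` are all hypothetical. It shows a header taken from the first entry failing on a later entry with extra rating columns, and one possible remedy (building the header from the union of all keys), which may differ from the fix that was actually shipped.

```python
import csv
import io

# Hypothetical scraped entries; the field names are illustrative only.
films = [
    {"Film_title": "Unreleased film", "Release_year": "2024"},  # no rating columns
    {"Film_title": "Released film", "Release_year": "1999", "½": 3, "★": 10},
]

def collect_fieldnames(rows):
    """Return the union of all keys across rows, in first-seen order."""
    fieldnames = []
    seen = set()
    for row in rows:
        for key in row:
            if key not in seen:
                seen.add(key)
                fieldnames.append(key)
    return fieldnames

def write_csv(rows, out, fieldnames=None):
    """Write rows as CSV; by default the header covers every key in every row."""
    if fieldnames is None:
        fieldnames = collect_fieldnames(rows)
    # restval fills in columns that a given row does not have.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)

# Header from the first entry only: the second entry has extra keys,
# so DictWriter raises ValueError after the first row is already written.
buggy = io.StringIO()
try:
    write_csv(films, buggy, fieldnames=list(films[0].keys()))
    crashed = False
except ValueError:
    crashed = True

# Header from the union of keys: every entry is written.
fixed = io.StringIO()
write_csv(films, fixed)
```

Under these assumptions, the first variant leaves a file with the header and a single film before the exception, which would be consistent with the one-title CSV files and the missing "Written to" message reported above.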

Anyway, this was a clumsy oversight on my part: I did not realise that you could add unreleased films to lists (TIL 😃). I have added a fix to the program and your lists should now scrape correctly.

L-Dot commented 2 months ago

I've added the fix in v2.1.0 so I'll close this issue for now.

champorado86 commented 2 months ago

Thank you! That was fast :)