L-Dot / Letterboxd-list-scraper

A program that can scrape Letterboxd lists from an input URL. The output CSV or JSON contains information about the film title, release year, director, cast, personal rating, average rating and a lot more.
MIT License
41 stars, 11 forks

multiple file types functionality #11

jonathanhouge closed 2 months ago

jonathanhouge commented 3 months ago

Hello! I'm a college student and stumbled upon this repo. I thought it would be helpful for a project I'm doing, and it ended up being IMMENSELY helpful! However, I needed it to output a json file instead of a csv file. For my own project, I just changed everything related to csv to json, but thought that as a fun side-project and to show my appreciation, I would add the functionality to specify file types.

This pull request includes the ability to output either a csv or json file with the default being a csv. I think I've ensured that everything works as intended! I made a detailed commit message which you can see below to help in reviewing this pr, as well as example json outputs.

Let me know if there's anything you're unhappy about or want changed. Hope this is helpful. Thanks again for providing this wonderful tool for all to use! :D

cli.py:
- added new argument "-ofe" or "--output_file_extension" (str, default is ".csv")
- changed verbiage from 'csv' to 'file'
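As a rough illustration of the cli.py change, the new flag could be declared along these lines. This is a minimal sketch that assumes the CLI is built on `argparse`; the `build_parser` helper and the help text are hypothetical, and only the flag names, type and default come from the commit message.

```python
import argparse


def build_parser():
    """Hypothetical parser; only the -ofe flag mirrors the commit message."""
    parser = argparse.ArgumentParser(description="Scrape a Letterboxd list.")
    parser.add_argument(
        "-ofe", "--output_file_extension",
        type=str,
        default=".csv",  # csv stays the default output type
        help="Output file type, either .csv or .json (default: .csv)",
    )
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list instead of sys.argv for the demo.
    args = build_parser().parse_args(["-ofe", ".json"])
    print(args.output_file_extension)  # .json
```

Leaving the flag off gives `.csv`, so existing invocations keep producing the same kind of file.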

main.py:
- added passing of new arg 'output_file_extension' to ListScraper creation
- added variable to end message to specify what kind of file type can be found in ./scraper_outputs/

checkimport_functions.py:
- new function 'checkimport_output_output_file_extension()' (called in instance_class, init)
- makes sure that '-ofe' passed in is either ".json" or ".csv", allows for exemption of comma
- added 'output_file_extension' parameter to 'checkimport_outputname()' to ensure name has the proper file extension
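The extension check described for checkimport_functions.py might look roughly like the sketch below. The function names `check_output_file_extension` and `ensure_extension` are hypothetical stand-ins, not the repository's functions, and tolerating a missing leading dot is an assumption about what the commit message means rather than confirmed behaviour.

```python
ALLOWED_EXTENSIONS = (".csv", ".json")


def check_output_file_extension(extension):
    """Return a normalised extension (".csv" or ".json") or raise ValueError."""
    ext = extension.strip().lower()
    if not ext.startswith("."):
        # Assumption: "json" is accepted and treated as ".json".
        ext = "." + ext
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(
            f"Unsupported extension {extension!r}; use one of {ALLOWED_EXTENSIONS}"
        )
    return ext


def ensure_extension(output_name, extension):
    """Append the extension to the output name unless it is already there."""
    if output_name.endswith(extension):
        return output_name
    return output_name + extension


if __name__ == "__main__":
    ext = check_output_file_extension("json")
    print(ensure_extension("my_list", ext))  # my_list.json
```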

instance_class.py:
- new class field - 'output_file_extension'
- added import of 'checkimport_functions' and call to make sure the extension is valid
- passing of 'self.output_file_extension' to List creation
- modified 'scrape_all_and_writeout()' to output to json if ".json" (imported json for this)
  - 'indent=4' makes for well formatted output
  - 'ensure_ascii=False' makes sure special characters (including stars) aren't converted to unicode escape sequences
  - python's 'None' is converted to null with 'json.dumps'
- changed verbiage from 'csv' to 'file'

list_class.py:
- new class field - 'output_file_extension'
- 'self.output_file_extension' passing to 'checkimport_outputname()'
- 'self.output_file_extension' passing to 'scrape_list()'
- modified 'write_to_file()' to output to json if ".json" (imported json for this)
  - 'indent=4' makes for well formatted output
  - 'ensure_ascii=False' makes sure special characters (including stars) aren't converted to unicode escape sequences
  - python's 'None' is converted to null with 'json.dumps'
- changed verbiage from 'csv' to 'file'
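The instance_class.py and list_class.py entries describe the same three JSON settings, and their effect is easy to see in isolation. The sketch below uses only the standard `json` module; the sample row and its keys are made up for illustration and are not the scraper's real column names.

```python
import json

# Hypothetical scraped row; the keys are illustrative only.
films = [
    {"Film_title": "Amélie", "Owner_rating": "★★★★½", "Average_rating": None},
]

# Settings named in the commit message: readable indentation, raw non-ASCII text.
readable = json.dumps(films, indent=4, ensure_ascii=False)

# Default behaviour (ensure_ascii=True) escapes every non-ASCII character.
escaped = json.dumps(films, indent=4)

if __name__ == "__main__":
    print(readable)  # stars and accents appear as-is, None appears as null
    print(escaped)   # stars appear as \u2605 escape sequences
```

Both strings decode back to the same data with `json.loads`; the difference is only in how readable the file is when opened directly. When writing the unescaped form to disk, the file should be opened with an encoding such as UTF-8 that can represent those characters.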

scrape_functions.py:
- 'scrape_list()' and 'scrape_page()' take output_file_extension param now
- in 'scrape_page()', it's used to see what should be used for 'not_found', np.nan (import added) or None
- 'np.nan' switched with 'not_found'

utility_functions.py:
- removed numpy import (handled in scrape_functions.py)
- added parameter 'not_found' to functions
- json can't handle 'np.nan' - None instead!
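The reason for switching the placeholder can be shown with the standard library alone. `np.nan` is an ordinary Python float NaN, so `math.nan` stands in for it here to avoid a numpy dependency. The `not_found_value` helper is a hypothetical illustration of the choice described above, not the repository's code.

```python
import json
import math

row_nan = {"Average_rating": math.nan}
row_none = {"Average_rating": None}

# Python's json module emits the bare token NaN, which is not valid JSON.
print(json.dumps(row_nan))   # {"Average_rating": NaN}
# None maps to the standard JSON null.
print(json.dumps(row_none))  # {"Average_rating": null}


def not_found_value(output_file_extension):
    """Pick the missing-value placeholder for the chosen output type."""
    return None if output_file_extension == ".json" else math.nan
```

Passing `allow_nan=False` to `json.dumps` makes the problem explicit: it raises `ValueError` for the NaN row instead of writing a token that strict JSON parsers reject.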

README.md:
- changed verbiage from 'csv' to 'file'
- added a sample flag showing how to specify the file extension
- modified 'TODO' to be "further options for output" instead of "add options for output"

example_output/csv & example_output/json & *.json files:
- created folders for file extensions, moving csv examples into the 'csv' folder
- made examples for json output

L-Dot commented 2 months ago

This is great! I'm very happy to read you were able to utilize the program for your project :) and thankful for the work you put into creating this extra functionality.

I've looked through your code and everything looks fine and clean. I also very much appreciate the refactoring of the inline comments and the README + adding an example json file to the repository! So thank you for that as well.

I've merged it with the main branch!

jonathanhouge commented 2 months ago

I'm glad you're happy with it! Was my pleasure. :D