RasmusKard / what.watch

Get a randomized movie or TV show suggestion based on your chosen parameters.
https://what.watch/

Hitting the PythonAnywhere memory limit with the DB update script #34

Closed: RasmusKard closed this 2 months ago

RasmusKard commented 11 months ago

Have tried:

  1. Defining dtypes in pandas read_csv
  2. Making sure no duplicate DataFrames are created during processing
  3. Converting the TSV files to Parquet before processing
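A minimal sketch of item 1, assuming the script loads IMDb's title.basics.tsv. The column subset and dtypes (USECOLS, DTYPES) and the helper read_basics are hypothetical, not the script's actual code. Dropping unused columns with usecols and storing low-cardinality text as category and years/runtimes as small nullable integers usually cuts memory compared with the all-object default; IMDb writes missing values as \N.

```python
import csv
import io

import pandas as pd

# Hypothetical subset of title.basics.tsv columns; keep only what the script needs.
USECOLS = ["tconst", "titleType", "primaryTitle", "startYear", "runtimeMinutes", "genres"]

# Assumed dtypes: category for low-cardinality text, small nullable ints for numbers.
DTYPES = {
    "tconst": "string",
    "titleType": "category",
    "primaryTitle": "string",
    "startYear": "Int16",       # nullable, so missing years stay <NA> instead of forcing float64
    "runtimeMinutes": "Int32",
    "genres": "category",
}


def read_basics(source) -> pd.DataFrame:
    """Load the selected columns of an IMDb basics TSV with compact dtypes."""
    return pd.read_csv(
        source,
        sep="\t",
        usecols=USECOLS,          # unlisted columns are dropped while parsing
        dtype=DTYPES,
        na_values=["\\N"],        # IMDb's marker for a missing value
        keep_default_na=False,    # so a title like "NA" is not turned into a missing value
        quoting=csv.QUOTE_NONE,   # assumption: quote characters in titles are literal text
    )


if __name__ == "__main__":
    sample = io.StringIO(
        "tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult\tstartYear\tendYear\truntimeMinutes\tgenres\n"
        "tt0000001\tshort\tCarmencita\tCarmencita\t0\t1894\t\\N\t1\tDocumentary,Short\n"
    )
    df = read_basics(sample)
    print(df.dtypes)
    print(df.memory_usage(deep=True).sum(), "bytes")
```

df.memory_usage(deep=True).sum() gives a byte count that can be compared against a plain read_csv of the same file to check whether the dtypes actually help.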

Yet to try:

Easy way out: take the last tconst entry in the SQL DB and only process the IMDb dataset from that tconst to the end. This significantly reduces the amount of data to process, but relies on the IMDb dataset keeping the same row order.
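A minimal sketch of the easy way out, with sqlite3 standing in for the real SQL DB. The titles table, its tconst column, and the helpers last_stored_tconst and rows_after are hypothetical. The file is read in chunks so rows before the stored tconst are discarded chunk by chunk rather than loaded all at once. If the stored tconst never appears, that is treated as a sign the dataset order changed, and an error is raised instead of silently processing nothing.

```python
import csv
import io
import sqlite3
from typing import Iterator, Optional

import pandas as pd


def last_stored_tconst(conn: sqlite3.Connection) -> Optional[str]:
    """Return the tconst of the most recently inserted row, or None if the table is empty."""
    # Hypothetical schema; assumes rows were inserted in dataset order,
    # so the highest rowid is the last entry written by the previous run.
    row = conn.execute("SELECT tconst FROM titles ORDER BY rowid DESC LIMIT 1").fetchone()
    return row[0] if row else None


def rows_after(source, last_tconst: Optional[str], chunksize: int = 100_000) -> Iterator[pd.DataFrame]:
    """Yield chunks of the IMDb TSV containing only rows that come after last_tconst.

    Relies on the dataset keeping the same row order between downloads.
    """
    found = last_tconst is None  # empty DB: process everything
    with pd.read_csv(
        source,
        sep="\t",
        dtype=str,
        na_values=["\\N"],
        keep_default_na=False,
        quoting=csv.QUOTE_NONE,
        chunksize=chunksize,
    ) as reader:
        for chunk in reader:
            if not found:
                hits = (chunk["tconst"] == last_tconst).to_numpy().nonzero()[0]
                if len(hits) == 0:
                    continue  # still before the last stored entry; drop this chunk
                chunk = chunk.iloc[hits[0] + 1:]  # keep only rows after the match
                found = True
            if not chunk.empty:
                yield chunk
    if not found:
        # The stored tconst is missing from the file: the order assumption no longer holds.
        raise ValueError(f"{last_tconst} not found in dataset")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE titles (tconst TEXT)")
    conn.execute("INSERT INTO titles VALUES ('tt0000002')")
    tsv = io.StringIO("tconst\tprimaryTitle\n" + "".join(f"tt000000{i}\tTitle {i}\n" for i in range(1, 6)))
    for chunk in rows_after(tsv, last_stored_tconst(conn), chunksize=2):
        print(chunk["tconst"].tolist())
```

With a MySQL backend, the rowid ordering would need to be replaced by whatever column records insertion order; the chunked skip itself does not depend on the database.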