@ming0701 reported 2 issues. We had a phone conversation regarding this on Thursday, Nov 18, 2021.
1. His script raises an error when it tries to open the file. Since I can open the file without trouble on the same OS (macOS) and CPU (M1), I suspect the issue is specific to @ming0701's setup. FYI, the CSV file has around 80k rows and 22 columns. It is about 49 MB on disk, and Python reports 14.4 MB in memory before any wrangling, so it is really not a large dataset.
2. I had some trouble with data wrangling, and there was a concern that we would hit more errors when we transform the data and fit a model, so it was suggested that we consider a different dataset. My wrangling issue was later resolved, and it was not caused by an untidy CSV file. There is also no guarantee that we can find a "cleaner" dataset, and when I worked on the EDA I did not find the movie dataset particularly difficult to work with. We agreed to stick with this dataset while staying realistic about what we can do within the time constraints of this course project (point #1 above).
Regarding #1, I tested the script with other CSV files and it works fine. I would recommend that checker test the script on his laptop, and I will make the appropriate changes if needed.
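To help narrow down whether the open error in #1 comes from the local setup (wrong working directory, missing file, encoding), something like the following could be run on each laptop and the outputs compared. This is only a rough sketch using the standard library: `diagnose_csv` is a hypothetical helper, not part of our script, and the list of encodings is an assumption about what might be worth trying.

```python
import csv
import os
import sys


def diagnose_csv(path, encodings=("utf-8", "utf-8-sig", "latin-1")):
    """Return basic facts about a CSV file, or the reason it could not be read.

    Hypothetical helper for comparing setups; the encodings are guesses.
    """
    report = {"path": os.path.abspath(path), "exists": os.path.exists(path)}
    if not report["exists"]:
        report["error"] = "file not found; check the working directory and the relative path"
        return report

    # Size on disk in megabytes (10**6 bytes).
    report["size_mb"] = round(os.path.getsize(path) / 1e6, 1)

    for enc in encodings:
        try:
            with open(path, newline="", encoding=enc) as f:
                reader = csv.reader(f)
                header = next(reader, [])
                n_rows = sum(1 for _ in reader)  # data rows, header excluded
        except (UnicodeDecodeError, OSError, csv.Error) as exc:
            # Record why this attempt failed and try the next encoding.
            report.setdefault("failures", {})[enc] = repr(exc)
            continue
        report.update(encoding=enc, n_columns=len(header), n_rows=n_rows)
        return report

    report["error"] = "could not read the file with any of the tried encodings"
    return report


if __name__ == "__main__":
    # Pass one or more CSV paths on the command line.
    for csv_path in sys.argv[1:]:
        print(diagnose_csv(csv_path))
```

If the absolute path, size, row count, and column count match on both machines, the file itself is probably fine and the problem is more likely in how the script locates or opens it.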