drewli815 closed this issue 3 years ago.
The objective of our system is to predict the genres of movies whose exact genres are unknown, using other features of those movies such as their posters or overviews. We then recommend these movies to different users according to the user preference table.
To build the system, our collaborative project consists of three main steps:
Step 1: Use the text dataset to build the user preference table and save the output file. URL: https://texera.ics.uci.edu/workflow/151
Step 2: Build a CNN model for image classification in a PyTorch environment and save the trained model. URL: https://texera.ics.uci.edu/workflow/176
Step 3: Predict the genres of the test movies with the trained model, recommend them to different users, and combine our results. URL: https://texera.ics.uci.edu/workflow/164
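The preference table from step 1 and the recommendation in step 3 could be combined roughly as follows. This is a minimal sketch under assumptions not stated in the workflows: each rating row is taken to be a `(user_id, genres, rating)` triple, a user's preference for a genre is their mean rating over movies of that genre, and each test movie is ranked by the user's preference for the genre predicted in step 2. The function names `build_preference_table` and `recommend` are hypothetical.

```python
from collections import defaultdict


def build_preference_table(ratings):
    """Return {user_id: {genre: mean rating}} from (user_id, genres, rating) rows.

    Assumption: a movie with several genres contributes its rating to each genre.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for user_id, genres, rating in ratings:
        for genre in genres:
            sums[user_id][genre] += rating
            counts[user_id][genre] += 1
    return {
        user: {g: sums[user][g] / counts[user][g] for g in sums[user]}
        for user in sums
    }


def recommend(preferences, predicted_genres, user_id, k=3):
    """Rank test movies for one user by their preference for each movie's predicted genre.

    predicted_genres maps movie title -> predicted genre (hypothetical format).
    Genres the user has never rated score 0.0, so those movies sort last.
    """
    user_prefs = preferences.get(user_id, {})
    scored = [(user_prefs.get(genre, 0.0), title) for title, genre in predicted_genres.items()]
    # Highest preference first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:k]]


if __name__ == "__main__":
    ratings = [
        (1, ["Comedy"], 4.0),
        (1, ["Comedy", "Romance"], 5.0),
        (1, ["Horror"], 1.0),
        (2, ["Horror"], 5.0),
    ]
    prefs = build_preference_table(ratings)
    predicted = {"Movie A": "Horror", "Movie B": "Comedy", "Movie C": "Romance"}
    print(recommend(prefs, predicted, 1))  # ['Movie C', 'Movie B', 'Movie A']
```

Averaging ratings per genre keeps the table small (one row per user, one column per genre), which is convenient when it has to be saved as a file and read by another workflow.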
Step 2: Text Classification URL: https://texera.ics.uci.edu/workflow/163
For my text classification model, I focused on the overview feature, which contains a brief description of each movie. I first performed some aggregation and combined the relevant dataframes into one. Using a Python UDF operator, I then performed basic text cleaning to make the overview column more suitable for the machine learning model. Next, I wrote a script for genre classification using a logistic regression model. The output table displays the model's F1-score and precision. One limitation I faced was transferring the trained model into another workflow, which we solved by saving the model to disk.
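The text-classification steps described above (cleaning the overview, fitting a logistic regression for a genre, reporting precision and F1-score, and saving the model to disk) can be sketched as follows. This is a minimal, dependency-free sketch under stated assumptions, not the code from workflow 163: it treats one genre at a time as a binary yes/no label (one-vs-rest; repeat per genre for several genres), uses raw word counts as features, trains with plain stochastic gradient descent, and persists the model with Python's `pickle` module. The stop-word list and all function names are hypothetical. In practice, scikit-learn's `LogisticRegression` combined with a vectorizer such as `TfidfVectorizer` would be the more usual choice.

```python
import math
import os
import pickle
import re
import tempfile
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the workflow's actual list is not shown.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def clean_overview(text):
    """Lowercase the overview, keep only runs of letters, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, tokens):
    """Linear score w.x + b over bag-of-words counts."""
    weights, bias = model
    return bias + sum(weights.get(w, 0.0) * c for w, c in Counter(tokens).items())


def train_logistic_regression(docs, labels, epochs=200, lr=0.5):
    """Fit one binary (genre vs. not-genre) classifier with per-example SGD on log loss."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for tokens, y in zip(docs, labels):
            grad = sigmoid(score((weights, bias), tokens)) - y  # d(log loss)/dz
            bias -= lr * grad
            for w, c in Counter(tokens).items():
                weights[w] = weights.get(w, 0.0) - lr * grad * c
    return weights, bias


def predict(model, tokens):
    return 1 if sigmoid(score(model, tokens)) >= 0.5 else 0


def precision_and_f1(y_true, y_pred):
    """Precision and F1-score for the positive class (1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, f1


def save_model(model, path):
    """Persist the model to disk so a different workflow can load it."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    overviews = [
        "A hilarious comedy about a clumsy chef",
        "The funny misadventures of two friends",
        "A haunted house hides a deadly ghost",
        "A killer stalks teenagers in the dark",
    ]
    labels = [1, 1, 0, 0]  # toy labels: 1 = Comedy, 0 = not Comedy
    docs = [clean_overview(o) for o in overviews]
    model = train_logistic_regression(docs, labels)
    path = os.path.join(tempfile.mkdtemp(), "comedy_model.pkl")
    save_model(model, path)
    reloaded = load_model(path)
    preds = [predict(reloaded, d) for d in docs]
    print(precision_and_f1(labels, preds))  # (1.0, 1.0) on this separable toy set
```

Pickling works here because the model is just a dict and a float. Whatever format is used, the workflow that loads the model must apply the same cleaning step before predicting, or the learned word weights will not line up with its input.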
Thanks for the awesome use case! We will archive this issue now. The workflows and data files are archived in the texera account's project.
Using the production server, we are conducting a data science project with the goal of building a working movie recommendation system.