Only small English spaCy library is imported, negatively impacting transformation accuracy

EricPostMaster / fortune-cookie-movies

Web app that imagines how much cooler fortune cookies would be if the messages inside were movie plots. 🎬📽🍿🥠

MIT License


Only small English spaCy library is imported, negatively impacting transformation accuracy #4

Closed: EricPostMaster closed this issue 2 years ago

EricPostMaster commented 2 years ago

The app currently loads only the small en_core_web_sm model, but the transformations were originally built with the transformer model (en_core_web_trf). The transformer model is too large (460 MB) to load with the website, so maybe we can run all of the rule-based transformations ahead of time and load the completed movie synopses into the application.
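A minimal sketch of what that precompute approach might look like: an offline script applies the transformation once, on a machine where the large model is available, and writes the results to a JSON file. The web app then only reads that file. The names `precompute_synopses`, `load_synopses`, `random_fortune`, the `transform` callable, and the `synopses.json` file are hypothetical and not taken from the repository.

```python
import json
import random
from pathlib import Path
from typing import Callable, Iterable, List, Optional


def precompute_synopses(
    plots: Iterable[str],
    transform: Callable[[str], str],
    out_path: Path,
) -> int:
    """Offline step: apply `transform` to every plot and save the results.

    `transform` is a stand-in (assumption) for the rule-based rewrite; in the
    offline script it could wrap a pipeline from spacy.load("en_core_web_trf").
    Returns the number of synopses written.
    """
    synopses = [transform(plot) for plot in plots]
    out_path.write_text(
        json.dumps(synopses, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(synopses)


def load_synopses(path: Path) -> List[str]:
    """Web app step: read the precomputed synopses. No spaCy model is needed."""
    return json.loads(path.read_text(encoding="utf-8"))


def random_fortune(synopses: List[str], rng: Optional[random.Random] = None) -> str:
    """Pick one precomputed synopsis to show as the fortune."""
    chooser = rng if rng is not None else random
    return chooser.choice(synopses)
```

With this split, the deployed app would depend only on the JSON file, so the model size no longer matters at load time. The trade-off is that any new plot has to go through the offline step before it can appear.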

EricPostMaster commented 2 years ago

Added the trf model to requirements.txt and loaded it in fortune_cookie.py. Performance has improved, but I wonder how it would differ with the en_core_web_lg (https://spacy.io/models/en#en_core_web_lg) model. According to the spaCy website, the trf model outperforms the lg model on several accuracy metrics, including part-of-speech tagging, sentence segmentation, and named entity recognition. Maybe I'll try it if I find myself with wifi decent enough to download the 560 MB model.
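For trying different models without editing the loading code each time, a small helper could attempt each model in order and use the first one that is installed. This is only a sketch: `load_first_available`, the `loader` parameter, and the preference order are assumptions, not code from fortune_cookie.py. It relies on `spacy.load` raising `OSError` when a model package is not installed.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# Preference order is an assumption: most accurate first, smallest last.
DEFAULT_MODELS = ("en_core_web_trf", "en_core_web_lg", "en_core_web_sm")


def load_first_available(
    names: Sequence[str] = DEFAULT_MODELS,
    loader: Optional[Callable[[str], Any]] = None,
) -> Tuple[str, Any]:
    """Return (model_name, pipeline) for the first model that loads.

    `loader` defaults to spacy.load; it is a parameter only so the helper
    can be exercised without spaCy installed.
    """
    if loader is None:
        import spacy  # imported lazily, only when the real loader is needed

        loader = spacy.load
    for name in names:
        try:
            return name, loader(name)
        except OSError:
            # Model package not installed; try the next candidate.
            continue
    raise OSError(f"None of these models are installed: {', '.join(names)}")
```

Calling `name, nlp = load_first_available()` would then report which model was actually picked, which makes it easier to compare output across trf, lg, and sm.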