dominikmn / one-million-posts

Assisting newspaper moderators with machine learning.
MIT License

Issue 136 speed up inference #137

Closed dominikmn closed 3 years ago

dominikmn commented 3 years ago

Contains two fixes:

  1. The BertTokenizer() is now instantiated globally at backend startup rather than with every single prediction. This is achieved by moving BertTokenizer() from the get_prediction() method to the BinaryClassifier() class.
  2. Setting num_workers=0, so that data loading is performed in the main process. Values > 0 cause the costly creation of new sub-processes, each of which requires a complete copy of the DataLoader() in particular. For our scenario, where the dashboard sends exactly one sample per user input, any value > 0 makes no sense at all.
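The two fixes above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `BinaryClassifier` / `get_prediction` names are taken from the description, but the stub tokenizer and the model name string are placeholders standing in for `transformers.BertTokenizer`, so the sketch is self-contained.

```python
class BertTokenizer:
    """Stub standing in for transformers.BertTokenizer.

    The real constructor is expensive (it loads vocabulary files from disk),
    which is why it must not run on every prediction request.
    """

    @classmethod
    def from_pretrained(cls, name):
        return cls()

    def __call__(self, text):
        # Toy encoding: one pseudo-id per whitespace token.
        return {"input_ids": [hash(tok) % 30522 for tok in text.split()]}


class BinaryClassifier:
    def __init__(self):
        # Fix 1: instantiate the tokenizer once, when the backend starts,
        # instead of inside every get_prediction() call.
        self.tokenizer = BertTokenizer.from_pretrained("bert-base-german-cased")

    def get_prediction(self, text):
        encoded = self.tokenizer(text)
        # Fix 2 (shown as a comment to keep the sketch torch-free):
        # loader = DataLoader(dataset, batch_size=1, num_workers=0)
        # num_workers=0 keeps data loading in the main process; with
        # workers > 0, PyTorch forks sub-processes that each copy the
        # DataLoader state -- pure overhead for a single-sample request.
        return encoded
```

Because the tokenizer lives on the classifier instance, repeated calls to `get_prediction()` reuse the same object instead of paying the construction cost per request.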

Closes #136

dominikmn commented 3 years ago

Merging after internal discussion.