Closed ldorigo closed 5 years ago
Hi @ldorigo. You're not wrong in your assessment. In a major change to NLPre, we moved the backend to spaCy. Before that, most of the processing was done in either pyparsing or pattern, and in those cases running the code in parallel worked well. What we've found with spaCy, however, is that it runs fairly well out of the box (in parallel!) without needing to launch joblib. In fact, the overhead joblib adds by pickling creates a massive slowdown!
This is more of an issue with the docs than with the code itself. Thank you for bringing it to our attention; we will adjust the docs accordingly.
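To give a rough feel for the pickling overhead mentioned above, here is a minimal sketch using only the standard library. It does not use NLPre, spaCy, or joblib: `clean` is a hypothetical stand-in for a cheap per-document step, and the pickle round trip only imitates the serialization that a process-based worker pool has to do for each task. Real pools also pay for process startup and inter-process transfer, so treat the timings as illustrative only.

```python
import pickle
import time


def clean(text):
    # Hypothetical stand-in for a cheap per-document step (not an NLPre function).
    return text.lower()


def serial(docs):
    # Plain in-process loop: no serialization at all.
    return [clean(d) for d in docs]


def with_pickle_roundtrip(docs):
    # Imitate what a process-based pool must do for every task:
    # serialize the input, do the work, then serialize the result back.
    out = []
    for d in docs:
        payload = pickle.dumps(d)
        result = clean(pickle.loads(payload))
        out.append(pickle.loads(pickle.dumps(result)))
    return out


def timed(fn, docs):
    # Return the function's result together with its wall-clock time.
    start = time.perf_counter()
    result = fn(docs)
    return result, time.perf_counter() - start


docs = ["Some Example Sentence number %d." % i for i in range(20000)]
plain, t_plain = timed(serial, docs)
pickled, t_pickled = timed(with_pickle_roundtrip, docs)
print(f"serial: {t_plain:.4f}s, with pickling: {t_pickled:.4f}s")
```

Both paths produce the same output; the second one simply does extra serialization work per document. When the per-document work is cheap, that extra work can dominate the total time.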
Hi there,
Most likely this stems from me doing something wrong, but I am getting ~20x slower speeds when using multiple processes as shown in the README.
Here's my code:
This runs at ~1.3 iterations per second according to tqdm. The equivalent non-concurrent code runs at around 20 iterations per second.