Currently, for token-based metrics, we're potentially re-computing the tokens of a sample many times, once per metric.
Having the option to pre-compute tokens (in parallel) and then have every metric work directly on those would speed up the process substantially.
This is not trivial: when processing many samples, spaCy tends to consume a large amount of main memory.
For safety, most of the processing currently runs in a single thread.
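A minimal sketch of the pre-computation idea follows. It assumes a tokenizer is any callable mapping a string to a list of tokens (`str.split` stands in for spaCy here), and `precompute_tokens`, `compute_metrics`, `token_count` and `type_token_ratio` are hypothetical names, not existing project code. Each sample is tokenized exactly once and every metric reuses the result. The default `max_workers=1` keeps the current single-threaded behaviour. Threads only pay off if the tokenizer releases the GIL. For CPU-bound tokenization, a process pool or spaCy's own `nlp.pipe(texts, n_process=..., batch_size=...)` is the more likely route, though that is exactly where the memory growth described above would need to be watched.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# Hypothetical: any callable turning a text into a list of tokens.
Tokenizer = Callable[[str], List[str]]
Metric = Callable[[List[str]], float]


def precompute_tokens(samples: Sequence[str], tokenize: Tokenizer,
                      max_workers: int = 1) -> List[List[str]]:
    """Tokenize every sample exactly once; output order matches input order."""
    if max_workers <= 1:
        # Single-threaded path, mirroring the current "safe" default.
        return [tokenize(s) for s in samples]
    # Thread pool; executor.map yields results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(tokenize, samples))


# Hypothetical token-based metrics that only consume pre-computed tokens.
def token_count(tokens: List[str]) -> float:
    return float(len(tokens))


def type_token_ratio(tokens: List[str]) -> float:
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def compute_metrics(samples: Sequence[str], tokenize: Tokenizer,
                    metrics: Dict[str, Metric],
                    max_workers: int = 1) -> List[Dict[str, float]]:
    """Tokenize once, then evaluate every metric on the shared tokens."""
    token_lists = precompute_tokens(samples, tokenize, max_workers)
    return [{name: fn(toks) for name, fn in metrics.items()}
            for toks in token_lists]


if __name__ == "__main__":
    samples = ["the cat sat on the mat", "hello world"]
    metrics = {"count": token_count, "ttr": type_token_ratio}
    print(compute_metrics(samples, str.split, metrics))
```

Adding a new metric then costs one extra pass over already-tokenized data rather than another tokenization run.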