NIHOPA / NLPre

Python library for Natural Language Preprocessing (NLPre)

Performance Issues with joblib #112

Closed ldorigo closed 5 years ago

ldorigo commented 5 years ago

Hi there,

Most likely this stems from me doing something wrong, but I am seeing ~20x slower throughput when using multiple processes as shown in the readme.

Here's my code:

from typing import List

from tqdm import tqdm
from nlpre import (
    decaps_text,
    titlecaps,
    dedash,
    unidecoder,
    token_replacement,
)
from joblib import Parallel, delayed

# Abstract is our sqlalchemy model (defined elsewhere)

parsers = [
    dedash(),
    titlecaps(),
    decaps_text(),
    unidecoder(),
    token_replacement(),
]

def normalize_abstracts(abstracts: List[Abstract]):
    def pipeline(t):
        for p in parsers:
            # One of the parsers sometimes fails
            try:
                t = p(t)
            except Exception:
                pass
        return t

    # Make an explicit list out of the abstract texts (to be sure the slowdown
    # isn't caused by some weird sqlalchemy datastructure)
    texts = [abstract.original_text for abstract in abstracts]
    # Launch the preprocessing in parallel:
    with Parallel(n_jobs=-1) as MP:
        norm_texts = MP(delayed(pipeline)(t) for t in tqdm(texts))
    # Fill the sqlalchemy objects with the preprocessed abstracts
    for index, abstract in enumerate(abstracts):
        abstract.normalized_text = norm_texts[index]

This runs at ~1.3 iterations per second according to tqdm. The equivalent non-concurrent code runs at around 20 iterations per second.
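For reference, the non-concurrent version I'm comparing against is essentially the same pipeline applied in a plain loop (reusing pipeline and texts from the snippet above):

# Serial version of the same preprocessing, no joblib involved
norm_texts = [pipeline(t) for t in tqdm(texts)]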

thoppe commented 5 years ago

Hi @ldorigo. You're not wrong in your assessment. In a major change to NLPre, we moved the backend to spaCy. Before, most of the processing was done in either pyparsing or pattern, and in those cases running the code in parallel worked well. What we've found with spaCy, however, is that it runs fairly well out of the box (in parallel!) without needing to launch joblib. In fact, the overhead joblib introduces (by pickling) creates a massive slowdown!

This is more of an issue with the docs than with the code itself. Thank you for bringing it to our attention; we will adjust accordingly.
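In the meantime, a rough sketch of what the updated docs might recommend: drop joblib entirely and apply the parsers serially, since the spaCy backend parallelizes internally (the sample text below is just a placeholder):

from nlpre import (
    dedash,
    titlecaps,
    decaps_text,
    unidecoder,
    token_replacement,
)

parsers = [
    dedash(),
    titlecaps(),
    decaps_text(),
    unidecoder(),
    token_replacement(),
]

def pipeline(text):
    # Apply each NLPre parser in sequence
    for parser in parsers:
        text = parser(text)
    return text

# Placeholder documents -- substitute your own abstracts here
texts = [
    "THE TITLE OF AN ABSTRACT. The hy- phenated text gets repaired.",
]

# A plain serial loop is all that's needed; no Parallel/delayed wrapper
normalized = [pipeline(text) for text in texts]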