Closed by ericjanto, 1 year ago
The main thing to notice is that `nlp.pipe(batched_text_list)` gives a slight performance boost: running the full pipeline over the entire Max Havelaar corpus took ~1.10 min, as opposed to ~1.30 min with my own custom batching. So it's worth doing at some point, but not now.
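For reference, a minimal sketch of the change this describes: swapping per-document `nlp(...)` calls for `nlp.pipe(...)`, which streams the texts through the pipeline in internal batches. The `texts` list and `batch_size` value here are illustrative stand-ins, and a blank pipeline is used so the snippet runs without a downloaded model.

```python
import spacy

# Blank English pipeline for illustration; real use would load a trained model.
nlp = spacy.blank("en")

# Stand-in for the batched corpus list from the comment above.
texts = ["First sentence.", "Second sentence.", "Third sentence."]

# Instead of processing one document at a time:
#   docs = [nlp(t) for t in texts]
# let spaCy batch internally:
docs = list(nlp.pipe(texts, batch_size=2))
```

The speedup comes from spaCy amortizing per-call overhead across each internal batch rather than paying it once per document.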