Closed yikuan8 closed 3 years ago
You're right that the preprocessing script takes a while to run. The code could definitely be sped up (e.g. with MapReduce or multiprocessing). I haven't experimented with this myself, but it looks like spaCy has a nice example of how to use it with joblib.
If you end up speeding this up, let us know and we'll incorporate it into the repo.
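For anyone who wants to try the multiprocessing route mentioned above, here is a minimal sketch using only the Python standard library. It is not the repo's code: `preprocess_note` is a hypothetical stand-in for whatever the preprocessing script does to a single note, and `preprocess_notes`, `n_workers`, and `chunksize` are names made up for illustration. It assumes each note can be processed independently of the others.

```python
from concurrent.futures import ProcessPoolExecutor


def preprocess_note(text):
    """Hypothetical per-note step (assumption): lowercase and collapse whitespace.

    Replace the body with the real per-note preprocessing logic.
    """
    return " ".join(text.lower().split())


def preprocess_notes(notes, n_workers=4, chunksize=256):
    """Apply preprocess_note to every note, preserving input order.

    With n_workers <= 1 the notes are processed serially in this process,
    which is handy for debugging. Otherwise the notes are sent to a pool of
    worker processes in batches of `chunksize` to reduce pickling overhead.
    """
    notes = list(notes)
    if n_workers <= 1:
        return [preprocess_note(note) for note in notes]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(preprocess_note, notes, chunksize=chunksize))
```

Call `preprocess_notes(notes, n_workers=8)` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. If the work is CPU-bound and splits evenly, 8 workers could in the best case bring a 15-day run down to roughly 2 days, but the real speedup depends on the machine and on how much time goes to I/O or model loading.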
Thanks for the great repo. I tested the preprocessing script. It processes 100 notes per minute, which gives a total ETA of 15 days. Do you have any ideas for speeding this up, or did it take you a similar amount of time?