Closed: ashutosh486 closed this issue 8 months ago.
Resolved the issue by capping NumPy's thread usage globally through environment variables. Solution:
# Cap the thread pools that NumPy's BLAS/LAPACK backends may spawn.
# These variables must be set BEFORE numpy is imported for the first time.
import os
os.environ["OMP_NUM_THREADS"] = "4"         # OpenMP
os.environ["OPENBLAS_NUM_THREADS"] = "4"    # OpenBLAS
os.environ["MKL_NUM_THREADS"] = "6"         # Intel MKL
os.environ["VECLIB_MAXIMUM_THREADS"] = "4"  # Apple Accelerate / vecLib
os.environ["NUMEXPR_NUM_THREADS"] = "6"     # numexpr
Hi,
I have a dataset with around 2 million rows, and each text is no more than 20 tokens. I tried building the index with SparseRetriever.
My disk space is around 14 GB and RAM is around 96 GB, with 24 processors. Is there any option to chunk the data and index it one chunk at a time?
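For illustration only (whether SparseRetriever supports incremental indexing is exactly the open question here, so index_chunk below is a hypothetical stand-in, and the file name and "text" column are assumptions), a sketch of the flow being asked about, streaming the corpus with pandas' chunksize instead of loading all 2 million rows at once:

import pandas as pd

def index_chunk(docs):
    # Placeholder for whatever per-chunk ingestion call the library may
    # expose; this stub only marks where such a call would go.
    print(f"indexed {len(docs)} docs")

CHUNK_ROWS = 100_000  # hypothetical chunk size; tune to available RAM

# Stream the ~2M-row corpus in pieces instead of materializing it whole.
for chunk in pd.read_csv("corpus.csv", chunksize=CHUNK_ROWS):  # assumed file
    docs = [
        {"id": str(i), "text": t}  # assumed id/text document schema
        for i, t in zip(chunk.index, chunk["text"])
    ]
    index_chunk(docs)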