AmenRa / retriv

A Python Search Engine for Humans 🥸

Getting Out of Memory Error #36

Closed ashutosh486 closed 6 months ago

ashutosh486 commented 6 months ago

Hi,

I have a dataset with around 2 million rows, and each text is no more than 20 tokens. I tried building an index using SparseRetriever:

from retriv import SparseRetriever

sr = SparseRetriever(
  index_name="bm25",
  model="bm25",
  min_df=1,
  tokenizer="whitespace",
  stemmer="english",
  stopwords="english",
  do_lowercasing=True,
  do_ampersand_normalization=True,
  do_special_chars_normalization=True,
  do_acronyms_normalization=True,
  do_punctuation_removal=False,
)
# ids and descs are the ~2 million document IDs and texts
collections = [{"id": doc_id, "text": text} for doc_id, text in zip(ids, descs)]
sr.index(collections)

My disk space is around 14 GB and RAM is around 96 GB, with 24 processors. Is there any option to chunk the data and index it one chunk at a time?

ashutosh486 commented 6 months ago

Resolved the issue by limiting NumPy's CPU thread usage globally. Solution:

import os

# These variables are read when NumPy and its BLAS/OpenMP backends initialize,
# so set them before importing numpy or retriv.
os.environ["OMP_NUM_THREADS"] = "4"          # OpenMP
os.environ["OPENBLAS_NUM_THREADS"] = "4"     # OpenBLAS
os.environ["MKL_NUM_THREADS"] = "6"          # Intel MKL
os.environ["VECLIB_MAXIMUM_THREADS"] = "4"   # Apple Accelerate / vecLib
os.environ["NUMEXPR_NUM_THREADS"] = "6"      # NumExpr