Closed MarshtompCS closed 11 months ago
I have implemented this by passing a generator that iterates over multiple files for building the index.
Would you mind posting your solution for other people? Thank you.
Sure! The solution looks like this:
```python
import json

def many_files_line_iterator(files_list, callback=None):
    """Yield one parsed JSON record per line, chaining over all files."""
    for file in files_list:
        with open(file, "r") as fn:
            for line in fn:
                record = json.loads(line)
                if callback:
                    yield callback(record)
                else:
                    yield record
```
Then build the index over the whole collection:

```python
files_list = ["path_to_jsonl_0", "path_to_jsonl_1"]
search_engine.index(many_files_line_iterator(files_list, callback=None))
```
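As a self-contained sketch of the same idea, here is the chained iterator exercised end to end with a callback that transforms each record before it reaches the indexer. The `text` field name and the `extract_text` helper are illustrative assumptions, not part of this repository's API:

```python
import json
import os
import tempfile

def many_files_line_iterator(files_list, callback=None):
    """Yield one parsed JSON record per line, chaining over all files."""
    for path in files_list:
        with open(path, "r") as fh:
            for line in fh:
                record = json.loads(line)
                yield callback(record) if callback else record

def extract_text(record):
    # Hypothetical callback: keep only the "text" field of each record.
    return record["text"]

# Build two small .jsonl files to demonstrate iterating across files.
paths = []
for i in range(2):
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    with os.fdopen(fd, "w") as fh:
        fh.write(json.dumps({"id": i, "text": f"doc {i}"}) + "\n")
    paths.append(path)

texts = list(many_files_line_iterator(paths, callback=extract_text))
print(texts)  # → ['doc 0', 'doc 1']
```

Because the iterator is lazy, the full corpus is never held in memory at once, which is the point of splitting a large corpus across files in the first place.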
When a corpus contains a very large number of documents, we usually split it into multiple files. I wonder if it is possible to support inputting a sequence of JSON/JSONL files to build the index.