In tools/preprocess_data.py, the function process_json_files reads JSON files, encodes them, and saves them in .bin/.idx format. Although it reads every JSON key and creates an MMapIndexedDatasetBuilder for each key, it does not call finalize for every key, because the for loop over the keys is missing. This prevents the index from being created.
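
To illustrate the pattern described here, the following is a minimal sketch rather than the actual code in tools/preprocess_data.py. StubBuilder is a hypothetical stand-in for MMapIndexedDatasetBuilder, and build_all, the file-name scheme, and the sample data are assumptions made only for this example. It shows one builder per JSON key, with finalize called inside a loop over all keys.

```python
class StubBuilder:
    """Hypothetical stand-in for MMapIndexedDatasetBuilder (not the real class)."""

    def __init__(self, bin_path):
        self.bin_path = bin_path
        self.items = []
        self.index_path = None  # stays None until finalize() is called

    def add_item(self, tokens):
        # Record one encoded document for this key.
        self.items.append(list(tokens))

    def finalize(self, index_path):
        # A real builder would write the .idx file here; the stub only records the path.
        self.index_path = index_path


def build_all(encoded_docs, json_keys, output_prefix):
    """Create one builder per key, add every document, then finalize every builder."""
    builders = {
        key: StubBuilder(f"{output_prefix}_{key}.bin") for key in json_keys
    }

    for doc in encoded_docs:
        for key in json_keys:
            builders[key].add_item(doc[key])

    # The loop over keys: without it, some builders would never be
    # finalized and would have no index.
    for key in json_keys:
        builders[key].finalize(f"{output_prefix}_{key}.idx")

    return builders


# Assumed sample data: two documents, each with two JSON keys.
docs = [
    {"text": [1, 2, 3], "title": [4]},
    {"text": [5], "title": [6, 7]},
]
builders = build_all(docs, ["text", "title"], "out")
```

In this sketch, every builder ends up with an index path set, which is the behavior the missing loop would otherwise prevent.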