🦦 Otter, a multi-modal model based on OpenFlamingo (an open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning abilities.
Hi all, I just found a better way to load large JSON files using ijson. Inside mimicit_dataset.py, you can replace the loading loop with the following code.
# In mimicit_dataset.py: requires `import os`, `import orjson`, and `import ijson` at the top of the file.
for cur_mimicit_path, cur_images_path, cur_train_config_path, cur_status in zip(
    self.mimicit_paths, self.images_paths, self.train_config_paths, self.status_list
):
    # Load the instruction dataset with orjson (these files are comparatively small)
    assert os.path.exists(cur_mimicit_path), f"Error: The local mimicit_path {cur_mimicit_path} does not exist!"
    with open(cur_mimicit_path, "rb") as f:
        if self.dataset == {}:
            self.dataset = orjson.loads(f.read())["data"]
        else:
            self.dataset.update(orjson.loads(f.read())["data"])

    # Load the images with ijson, streaming key/value pairs one at a time instead of
    # reading the whole (potentially >100 GB) file into memory at once
    with open(cur_images_path, "rb") as f:
        for key, value in ijson.kvitems(f, ""):
            self.images[key] = value

    # Previous orjson-based image loading, kept for reference:
    # with open(cur_images_path, "rb") as f:
    #     if not self.images:
    #         self.images = orjson.loads(f.read())
    #     else:
    #         self.images.update(orjson.loads(f.read()))
We will update it later in the main branch, along with requirement.txt.
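For anyone who wants to try ijson outside the dataset class first, here is a minimal standalone sketch of the same idea (the file name "images.json" is just a placeholder, not a file shipped with the repo): it streams the top-level key/value pairs of a large JSON object instead of calling orjson.loads on the whole file.

# Standalone sketch, not part of mimicit_dataset.py.
import ijson

images = {}
# "images.json" is a placeholder for any large JSON file whose top level is an object.
with open("images.json", "rb") as f:
    # The empty prefix "" iterates the keys of the top-level object one pair at a time,
    # so peak memory stays roughly at the size of a single value rather than the whole file.
    for key, value in ijson.kvitems(f, ""):
        images[key] = value

print(f"Loaded {len(images)} entries")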