xrlexpert opened this issue 3 weeks ago
To address the issue of increasing memory usage, I optimized the loading of image data as follows:
Extracting Image Files: I first extracted the images.tar.gz archive to make sure all image files are available. After extraction, the image files are stored in a directory of your choice.
tar -xzvf images.tar.gz -C <image_directory>
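If you would rather do the extraction from Python (for example inside a setup script), the standard-library tarfile module does the same thing. This is just a sketch; the archive name and the target directory are placeholders, same as above.

```python
import tarfile

# Extract images.tar.gz into <image_directory>; both paths are placeholders.
with tarfile.open('images.tar.gz', 'r:gz') as tar:
    tar.extractall(path='<image_directory>')
```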
Loading Data: I used pandas to load the Parquet files (ref-l4-test.parquet, ref-l4-val.parquet) that contain the image information.
import pandas as pd

df = pd.read_parquet('<parquet_file_path>')
print(len(df))  # number of rows in the metadata table
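As a quick sanity check (my own addition, not part of the original workflow), you can confirm that the metadata DataFrame itself stays small, so any memory growth has to come from accumulated image data rather than from the Parquet file:

```python
# Rough check: the metadata alone should occupy only a modest amount of RAM.
df.info(memory_usage='deep')
print(df.columns.tolist())  # expect columns such as 'id', 'file_name', 'caption'
```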
Iterating Over the DataFrame: I iterated over each row of the DataFrame and extracted the relevant fields (id, file_name, and caption), so each image is loaded from disk only when it is needed rather than all at once.
import os

for index, row in df.iterrows():
    info = row.to_dict()
    image_id = info['id']  # renamed from `id` to avoid shadowing the built-in
    file_name = info['file_name']
    caption = info['caption']
    image_path = os.path.join("<image_directory>", file_name)
    # Load the image from disk only when it is needed, so at most one
    # image is held in memory at a time.
    image_source, image = load_image(image_path)
By implementing this approach, I was able to solve the problem: memory usage now stays stable while iterating over the dataset.
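If it helps anyone hitting the same problem, the same lazy-loading idea can also be wrapped in a torch.utils.data.Dataset, so only the Parquet metadata is held in RAM and each image is read from disk on demand. This is only a sketch under my own assumptions: the class name RefL4LazyDataset is hypothetical, and I use PIL here just to keep the example self-contained instead of the load_image helper from the snippet above.

```python
import os
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

class RefL4LazyDataset(Dataset):
    """Keeps only the Parquet metadata in memory and reads each image
    from disk on demand. Class name and field handling are illustrative."""

    def __init__(self, parquet_path, image_dir):
        # Only the lightweight columns (id, file_name, caption) stay resident.
        self.df = pd.read_parquet(parquet_path)
        self.image_dir = image_dir

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image_path = os.path.join(self.image_dir, row['file_name'])
        # PIL is used here only to keep the sketch self-contained; in practice
        # you would call the same image-loading helper as in the loop above.
        image = Image.open(image_path).convert('RGB')
        return {'id': row['id'], 'caption': row['caption'], 'image': image}

# dataset = RefL4LazyDataset('<parquet_file_path>', '<image_directory>')
# for sample in dataset:   # one image in memory at a time
#     ...
```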
For context, the original problem was: I am experiencing an issue with memory consumption while iterating through the Ref_L4 dataset in PyTorch. After loading the dataset, the system memory usage keeps increasing with each iteration through the dataset, eventually leading to an out-of-memory error. My system runs Ubuntu 22.04 with 16 GB of RAM.

Steps to Reproduce