JierunChen / Ref-L4

Evaluation code for Ref-L4, a new REC benchmark in the LMM era
MIT License

Memory Consumption Increases During Dataset Iteration #3

Open xrlexpert opened 6 days ago

xrlexpert commented 6 days ago

Memory consumption keeps growing while I iterate through the Ref_L4 dataset in PyTorch. After loading the dataset, system memory usage increases with each iteration and eventually leads to an out-of-memory error. My system runs Ubuntu 22.04 with 16 GB of RAM.

Steps to Reproduce

  1. Initialize the dataset using my custom dataset class.
  2. Load the dataset.
  3. Use a for loop to iterate through the dataset and print info without any other operations.

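One way to put numbers on the growth described in these steps is to sample memory while looping. The following is a minimal sketch, not the code from this issue: `log_memory_while_iterating` and its `every` parameter are hypothetical names, and it relies on the standard-library `tracemalloc` module, which only tracks allocations made through Python's allocator and does not report the process RSS.

```python
import tracemalloc


def log_memory_while_iterating(iterable, every=100):
    """Iterate over `iterable` and sample traced memory every `every` items.

    Returns a list of (items_seen, current_traced_bytes) tuples.
    Hypothetical helper for illustration only.
    """
    readings = []
    tracemalloc.start()
    try:
        for i, _item in enumerate(iterable, start=1):
            if i % every == 0:
                # get_traced_memory() returns (current, peak) in bytes
                current, _peak = tracemalloc.get_traced_memory()
                readings.append((i, current))
    finally:
        tracemalloc.stop()
    return readings
```

If the sampled values climb steadily across the loop, something is retaining each item after the iteration that produced it; if they stay flat while system memory still rises, the growth is probably in native buffers that `tracemalloc` cannot see.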
xrlexpert commented 6 days ago

Issue Follow-Up

To address the issue of increasing memory usage, I optimized the loading of image data as follows:

  1. Extracting Image Files: I first extracted the images.tar.gz file to ensure all image files are available. After extraction, the image files are stored in a specified directory.

    tar -xzvf images.tar.gz -C <image_directory>
  2. Loading Data: I used pandas to load the Parquet files (ref-l4-test.parquet, ref-l4-val.parquet) containing the image information.

    import pandas as pd

    df = pd.read_parquet('<parquet_file_path>')
    print(df.size)  # total number of cells (rows * columns)
  3. Iterating Over the DataFrame: By iterating over each row of the DataFrame, I extracted the relevant image information, including id, file_name, and caption.

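    for index, row in df.iterrows():
      info = row.to_dict()
      sample_id = info['id']  # renamed to avoid shadowing the built-in id()
      file_name = info['file_name']
      caption = info['caption']
      image_path = "<image_directory>/" + file_name
      # load_image is not defined in this thread; it is assumed to return an (image_source, image) pair
      image_source, image = load_image(image_path)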

    With this approach, the problem is solved and memory usage stays stable.
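
The approach above (keep only metadata in memory, read each image from disk when it is needed) can also be wrapped in a map-style dataset. This is a rough sketch under stated assumptions, not the repository's dataset class: `LazyImageDataset` and its `loader` argument are hypothetical names, and `loader` stands in for whatever function turns a path into an image, such as the `load_image` call used above.

```python
import os


class LazyImageDataset:
    """Map-style dataset that stores metadata only and loads images on demand.

    Hypothetical class for illustration. `records` is a list of dicts with
    the keys `id`, `file_name`, and `caption`; `loader` is any callable
    that takes a file path and returns the loaded image.
    """

    def __init__(self, records, image_dir, loader):
        self.records = records
        self.image_dir = image_dir
        self.loader = loader

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        rec = self.records[idx]
        path = os.path.join(self.image_dir, rec["file_name"])
        # The image is loaded here and never cached on the dataset, so it
        # can be garbage-collected once the caller drops its reference.
        return {
            "id": rec["id"],
            "caption": rec["caption"],
            "image": self.loader(path),
        }
```

With pandas, `records` could be built as `df[["id", "file_name", "caption"]].to_dict("records")`. Making the class inherit from `torch.utils.data.Dataset` should let it be passed to a `DataLoader` without further changes.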