Does cursor.fetchmany() load entire result set into memory?
I have a table with ~20GB that I need to process in my Python application.
# get cursor
cursor = self.conn.cursor()
# execute query
cursor.execute(self.query)
# column names to use in the pandas dataframe
columns = self.columns_to_select
# downcast data types for each column to be as memory-efficient as possible
types = self.column_data_types
df = pd.DataFrame(columns=columns).astype(types)
chunk_number = 0
while True:
    # get a chunk of data
    fetched_data: tuple = cursor.fetchmany(num=batch_size)
    if not fetched_data:
        break
    chunk_number += 1
    # create a pandas dataframe directly from the fetched rows
    chunk_df = pd.DataFrame(fetched_data, columns=columns).astype(types)
    # append the new batch
    df = pd.concat([df, chunk_df], ignore_index=True)
    logger.debug(
        f"Fetched chunk {chunk_number} of data from the database",
        df_size_in_mb=df.memory_usage(deep=True).sum() / (1024**2),
    )
logger.debug(f"Result dataframe dtypes:\n{df.dtypes}")
Before a single iteration of the loop completes, my application consumes up to 10 GB of RAM and then fails with an OOM error.
Question
Does cursor.fetchmany() load the entire result set into memory? How is this possible? As far as I understand, it should fetch only batch_size rows at a time and keep the rest server-side. Is that not the case for Amazon Redshift?
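For reference, here is a minimal, runnable illustration of the DB-API 2.0 contract the question assumes, using the stdlib sqlite3 driver as a stand-in since a Redshift cluster isn't available here. Note the hedge it demonstrates: fetchmany() only caps how many rows each call returns; it does not by itself guarantee the driver streams rows from the server, since a driver may have already materialized the full result set client-side during execute().

```python
import sqlite3

# In-memory database as a stand-in for the remote warehouse.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

cur.execute("SELECT x FROM t ORDER BY x")
batch_size = 4
batch_sizes = []
while True:
    # Each call returns at most batch_size rows, then an empty list at the end.
    rows = cur.fetchmany(batch_size)
    if not rows:
        break
    batch_sizes.append(len(rows))

print(batch_sizes)  # → [4, 4, 2]
conn.close()
```

Whether the rows beyond the current batch live server-side or in the client driver's buffer is implementation-specific, which is exactly what the observed memory usage suggests checking for this driver.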
Driver version
2.1.0
Redshift version
PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.63205
Client Operating System
Linux
Python version
3.11