This pull request adds row-level chunking to the SpreadsheetChunker class, so a spreadsheet can be chunked by individual rows instead of by entire sheets. Each chunk can optionally include the header row. The most important changes are detailed below.
New Functionality:
chunking/chunkers/spreadsheet_chunker.py: Added parameters chunking_by_row and include_header_in_chunks to the SpreadsheetChunker class, enabling row-wise chunking and optional inclusion of headers in each chunk.
chunking/chunkers/spreadsheet_chunker.py: Updated the get_chunks method to support chunking by row, including logic to handle headers and row data.
Configuration:
local.settings.json.template: Introduced new environment variables SPREADSHEET_CHUNKING_BY_ROW and SPREADSHEET_CHUNKING_BY_ROW_INCLUDE_HEADER to control the new chunking behavior.
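
For reviewers who want a concrete picture of the behavior, here is a minimal sketch of row-wise chunking driven by the two settings. It is not the code from this pull request: the helper names env_flag and chunk_rows, the " | " cell separator, the accepted boolean spellings, and the choice to treat the first row as the header are all assumptions made for illustration.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read an environment variable as a boolean.

    Assumption: "1", "true" and "yes" (case-insensitive) mean enabled.
    The real parsing rules in the repository may differ.
    """
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes"}


def chunk_rows(rows, include_header=False):
    """Return one text chunk per data row.

    Assumption: the first row is the header. When include_header is
    True, the header line is repeated at the top of every chunk.
    """
    if not rows:
        return []
    header, data_rows = rows[0], rows[1:]
    header_text = " | ".join(str(cell) for cell in header)
    chunks = []
    for row in data_rows:
        row_text = " | ".join(str(cell) for cell in row)
        if include_header:
            chunks.append(f"{header_text}\n{row_text}")
        else:
            chunks.append(row_text)
    return chunks


# Hypothetical sheet contents, used only to exercise the sketch.
sheet = [["name", "qty"], ["apple", 3], ["pear", 5]]

if env_flag("SPREADSHEET_CHUNKING_BY_ROW"):
    include = env_flag("SPREADSHEET_CHUNKING_BY_ROW_INCLUDE_HEADER")
    for chunk in chunk_rows(sheet, include_header=include):
        print(chunk)
```

Repeating the header in each chunk makes the chunks longer, but keeps the column names next to the values so that a single row can be read without the rest of the sheet.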