This pull request updates the documentation and codebase to support a new file format (.nl2sql) and to improve the chunking process. Key changes include README.md updates for clarity and formatting, a new NL2SQLChunker class, and changes to the chunk creation method so that chunk content is truncated to fit a maximum size.
Documentation Improvements:
README.md: Improved link formatting and added a new section for .nl2sql files, detailing the NL2SQLChunker and its usage. [1][2][3][4]
Code Enhancements:
chunking/chunker_factory.py: Added support for .nl2sql files by importing NL2SQLChunker and updating the get_chunker method to return an instance of NL2SQLChunker for .nl2sql files. [1][2]
chunking/chunkers/base_chunker.py: Enhanced the _create_chunk method to truncate content to a maximum byte size, so that content fits within the allowed limit without splitting a multibyte UTF-8 character. [1][2]
chunking/chunkers/nl2sql_chunker.py: Introduced the NL2SQLChunker class to process and chunk JSON content containing natural language questions and corresponding SQL queries.
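The factory change in chunking/chunker_factory.py amounts to dispatching on the file extension. The sketch below illustrates that pattern; it is not the repository's code. The class names BaseChunker and DefaultChunker, the registry dictionary, and the "fileName" input key are hypothetical stand-ins, and the real get_chunker signature may differ.

```python
import os
from typing import Dict, Type


class BaseChunker:
    """Hypothetical stand-in for the repository's base chunker class."""

    def __init__(self, data: dict) -> None:
        self.data = data


class NL2SQLChunker(BaseChunker):
    """Placeholder only; chunking logic is omitted in this sketch."""


class DefaultChunker(BaseChunker):
    """Hypothetical fallback for extensions without a dedicated chunker."""


class ChunkerFactory:
    # Assumed design: map lowercase file extensions to chunker classes.
    _registry: Dict[str, Type[BaseChunker]] = {
        ".nl2sql": NL2SQLChunker,
    }

    def get_chunker(self, data: dict) -> BaseChunker:
        # "fileName" is an assumed key in the input payload.
        _, ext = os.path.splitext(data["fileName"])
        chunker_cls = self._registry.get(ext.lower(), DefaultChunker)
        return chunker_cls(data)
```

With this layout, ChunkerFactory().get_chunker({"fileName": "queries.nl2sql"}) returns an NL2SQLChunker, and supporting another format means adding one registry entry rather than another if/elif branch.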
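The _create_chunk change truncates content by byte size without breaking UTF-8 characters. One common way to do this in Python is sketched below; the helper name truncate_utf8 is hypothetical, and the PR's actual implementation may use a different approach.

```python
def truncate_utf8(content: str, max_bytes: int) -> str:
    """Return the longest prefix of content whose UTF-8 encoding fits in max_bytes."""
    encoded = content.encode("utf-8")
    if len(encoded) <= max_bytes:
        return content
    # Cutting at max_bytes can leave a partial multibyte sequence at the end.
    # A valid str always encodes to valid UTF-8, so the only invalid bytes
    # after slicing are that trailing fragment, which errors="ignore" drops.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")
```

For example, "héllo" is 6 bytes in UTF-8 because "é" takes 2 bytes, so truncate_utf8("héllo", 2) returns "h" rather than a string ending in half of "é".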
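NL2SQLChunker processes JSON that pairs natural language questions with SQL queries. The sketch below shows one plausible chunking strategy, one chunk per question/SQL pair. The JSON layout (an array of objects with "question" and "query" keys), the function name chunk_nl2sql, and the chunk fields are all assumptions, not the format documented in the README.

```python
import json
from typing import Dict, List, Union


def chunk_nl2sql(content: str) -> List[Dict[str, Union[int, str]]]:
    """Split NL2SQL JSON into one chunk per question/SQL pair.

    Assumes (hypothetically) a JSON array of objects with
    "question" and "query" keys.
    """
    records = json.loads(content)
    chunks: List[Dict[str, Union[int, str]]] = []
    for index, record in enumerate(records):
        question = record["question"].strip()
        sql = record["query"].strip()
        chunks.append({
            "chunk_id": index,
            # Keep the question and its SQL together so a search on
            # either one retrieves the complete pair.
            "content": f"Question: {question}\nSQL: {sql}",
        })
    return chunks
```

Keeping each pair in a single chunk avoids separating a question from its SQL, which a generic size-based splitter could do.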