We can work around the model's input token limit by splitting the text into sentence-based chunks:
def split_text(text, max_length=500):
    # Split on sentence boundaries and greedily pack whole sentences
    # into chunks of at most max_length characters.
    sentences = text.split('.')
    chunks = []
    current_chunk = ""
    for sentence in sentences:
        if not sentence.strip():
            continue
        sentence = sentence.strip() + "."
        if len(current_chunk) + len(sentence) <= max_length:
            current_chunk += sentence + " "
        else:
            chunks.append(current_chunk.strip())
            current_chunk = sentence + " "
    # Don't drop the last partially filled chunk.
    if current_chunk.strip():
        chunks.append(current_chunk.strip())
    return chunks
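For example (the sample text and max_length are illustrative):

chunks = split_text("First sentence. Second sentence. " * 50, max_length=200)
# Each chunk groups whole sentences and stays within max_length,
# unless a single sentence is itself longer than max_length.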
Replace the LLM translations with the NLLB-200 3.3B model.
Fix #50
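A minimal sketch of how the chunks could be translated with NLLB-200 3.3B via Hugging Face transformers; the checkpoint name, language codes (eng_Latn to deu_Latn), and generation settings below are illustrative assumptions, not settings from this repository:

# Sketch: translate each chunk independently with NLLB-200 3.3B, then rejoin.
# Model name and language codes are assumptions for illustration.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "facebook/nllb-200-3.3B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def translate_chunks(chunks, target_lang="deu_Latn"):
    # Force the decoder to start with the target language token.
    target_id = tokenizer.convert_tokens_to_ids(target_lang)
    translations = []
    for chunk in chunks:
        inputs = tokenizer(chunk, return_tensors="pt")
        output = model.generate(**inputs, forced_bos_token_id=target_id, max_new_tokens=512)
        translations.append(tokenizer.decode(output[0], skip_special_tokens=True))
    return " ".join(translations)

Because each chunk is translated on its own, keeping chunks aligned to sentence boundaries (as split_text does) avoids cutting a sentence in half across two translation calls.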