CambioML / uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering

LLM-based text extraction from unstructured data such as PDFs, Word documents, and HTML pages. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster R&D!
https://www.cambioml.com
Apache License 2.0
187 stars 56 forks

Long text splitter #200

ZHIHANCHEN03 closed this 8 months ago

ZHIHANCHEN03 commented 8 months ago

Add an auto_split_long_text parameter to TransformConfig, and add an example notebook showing how to use it.

CambioML commented 8 months ago

Please fix the build error https://github.com/CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering/actions/runs/8055792877/job/22003432324?pr=200

notion-workspace[bot] commented 8 months ago

[Uniflow] Auto-split long text, process with LLM, and unify the results
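
For readers unfamiliar with the split, process, and unify pattern named in this task, here is a minimal sketch in plain Python. It is not uniflow's implementation and does not use the uniflow API: `split_long_text`, `process_long_text`, `max_chars`, and `process_chunk` are hypothetical names chosen for illustration, and the character-based limit is an assumption (a real pipeline would more likely count tokens and call an LLM where `process_chunk` is used).

```python
from typing import Callable, List


def split_long_text(text: str, max_chars: int = 200) -> List[str]:
    """Split text into chunks of at most max_chars, breaking on whitespace.

    Hypothetical helper for illustration; not part of uniflow.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for word in text.split():
        # A single word longer than the limit is hard-split into slices.
        while len(word) > max_chars:
            if current:
                chunks.append(" ".join(current))
                current, current_len = [], 0
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        # Account for the joining space when the chunk already has words.
        extra = len(word) + (1 if current else 0)
        if current and current_len + extra > max_chars:
            # The word does not fit: close the current chunk, start a new one.
            chunks.append(" ".join(current))
            current, current_len = [], 0
            extra = len(word)
        current.append(word)
        current_len += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


def process_long_text(
    text: str,
    process_chunk: Callable[[str], str],
    max_chars: int = 200,
) -> str:
    """Split the text, apply process_chunk to each piece, and join the results.

    process_chunk stands in for an LLM call (assumption for this sketch).
    """
    chunks = split_long_text(text, max_chars)
    results = [process_chunk(chunk) for chunk in chunks]
    return " ".join(results)


if __name__ == "__main__":
    # str.upper is a stand-in for a real model call.
    print(process_long_text("aa bb cc dd", str.upper, max_chars=5))
```

Joining the per-chunk outputs with a space is the simplest way to unify them; structured outputs (for example question/answer pairs) would need to be merged as lists instead.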