Azure / gpt-rag-ingestion

Add Paragraph Roles to the Data Ingestion Process #74

Open placerda opened 5 months ago

placerda commented 5 months ago

Goal: Document Intelligence provides paragraph role information, such as headings. We will use this information to create better chunks.

How it will work:

- [x] Convert Document Intelligence results to HTML according to paragraph roles.
- [x] Combine the results with tables before chunking.
- [ ] Update the chunking logic according to file format.
- [ ] Generate an embedding for each chunk part.
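As a rough illustration of the first step, the sketch below maps paragraph roles to HTML tags. It is not the repository's implementation. It assumes each paragraph is a plain dict with a `content` key and an optional `role` key, mirroring the shape of the layout result. The role-to-tag mapping, the set of skipped roles, and the function name `paragraphs_to_html` are all hypothetical choices made for this example.

```python
from html import escape

# Hypothetical mapping from paragraph roles to HTML tags (an assumption,
# not taken from the repository).
ROLE_TO_TAG = {
    "title": "h1",
    "sectionHeading": "h2",
    "footnote": "small",
}

# Roles treated here as page furniture and dropped (also an assumption).
SKIPPED_ROLES = {"pageHeader", "pageFooter", "pageNumber"}


def paragraphs_to_html(paragraphs):
    """Render paragraphs as HTML, one element per line.

    Paragraphs without a recognized role fall back to <p>.
    """
    parts = []
    for paragraph in paragraphs:
        role = paragraph.get("role")
        if role in SKIPPED_ROLES:
            continue
        tag = ROLE_TO_TAG.get(role, "p")
        # Escape the text so characters like & and < stay valid HTML.
        parts.append(f"<{tag}>{escape(paragraph['content'])}</{tag}>")
    return "\n".join(parts)
```

Calling `paragraphs_to_html` on a title paragraph followed by a role-less paragraph yields an `<h1>` line followed by a `<p>` line.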

Impact: Chunks generated from paragraphs in the Document Intelligence response become more meaningful.
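To make the intended benefit concrete, here is a minimal sketch of heading-aware chunking. It is one possible approach, not the planned chunk logic. It assumes the input is HTML with one element per line and that headings appear as attribute-free `<h1>` to `<h6>` tags. The function name `chunk_by_headings` is hypothetical.

```python
import re

# Matches a line that begins with an attribute-free heading tag.
HEADING_RE = re.compile(r"^<h[1-6]>")


def chunk_by_headings(html_text):
    """Split HTML into chunks, starting a new chunk at each heading.

    Each chunk keeps its heading together with the paragraphs below it.
    """
    chunks = []
    current = []
    for line in html_text.splitlines():
        # A heading closes the chunk collected so far, if there is one.
        if HEADING_RE.match(line) and current:
            chunks.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Because each chunk begins with its heading, the text that later gets embedded carries its own section context. A real implementation would likely also need to cap chunk size and handle tables.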