danswer-ai / danswer

Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
https://docs.danswer.dev/

Implement indexing of simple tables in Word files #1651

Open artmatsak opened 2 weeks ago

artmatsak commented 2 weeks ago

Right now, tables in Word files are skipped entirely during indexing. This PR unwraps simple tables (no nested tables, no omitted cells) row by row and includes them in the indexed text. It makes the following assumptions:

  1. The first row of the table is the header row.
  2. Each column holds an entity attribute, and each row represents a separate entity.

Each unwrapped table row may look as follows:

No.: 2
Issue: The CD doesn’t play
Comments: The CD doesn’t start playback upon insertion into the drive. Furthermore, the drive LED doesn’t turn on.
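
For illustration only, the row-by-row unwrapping described above could be sketched as follows. This is not the code from this PR: the function name `unwrap_table_rows` and the plain list-of-lists input are assumptions made for the example, and nested tables are not detected here because the input is already plain text. With python-docx, the cell texts could presumably be collected with something like `[[cell.text for cell in row.cells] for row in table.rows]`.

```python
from typing import List, Optional


def unwrap_table_rows(rows: List[List[str]]) -> Optional[List[str]]:
    """Return one "Header: value" text block per data row.

    Hypothetical helper: returns None when the table is not "simple",
    which here means it has no data rows or a row's cell count differs
    from the header's (a sign of omitted cells).
    """
    if len(rows) < 2:
        return None

    # Assumption 1: the first row is the header row.
    header = [cell.strip() for cell in rows[0]]

    # A mismatched cell count means the table is not simple, so skip it.
    if any(len(row) != len(header) for row in rows[1:]):
        return None

    # Assumption 2: each remaining row is one isolated entity.
    blocks = []
    for row in rows[1:]:
        lines = [f"{name}: {value.strip()}" for name, value in zip(header, row)]
        blocks.append("\n".join(lines))
    return blocks


rows = [
    ["No.", "Issue", "Comments"],
    ["2", "The CD doesn't play", "The drive LED doesn't turn on."],
]
print(unwrap_table_rows(rows)[0])
```

Returning `None` for anything that does not look simple keeps the sketch conservative: a table it cannot unwrap reliably is left out rather than indexed in a misleading form.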

vercel[bot] commented 2 weeks ago

@artmatsak is attempting to deploy a commit to the Danswer Team on Vercel.

A member of the Team first needs to authorize it.