This repository is for active development of the Azure SDK for JavaScript (NodeJS & Browser). For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/javascript/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-js.
DocumentAI: Malformed tables in markdown outputs #29071
Describe the bug
When selecting queryParameters: { outputContentFormat: "markdown" }, more complex tables are very prone to formatting errors. The result is broken markdown tables or, in some cases, misaligned columns.
I would suggest that tables be rendered in HTML by default instead of markdown table syntax; markdown renderers will still display HTML tables. Services like Unstructured default to rendering tables in HTML for accuracy.
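As a possible client-side workaround, the table could be rebuilt as HTML from the structured table data in the analyze result rather than from the markdown content. The following is only a minimal sketch: the `Cell` and `Table` interfaces are hypothetical shapes that assume the result exposes `rowIndex`, `columnIndex`, `rowSpan`, `columnSpan`, `kind`, and `content` per cell, and `tableToHtml` and `escapeHtml` are illustrative helpers, not part of the SDK.

```typescript
// Hypothetical minimal shapes; assumed to mirror the `tables` entries of an analyze result.
interface Cell {
  rowIndex: number;
  columnIndex: number;
  rowSpan?: number;
  columnSpan?: number;
  kind?: string; // assumed values such as "columnHeader"
  content: string;
}

interface Table {
  rowCount: number;
  columnCount: number;
  cells: Cell[];
}

// Escape the characters that would otherwise be parsed as HTML markup.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Illustrative helper (not an SDK function): emit an HTML table that keeps spans.
function tableToHtml(table: Table): string {
  const rows: string[][] = Array.from({ length: table.rowCount }, () => []);
  // Sort by row, then column, so cells are emitted in reading order.
  const sorted = [...table.cells].sort(
    (a, b) => a.rowIndex - b.rowIndex || a.columnIndex - b.columnIndex
  );
  for (const cell of sorted) {
    const row = rows[cell.rowIndex];
    if (!row) continue; // skip cells outside the declared row count
    const tag = cell.kind === "columnHeader" ? "th" : "td";
    const rowSpan = cell.rowSpan ?? 1;
    const colSpan = cell.columnSpan ?? 1;
    const attrs =
      (rowSpan > 1 ? ` rowspan="${rowSpan}"` : "") +
      (colSpan > 1 ? ` colspan="${colSpan}"` : "");
    row.push(`<${tag}${attrs}>${escapeHtml(cell.content)}</${tag}>`);
  }
  const body = rows.map((r) => `<tr>${r.join("")}</tr>`).join("\n");
  return `<table>\n${body}\n</table>`;
}

// Usage: a header cell spanning two columns, which markdown table syntax cannot express.
const sample: Table = {
  rowCount: 2,
  columnCount: 2,
  cells: [
    { rowIndex: 0, columnIndex: 0, columnSpan: 2, kind: "columnHeader", content: "Totals" },
    { rowIndex: 1, columnIndex: 0, content: "A" },
    { rowIndex: 1, columnIndex: 1, content: "B & C" },
  ],
};
console.log(tableToHtml(sample));
```

Because spanned cells are written with `colspan` and `rowspan` attributes, the output does not need the padding cells that markdown table syntax would require, which is where the misalignment appears to come from.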
To Reproduce
Steps to reproduce the behavior:
CEC sample.pdf
Expected behavior
The tables should be rendered correctly in markdown. The API does output the table correctly, but the markdown rendering is badly formatted, which makes the markdown output of little use for this scenario. More and more people are using DocumentAI for RAG ingestion, and having the output in proper markdown is very useful.
**Example of a broken table (Markdown doesn't support colspan):**
Markdown render:
Table render in HTML:
**Example where the markdown table causes misaligned columns:**
Markdown render:
Table Render in HTML: