Azure-Samples / graphrag-accelerator

One-click deploy of a Knowledge Graph powered RAG (GraphRAG) in Azure
https://github.com/microsoft/graphrag
MIT License
1.91k stars, 315 forks

Querying an XML file with multiple entity sets and relationships fails #163

Open CumulusService opened 2 months ago

CumulusService commented 2 months ago

metadata.txt

Describe the bug

I am not sure whether this is a bug or an issue on my end. After successfully deploying the accelerator, I uploaded an XML file (see attached) that represents the metadata of an ERP system to Azure Blob Storage, and then generated the prompts via the exposed API method:

community_report.txt entity_extraction.txt summarize_descriptions.txt

The indexing process went well and completed. However, when I started querying the index via the API (both local and global queries), I got totally unrelated results that had nothing to do with the indexed data. The returned JSON contained multiple properties, and I couldn't tell where to look for the result. Running the solution locally yielded much better results.

Can GraphRAG handle complex XML files? Is XML parsing and cleaning needed first, for example by building a pandas DataFrame around it? Maybe I am using the wrong prompt templates while indexing? Any recommendations would be helpful.

The goal is that when I query about a certain entity (say 'Orders'), the knowledge graph should know that 'Orders' is of type 'Document' and be able to extract all the property names of 'Document'. The same goes for 'Invoices', 'PurchaseOrders', etc.
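
To make the goal concrete, here is a minimal sketch of one possible preprocessing step: flattening the XML into plain-text sentences before upload, so that the entity-to-type and type-to-property relationships appear as ordinary prose for entity extraction. The XML layout (`EntityType`, `EntitySet`, `Property` elements with `Name` and `EntityType` attributes) and the function name `describe_entity_sets` are assumptions for illustration only; the real metadata.txt may be structured differently, and this is not a confirmed requirement of GraphRAG.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real metadata.txt layout may differ.
SAMPLE_XML = """
<Metadata>
  <EntityType Name="Document">
    <Property Name="Number" Type="String"/>
    <Property Name="Date" Type="DateTime"/>
  </EntityType>
  <EntitySet Name="Orders" EntityType="Document"/>
  <EntitySet Name="Invoices" EntityType="Document"/>
</Metadata>
"""


def describe_entity_sets(xml_text):
    """Return one plain-text sentence per entity set, naming its type and properties."""
    root = ET.fromstring(xml_text)

    # Map each entity type name to the list of its property names.
    props_by_type = {
        entity_type.get("Name"): [p.get("Name") for p in entity_type.findall("Property")]
        for entity_type in root.findall("EntityType")
    }

    sentences = []
    for entity_set in root.findall("EntitySet"):
        name = entity_set.get("Name")
        type_name = entity_set.get("EntityType")
        props = props_by_type.get(type_name, [])
        sentences.append(
            f"'{name}' is an entity set of type '{type_name}'. "
            f"'{type_name}' has the properties: {', '.join(props)}."
        )
    return sentences


if __name__ == "__main__":
    for sentence in describe_entity_sets(SAMPLE_XML):
        print(sentence)
```

The resulting sentences could be saved as a .txt file and indexed in place of the raw XML. Whether this improves query results would need to be tested against the actual data.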