MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Azure AI Search - Custom Web API skill - Web Api response contains a record with a missing id #124952

Open: DeepikashreePrakash opened this issue 5 days ago

DeepikashreePrakash commented 5 days ago

ISSUE

Approach:

RAG approach

Area of issue:

Azure AI Search -> Skillsets -> Custom Web API skill

Process:

I am trying to create a skillset with a Custom Web API skill that identifies the tables in a document, converts the table data into JSON, and passes it to other skills for chunking and vectorization.

Steps:

  1. Wrote custom code, as a Quart app, that identifies the table structure and converts the table data into JSON
  2. Deployed the custom code to an Azure App Service web app
  3. Created a skillset that calls the deployed web app
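A minimal sketch of the per-record logic such an app needs, assuming the shape described in the custom skill interface documentation: every output record echoes the input `recordId` and carries `data`, `errors`, and `warnings`. `process_skill_request` and the `extract_tables` callback are hypothetical names standing in for the actual table-detection code, and the `tableItems` key is chosen to match the output `name` declared in the skillset below.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical extractor signature: takes a file name and returns a list of
# tables, where each table is a list of row dicts (header -> cell value).
TableExtractor = Callable[[str], List[List[Dict[str, str]]]]


def process_skill_request(body: Dict[str, Any], extract_tables: TableExtractor) -> Dict[str, Any]:
    """Build a custom-skill response that echoes every input recordId."""
    results = []
    for record in body.get("values", []):
        record_id = record["recordId"]
        try:
            file_name = record["data"]["file_name"]
            tables = extract_tables(file_name)
            results.append({
                "recordId": record_id,           # must match the input recordId
                "data": {"tableItems": tables},  # key matches the skill output "name"
                "errors": None,
                "warnings": None,
            })
        except Exception as exc:
            # Report a per-record error instead of failing the whole batch.
            results.append({
                "recordId": record_id,
                "data": {},
                "errors": [{"message": str(exc)}],
                "warnings": None,
            })
    return {"values": results}


# Usage with a stand-in extractor (hypothetical, for illustration only).
def fake_extractor(name: str) -> List[List[Dict[str, str]]]:
    return [[{"Header1": "Row1-Value1", "Header2": "Row1-Value2"}]]


request_body = {"values": [{"recordId": "1", "data": {"file_name": "sample_filename.pptx"}}]}
print(json.dumps(process_skill_request(request_body, fake_extractor), indent=2))
```

In the Quart app, the route handler would pass `await request.get_json()` to this function and return the result with `jsonify`. Catching exceptions per record lets one bad file report an error without the service rejecting the whole batch.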

Issue Explanation

Error message when running the indexer:

Operation - Web Api response contains a record with a missing id. Message - Could not execute skill because Web Api skill response is invalid.
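The error points at the response format: as described in the custom skill interface documentation, the indexer matches each output record to its input by `recordId`, while the sample output below uses `id`. A hypothetical checker (`find_response_problems` is not part of any Azure SDK) can catch this locally before redeploying:

```python
from typing import Any, Dict, List


def find_response_problems(request: Dict[str, Any], response: Dict[str, Any]) -> List[str]:
    """List mismatches between the request's and the response's recordIds."""
    problems = []
    expected = {r["recordId"] for r in request.get("values", [])}
    seen = set()
    for i, record in enumerate(response.get("values", [])):
        record_id = record.get("recordId")
        if record_id is None:
            # This is the case the indexer reports as "a record with a missing id".
            keys = sorted(record.keys())
            problems.append(f"record {i} has no 'recordId' (keys: {keys})")
            continue
        if record_id not in expected:
            problems.append(f"record {i} has unknown recordId {record_id!r}")
        seen.add(record_id)
    for missing in sorted(expected - seen):
        problems.append(f"no response record for recordId {missing!r}")
    return problems


# Usage with the shape of the sample input/output in this issue (table data abbreviated).
request = {"values": [{"recordId": "1", "data": {"file_name": "sample_filename.pptx"}}]}
response = {"values": [{"data": {"items": []}, "errors": None, "id": "1", "warnings": None}]}
for problem in find_response_problems(request, response):
    print(problem)
```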

Screenshot:

(image not included)

Skillset used:

{
  "name": "table-extraction-test-index-skillset",
  "description": "Skillset to chunk documents and generate embeddings",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
      "name": "#1",
      "description": "Convert table data into JSON",
      "context": "/document",
      "inputs": [
        {
          "name": "file_name",
          "source": "/document/metadata_storage_name",
          "inputs": []
        }
      ],
      "outputs": [
        {
          "name": "tableItems",
          "targetName": "tabledata"
        }
      ],
      "uri": "https://<webappname>.azurewebsites.net/api/custom_skill/",
      "httpHeaders": {},
      "httpMethod": "POST",
      "timeout": "PT3M",
      "batchSize": 123,
      "degreeOfParallelism": 1
    }
  ]
}
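A second mismatch is worth checking: the skill declares an output named `tableItems`, but the sample output below puts the tables under `data.items`. As I understand the Custom Web API skill, each output `name` is looked up as a key in the record's `data`, so `/document/tabledata` would likely stay empty even after the `recordId` problem is fixed. `missing_output_fields` is a hypothetical helper for spotting this:

```python
from typing import Any, Dict, List


def missing_output_fields(skill: Dict[str, Any], response: Dict[str, Any]) -> Dict[str, List[str]]:
    """For each response record, list declared skill outputs absent from its 'data'."""
    declared = [o["name"] for o in skill.get("outputs", [])]
    report = {}
    for record in response.get("values", []):
        data = record.get("data") or {}
        absent = [name for name in declared if name not in data]
        if absent:
            report[str(record.get("recordId"))] = absent
    return report


# Usage: the skill's outputs from the skillset above vs. a response using "items".
skill = {"outputs": [{"name": "tableItems", "targetName": "tabledata"}]}
response = {"values": [{"recordId": "1", "data": {"items": []}}]}
print(missing_output_fields(skill, response))
```

Renaming either side (the output `name` in the skillset or the key in the app's `data`) so they agree should be enough; the `targetName` can stay `tabledata`.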

Sample Input:

{
  "values": [
    {
      "recordId": "1",
      "data": {
        "file_name": "sample_filename.pptx"
      }
    }
  ]
}

Sample Output:

{
    "values": [
        {
            "data": {
                "items": [
                    [
                        {
                            "Header1": "Row1-Value1",
                            "Header2": "Row1-Value2"
                        },
                        {
                            "Header1": "Row2-Value1",
                            "Header2": "Row2-Value2"
                        }
                    ],
                    [
                        {
                            "Header1": "Row1-Value1",
                            "Header2": "Row1-Value2",
                            "Header3": "Row1-Value3",
                            "Header4": "Row1-Value4"
                        },
                        {
                            "Header1": "Row2-Value1",
                            "Header2": "Row2-Value2",
                            "Header3": "Row2-Value3",
                            "Header4": "Row2-Value4"
                        },
                        ...
                        {
                            "Header1": "Row(n)-Value1",
                            "Header2": "Row(n)-Value2",
                            "Header3": "Row(n)-Value3",
                            "Header4": "Row(n)-Value4"
                        }
                    ]
                ]
            },
            "errors": null,
            "id": "1",
            "warnings": null
        }
    ]
}
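If the app already builds records in the shape above, a small adapter could rename the keys just before the response is returned. `to_skill_record` is a hypothetical helper, and it assumes the skillset keeps `tableItems` as the output `name`:

```python
from typing import Any, Dict


def to_skill_record(app_record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename 'id' -> 'recordId' and data 'items' -> 'tableItems'."""
    data = dict(app_record.get("data") or {})
    if "items" in data and "tableItems" not in data:
        data["tableItems"] = data.pop("items")
    return {
        # Prefer an existing recordId; otherwise fall back to the app's "id".
        "recordId": app_record.get("recordId", app_record.get("id")),
        "data": data,
        "errors": app_record.get("errors"),
        "warnings": app_record.get("warnings"),
    }


# Usage on one record shaped like the sample output (rows abbreviated).
app_record = {"data": {"items": [[{"Header1": "Row1-Value1"}]]}, "errors": None, "id": "1", "warnings": None}
print(to_skill_record(app_record))
```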

PesalaPavan commented 4 days ago

@DeepikashreePrakash Could you add a link to the documentation you are following for these steps? This would help us redirect the issue to the appropriate team. Thanks!!

DeepikashreePrakash commented 4 days ago

Hi @PesalaPavan, I'm following the Microsoft documentation mentioned: https://learn.microsoft.com/en-us/azure/search/cognitive-search-custom-skill-interface