Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK, we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
MIT License

[BUG] Azure.AI.OpenAI 2.1.0-beta.1 - Deserialization issue on "Citations.URI" property of ChatMessageContext #46793

Open: singhk97 opened this issue 1 month ago

singhk97 commented 1 month ago

Library name and version

Azure.AI.OpenAI 2.1.0-beta.1

Describe the bug

When using Azure OpenAI On Your Data (OYD), the returned citations object contains a non-URI string in the url field. This causes a deserialization error when calling `ChatMessageContext? azureContext = chatCompletion.GetMessageContext();`, where chatCompletion is of type ChatCompletion.

Expected behavior

The ChatMessageContext.ChatCitation.URI field should be set to null when the value is not a valid URI, or its type should be changed to string.
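Both outcomes described above can be sketched outside the SDK. The example below is a minimal illustration, assuming the citation JSON is read directly with System.Text.Json; `ReadCitationUrl` is a hypothetical helper, not an Azure.AI.OpenAI API. It keeps the raw url string (the "change the type to string" option) and produces a `Uri` only when the value is absolute, returning `null` otherwise (the "set to null" option).

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (not part of Azure.AI.OpenAI): keeps the raw "url" string and
// exposes an absolute Uri only when the value actually is one; otherwise ParsedUri is null.
static (string? RawUrl, Uri? ParsedUri) ReadCitationUrl(JsonElement citation)
{
    string? raw = citation.TryGetProperty("url", out JsonElement url) && url.ValueKind == JsonValueKind.String
        ? url.GetString()
        : null;

    // Uri.TryCreate returns false (instead of throwing) for null or non-absolute input.
    Uri? parsed = Uri.TryCreate(raw, UriKind.Absolute, out Uri? candidate) ? candidate : null;
    return (raw, parsed);
}

// A trimmed copy of the citation returned by the OYD call in this issue.
const string contextJson =
    "{\"citations\":[{\"title\":\"\",\"url\":\"citrus.pdf\",\"filepath\":\"citrus.pdf\",\"chunk_id\":\"2\"}]}";

using JsonDocument doc = JsonDocument.Parse(contextJson);
foreach (JsonElement citation in doc.RootElement.GetProperty("citations").EnumerateArray())
{
    var (rawUrl, uri) = ReadCitationUrl(citation);
    // Prints "raw: citrus.pdf, uri: null" for this payload instead of failing.
    Console.WriteLine($"raw: {rawUrl}, uri: {(uri is null ? "null" : uri.ToString())}");
}
```

Keeping the raw string alongside the optional `Uri` means a caller never loses the file name that the service returned, even when it cannot be treated as a link.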

Actual behavior

This is the context object returned from the call to OYD:

{
    "citations": [
        {
            "content": "Title: citrus.pdfPage 2....",
            "title": "",
            "url": "citrus.pdf",
            "filepath": "citrus.pdf",
            "chunk_id": "2"
        }
    ]
}

Notice that the url field is not a valid URI string.
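The rejection itself can be reproduced with System.Uri alone. The sketch below does not use the SDK and does not claim to mirror its generated deserializer; it only shows that the single-argument `Uri` constructor, which requires an absolute URI, throws `UriFormatException` for "citrus.pdf" but accepts a blob-style URL, and that the same value parses as a relative reference when `UriKind.RelativeOrAbsolute` is allowed. The blob URL is a made-up example, and `UriConstructorRejects` is a hypothetical helper.

```csharp
using System;

// Hypothetical helper: returns true when the System.Uri constructor rejects the value.
static bool UriConstructorRejects(string value)
{
    try
    {
        _ = new Uri(value); // the single-argument constructor requires an absolute URI
        return false;
    }
    catch (UriFormatException)
    {
        return true;
    }
}

// No scheme, so the value cannot be an absolute URI.
Console.WriteLine(UriConstructorRejects("citrus.pdf")); // True
// A blob URL like those in the maintainers' test indices is accepted.
Console.WriteLine(UriConstructorRejects("https://example.blob.core.windows.net/docs/citrus.pdf")); // False

// The same value is accepted when relative references are allowed.
bool parsed = Uri.TryCreate("citrus.pdf", UriKind.RelativeOrAbsolute, out Uri? relative);
Console.WriteLine($"{parsed} {relative?.IsAbsoluteUri}"); // True False
```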

Reproduction Steps

Create your OYD data source from the Azure OpenAI portal using the Upload Files (preview) method, then request a chat completion against it and call GetMessageContext() on the result.


Environment

.NET SDK:
  Version: 9.0.100-rc.2.24474.11
  Commit: 315e1305db
  Workload version: 9.0.100-manifests.4872d5d5
  MSBuild version: 17.12.0-preview-24473-03+fea15fbd1

Runtime Environment:
  OS Name: Windows
  OS Version: 10.0.22631
  OS Platform: Windows
  RID: win-x64
  Base Path: C:\Program Files\dotnet\sdk\9.0.100-rc.2.24474.11\

.NET workloads installed:
  [aspire]
    Installation Source: VS 17.12.35417.141, VS 17.11.35327.3
    Manifest Version: 8.2.0/8.0.100
    Manifest Path: C:\Program Files\dotnet\sdk-manifests\8.0.100\microsoft.net.sdk.aspire\8.2.0\WorkloadManifest.json
    Install Type: Msi

Configured to use loose manifests when installing new manifests.

Host:
  Version: 9.0.0-rc.2.24473.5
  Architecture: x64
  Commit: 990ebf52fc

.NET SDKs installed:
  8.0.403 [C:\Program Files\dotnet\sdk]
  9.0.100-rc.2.24474.11 [C:\Program Files\dotnet\sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 6.0.35 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 8.0.10 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 9.0.0-rc.2.24474.3 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 6.0.35 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 8.0.10 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 9.0.0-rc.2.24473.5 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.WindowsDesktop.App 6.0.35 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 8.0.10 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 9.0.0-rc.2.24474.4 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

Other architectures found:
  x86 [C:\Program Files (x86)\dotnet]
    registered at [HKLM\SOFTWARE\dotnet\Setup\InstalledVersions\x86\InstallLocation]

Environment variables:
  Not set

global.json file:
  Not found

github-actions[bot] commented 1 month ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jpalvarezl @ralph-msft @trrwilson.

trrwilson commented 1 month ago

@singhk97, thank you for filing this issue.

When uploading files to create the new index, what kinds of files or data are you providing? As we apply this fix, I'd like to make sure the tests exercise this behavior. Unfortunately, all of our current test indices appear to always return storage blob URIs in their citations, and those are well formed according to the RFC and accepted by the Uri constructor. Understanding what kinds of input go into the index will help make (and keep) the fix complete.

singhk97 commented 1 month ago

They're PDF files:

  "citations": [
      {
          "content": "Title: citrus.pdfPage 2....", 
          "title": "", 
          "url": "citrus.pdf",
          "filepath": "citrus.pdf", 
          "chunk_id": "2"
      }
  ]