Azure / azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
MIT License

[Text analytics] Warning about document length inconsistent with API service documentation #27407

Open justinqquall opened 1 year ago

justinqquall commented 1 year ago

The Text Analytics service documentation indicates that the maximum size per document is 30,720 characters. However, when submitting a considerably smaller document, a warning is received despite the smaller character count. See the traceback below, which prints the value of the document and the warning for the same AnalyzeHealthcareEntitiesResult object. See the data limits documentation here: https://learn.microsoft.com/en-us/azure/cognitive-services/language-service/concepts/data-limits#maximum-characters-per-document

To Reproduce Steps to reproduce the behavior:

  1. Submit a document longer than 8000 characters to the Text Analytics API

Expected behavior No warning is received.
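For reference, the reproduction step could be sketched roughly as below. This is a minimal sketch, not the reporter's actual script: `build_document` and `collect_warnings` are hypothetical helper names, the filler sentence is invented, and the 8001-character default is simply taken from the "above 8000 characters" step. `client` is assumed to be an `azure.ai.textanalytics.TextAnalyticsClient`.

```python
def build_document(length: int = 8001) -> str:
    """Return filler text of exactly `length` characters (hypothetical helper)."""
    sentence = "Patient was prescribed 100mg ibuprofen twice daily. "
    repeats = length // len(sentence) + 1
    return (sentence * repeats)[:length]


def collect_warnings(client, documents, **kwargs):
    """Run healthcare entity analysis and list (document id, code, message) per warning.

    `client` is assumed to be an azure.ai.textanalytics.TextAnalyticsClient;
    extra keyword arguments (e.g. model_version) are passed through unchanged.
    """
    poller = client.begin_analyze_healthcare_entities(documents, **kwargs)
    found = []
    for doc in poller.result():
        if doc.is_error:
            # Per-document failures carry an error instead of warnings.
            continue
        for warning in doc.warnings:
            found.append((doc.id, warning.code, warning.message))
    return found


# Usage against a real resource (endpoint and key are placeholders):
# from azure.core.credentials import AzureKeyCredential
# from azure.ai.textanalytics import TextAnalyticsClient
# client = TextAnalyticsClient("<endpoint>", AzureKeyCredential("<key>"))
# print(collect_warnings(client, [build_document()]))
```

With the expected behavior described above, the returned list would be empty for a document under the documented limit.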

Screenshots If applicable, add screenshots to help explain your problem.

Additional context This happens both for the default text analytics model and when using model_version 2022-08-15-preview.

azure-sdk commented 1 year ago

Label prediction was below confidence level 0.6 for Model:ServiceLabels: 'Cognitive - Text Analytics:0.54613066,Docs:0.21392056,Cognitive Services:0.044528954'

kristapratico commented 1 year ago

Hey @justinqquall, my understanding is that the 30k character limit is the maximum you can send in a request; any more than that and the request will fail. From your screenshot, the request succeeds, but there is a warning. Warnings are usually returned to indicate that the quality of the model prediction may be affected for some reason. Adding @peytonfraser from the Language service team to confirm.

kristapratico commented 1 year ago

Adding @aurghob to confirm.

aurghob commented 1 year ago

Hi, apologies for the delayed reply. We have this task in our backlog and will prioritize accordingly.