This repository is for active development of the Azure SDK for JavaScript (NodeJS & Browser). For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/javascript/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-js.
`DocumentAnalysisClient` and `ClassifyDocumentOptions` are inconsistent over different client SDKs #30040
Describe the bug
The documentation and implementation are unclear for `DocumentAnalysisClient` and `ClassifyDocumentOptions`.
When working in a TypeScript environment on NodeJS, it seems impossible to pass the `pages` query parameter to the `beginClassifyDocument(FromUrl)` function. However, the latest docs for the Python SDK mention this as possible (https://learn.microsoft.com/en-us/python/api/azure-ai-formrecognizer/azure.ai.formrecognizer.documentanalysisclient?view=azure-python#azure-ai-formrecognizer-documentanalysisclient-begin-classify-document-from-url). At the same time, none of these options are mentioned in the JS docs, or even in the REST specification (https://learn.microsoft.com/en-us/rest/api/aiservices/document-classifiers/classify-document?view=rest-aiservices-v4.0%20(2024-02-29-preview)&tabs=HTTP).
This seems very weird, as the UI within Azure Document Intelligence Studio offers the possibility to define a range of pages to be analysed/classified when testing the models. This made me look into the network tab, and indeed a `pages` query parameter is passed to the classify endpoint (`POST /documentintelligence/documentClassifiers/{modelid}:analyze`). For now, I'm using this in my code with success, but it would be nice to know what is happening here and if this parameter will be completely deprecated in the future. In our situation, we classify documents based on the first page only, which works fine and fast. The solution can be found below and works (for now).
To Reproduce
Steps to reproduce the behavior:
Try passing the `pages` parameter via the JS SDK; it will not be used in the underlying call to the analyze endpoint and is not recognized by the TS type of `ClassifyDocumentOptions`
Expected behavior
Consistent behaviour and implementation across SDKs and REST endpoints
Additional context
The solution we worked out:
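As a rough illustration only (this is not the original snippet), a workaround along these lines might bypass the SDK and call the classify endpoint directly, so that `pages` can be sent as a query parameter. The endpoint path and the `pages` parameter come from the request observed in Document Intelligence Studio. Everything else is an assumption made for this sketch: the helper names `buildClassifyUrl` and `startClassification`, the `2024-02-29-preview` API version, the `Ocp-Apim-Subscription-Key` header, the `urlSource` request body, and the `operation-location` response header.

```typescript
// Assumed API version; adjust to whatever your resource actually supports.
const API_VERSION = "2024-02-29-preview";

// Hypothetical helper: builds the classify URL, optionally adding the
// `pages` query parameter that the JS SDK options type does not expose.
function buildClassifyUrl(
  endpoint: string,
  classifierId: string,
  pages?: string,
  apiVersion: string = API_VERSION
): string {
  const base = endpoint.replace(/\/+$/, ""); // drop trailing slashes
  const url = new URL(
    `${base}/documentintelligence/documentClassifiers/${encodeURIComponent(classifierId)}:analyze`
  );
  url.searchParams.set("api-version", apiVersion);
  if (pages) {
    url.searchParams.set("pages", pages); // e.g. "1" to classify on the first page only
  }
  return url.toString();
}

// Hypothetical helper: starts a classification with a plain HTTP POST
// (uses the global fetch available in Node 18+). Header name, body shape,
// and the returned polling URL header are assumptions, not confirmed here.
async function startClassification(
  endpoint: string,
  apiKey: string,
  classifierId: string,
  documentUrl: string,
  pages?: string
): Promise<string | null> {
  const response = await fetch(buildClassifyUrl(endpoint, classifierId, pages), {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Ocp-Apim-Subscription-Key": apiKey,
    },
    body: JSON.stringify({ urlSource: documentUrl }),
  });
  if (!response.ok) {
    throw new Error(`Classification request failed with status ${response.status}`);
  }
  // Assumed: the service answers with a URL to poll for the result.
  return response.headers.get("operation-location");
}
```

Calling `startClassification(endpoint, apiKey, classifierId, documentUrl, "1")` would then request classification based on the first page only, which matches the scenario described above. Since `pages` is not part of the documented REST specification, this may stop working without notice.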