Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.

Document Intelligence exception when using Prebuilt tax document model tax.us.1099COMBO.2023 #41867

Open bjamin5 opened 2 months ago

bjamin5 commented 2 months ago

When calling the prebuilt model tax.us.1099COMBO.2023 for consolidated tax statements, I get this exception from SyncPoller.getFinalResult():

Exception

java.util.concurrent.FutureTask@6a5880a0[Completed exceptionally: java.io.UncheckedIOException: java.io.IOException: java.time.format.DateTimeParseException: Text 'Various' could not be parsed at index 0]
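The message above shows three nested exceptions: an UncheckedIOException wrapping an IOException wrapping a DateTimeParseException. As a minimal sketch (not SDK code), calling code could walk the cause chain to recover the text that failed to parse. The names CauseChainSketch, findCause and simulateReportedFailure are hypothetical, and the failure is rebuilt locally rather than obtained from the service; which exception type getFinalResult() actually surfaces to the caller is not confirmed here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Optional;

final class CauseChainSketch {
    /** Returns the first throwable in the cause chain that has the requested type. */
    static <T extends Throwable> Optional<T> findCause(Throwable error, Class<T> type) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (type.isInstance(t)) {
                return Optional.of(type.cast(t));
            }
        }
        return Optional.empty();
    }

    /** Rebuilds the nesting from the report locally (assumption: no service call involved). */
    static Throwable simulateReportedFailure() {
        try {
            LocalDate.parse("Various");
            throw new AssertionError("expected the parse to fail");
        } catch (DateTimeParseException e) {
            return new UncheckedIOException(new IOException(e));
        }
    }

    public static void main(String[] args) {
        Throwable reported = simulateReportedFailure();
        findCause(reported, DateTimeParseException.class)
            .ifPresent(e -> System.out.println("Unparseable text: " + e.getParsedString()));
    }
}
```

This only helps with diagnosis; the analysis result is still lost when deserialization fails.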

Stack Trace

getResultWithTimeout:480, ImplUtils (com.azure.core.implementation)
pollingLoop:70, PollingUtil (com.azure.core.util.polling)
MyClass that calls getFinalResult()

Code to Reproduce

ExponentialBackoffOptions exponentialBackoffOptions = new ExponentialBackoffOptions()
    .setMaxRetries(20)
    .setBaseDelay(Duration.ofMillis(5))
    .setMaxDelay(Duration.ofSeconds(20));

RetryOptions retryOptions = new RetryOptions(exponentialBackoffOptions);

client = new DocumentIntelligenceClientBuilder()
    .credential(new AzureKeyCredential(apiKey))
    .endpoint(endpoint)
    .serviceVersion(DocumentIntelligenceServiceVersion.V2024_07_31_PREVIEW)
    .retryOptions(retryOptions)
    .buildClient();

SyncPoller<AnalyzeResultOperation, AnalyzeResult> analyzeDocumentPoller = client.beginAnalyzeDocument(
    "tax.us.1099COMBO.2023",
    null,
    null,
    null,
    null,
    null,
    null,
    null,
    new AnalyzeDocumentRequest().setBase64Source(fileData)
);
AnalyzeResult analyzeDocumentResult = analyzeDocumentPoller.getFinalResult();

Expected/desired behavior

It should poll until the AnalyzeResult object is returned.

Versions

JRE: liberica-21

azure sdk client: com.azure/azure-ai-documentintelligence/1.0.0-beta.4

Other Information

I've tried multiple examples, and only the 1099COMBO model seems to have this bug. 1099-INT and 1099-DIV samples did not throw this exception when passed into the combo model. The problem does not seem to occur when using Document Intelligence Studio in the browser.

Here is a pdf of a 1099-consolidated statement with fake information that causes this exception:

Standard.Consolidated.pdf

alzimmermsft commented 2 months ago

Thanks for filing this issue @bjamin5!

Taking a rough look at the PDF you've included, I wonder if there is a mix-up with the "Date Acquired" fields that have the value "Various".

@samvaity, @mssfang could you take a deeper look into whether this is an SDK bug?

samvaity commented 2 months ago

@bjamin5 I can confirm we are seeing the error in the SDK. It is caused by an incorrect result returned from the service for "Box1b": the field type is "date", but the value is returned as the string "Various". So the SDK fails here: valueDate = reader.getNullable(nonNullReader -> LocalDate.parse(nonNullReader.getString()));
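To illustrate why that line fails: LocalDate.parse(CharSequence) accepts only ISO-8601 dates such as 2023-04-15 and throws DateTimeParseException for anything else. The sketch below contrasts the strict call with a lenient helper. LenientDateSketch and tryParseIsoDate are hypothetical names; this is not how the SDK deserializes fields, only a standalone demonstration of the parsing behavior.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.Optional;

final class LenientDateSketch {
    /** Returns the parsed ISO-8601 date, or empty when the text is null or not a date. */
    static Optional<LocalDate> tryParseIsoDate(String text) {
        if (text == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(LocalDate.parse(text));
        } catch (DateTimeParseException e) {
            // Text such as "Various" lands here instead of propagating to the caller.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParseIsoDate("2023-04-15")); // prints Optional[2023-04-15]
        System.out.println(tryParseIsoDate("Various"));    // prints Optional.empty
    }
}
```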

@bojunehsu: Could you take a look, on the service side, at why the model returns the wrong type for these fields?

@alzimmermsft: In my opinion, it is correct for the SDK to throw the parsing error. Do you think we need to add better handling here?

bjamin5 commented 2 months ago

@samvaity @TFR258 Thanks for getting this triaged so quickly. Any update on a fix? The company I work for will be relying heavily on all of the Document Intelligence prebuilt tax models, so hopefully I can help identify any bugs other models might have.

samvaity commented 2 months ago

@TFR258 and @bojunehsu Do we know if the fix is underway and could make it to the next release?

bjamin5 commented 1 month ago

So "various" in the date acquired field is actually a correct value and is expected in some cases. If you changed the type from "date" to "string" when "various" is detected, there wouldn't be a problem, and users could handle it on their end by checking the data type before attempting to parse a date. Can you get this fixed soon? We'll be using this model for over a million uploads in the upcoming year, and this is currently a blocking issue for us. @samvaity @TFR258 @bojunehsu @alzimmermsft

bojunehsu commented 1 month ago

The expected behavior in this scenario is to not return anything in valueDate and "Various" in content. We will fix the behavior soon.
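Once the service behaves as described, calling code would need to handle a date field whose valueDate is absent. A minimal sketch of that handling follows, assuming a field exposes a nullable date and the raw content text. The Field class here is a hypothetical stand-in, not the SDK's DocumentField type, and describeDateAcquired is an illustrative name.

```java
import java.time.LocalDate;

final class DateFieldSketch {
    /** Hypothetical stand-in for an analyzed field; not the SDK's own type. */
    static final class Field {
        final LocalDate valueDate; // null when the service did not normalize a date
        final String content;      // raw text as it appears on the form

        Field(LocalDate valueDate, String content) {
            this.valueDate = valueDate;
            this.content = content;
        }
    }

    /** Prefers the normalized date and falls back to the raw content, e.g. "Various". */
    static String describeDateAcquired(Field field) {
        if (field.valueDate != null) {
            return field.valueDate.toString();
        }
        return field.content != null ? field.content : "";
    }

    public static void main(String[] args) {
        System.out.println(describeDateAcquired(new Field(LocalDate.of(2023, 4, 15), "04/15/2023")));
        System.out.println(describeDateAcquired(new Field(null, "Various")));
    }
}
```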

bjamin5 commented 1 month ago

Any update on when this fix will be made? This is still a blocking issue for us @bojunehsu

bjamin5 commented 1 month ago

@samvaity We need this fixed as soon as possible so we can test its performance and integration before the year ends and tax filing volume increases. When will the next release be?

samvaity commented 1 month ago

We are expecting a service release at the end of November. @bojunehsu can you confirm whether this fix will be in that release?

bojunehsu commented 1 month ago

Yes, the fix is planned for the upcoming release in November.