Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.

DocumentIntelligence 1099r Prebuilt Tax Model box 14 value contains unparsed currency amount #41217

Open bjamin5 opened 1 month ago

bjamin5 commented 1 month ago

Describe the bug: When using the prebuilt 1099-R tax model in Document Intelligence (API version 2024-02-29 Preview), box 14 returns its value as a string with the $ sign included, while the other fields are post-processed to extract the number from the string. In other words, it returns a string containing '$' when it should return a number without the '$'. See the example pictures below.

To Reproduce: Using Document Intelligence Studio or the SDK, upload the sample 1099-R provided in DI Studio. API version: 2024-02-29 Preview. Model: Prebuilt US Tax 1099, form 1099-R.

Expected behavior: Box 14 should return a value of type number (Double), not a string, since all of the other boxes on the form return a number in the 'value' key-value pair. Only this box keeps the '$' in the amount and comes back as a string instead of a double. The proper behavior is that of boxes 16, 17, or 19, where the value is the number parsed from the content. Not a big deal to compensate for, but consistency would be nice.

Screenshots:

Current (wrong) behavior: [image]

Expected behavior: [image] The result should be {'Content': '$123,000', 'Value': 123000}

I'm using the Java SDK version com.azure:azure-ai-documentintelligence:1.0.0-beta.3. To replicate the problem I used Document Intelligence Studio and the default sample 1099-R.

bjamin5 commented 1 month ago

In addition to the previous problem, it appears that even when the box is empty, boxes 14 and 17 return the '$' symbol as the value when they should return nothing. [image]
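For anyone who needs to compensate on the client side in the meantime, here is a minimal sketch of the kind of normalization I mean. It is plain Java with no SDK calls; the class and method names (`CurrencyFieldParser`, `parseAmount`) are hypothetical, and it assumes US-style amounts such as '$123,000' with ',' as the thousands separator. A bare '$' or an empty string is treated as "no value".

```java
import java.math.BigDecimal;
import java.util.Optional;

// Hypothetical helper, not part of the Azure SDK: turns a raw field string
// such as "$123,000" into a number, and treats a bare "$" as missing.
final class CurrencyFieldParser {
    private CurrencyFieldParser() { }

    static Optional<Double> parseAmount(String content) {
        if (content == null) {
            return Optional.empty();
        }
        // Assumption: US formatting, so '$', ',' and whitespace can be dropped.
        String cleaned = content.replaceAll("[$,\\s]", "");
        if (cleaned.isEmpty()) {
            // Covers the empty-box case where only "$" comes back.
            return Optional.empty();
        }
        try {
            return Optional.of(new BigDecimal(cleaned).doubleValue());
        } catch (NumberFormatException e) {
            // Anything that is still not a plain decimal is left unparsed.
            return Optional.empty();
        }
    }
}
```

You would pass in whatever string the box 14 field gives you and fall back to the original string when the result is empty.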

joshfree commented 1 month ago

@samvaity

samvaity commented 1 month ago

@bjamin5 Thank you for reporting this. I can confirm I am seeing the same results.

@bojunehsu Could you look into this? It seems the service is returning inconsistent results for similar data extraction.

bojunehsu commented 1 month ago

Thank you @bjamin5 for reporting this bug. Some of the fields in this prebuilt model are incorrectly tagged as string instead of number, so the expected normalization was not applied. We will fix this in a future release. Thanks again for the feedback.