ai-learner-00 commented 8 months ago

Could you elaborate on how the composed model classifies documents? Initially, I wanted to train a classifier to classify documents and redirect them to the correct extraction/custom model. However, it seems that this isn't necessary as composed models do not need additional (classification) training.

I noticed that custom models have a confidence score. How can the model tell the probability that the document is of the correct type when it was trained on 1 type of document (without counter examples)?

Document Details

⚠ Do not edit this section. It is required for learn.microsoft.com ➟ GitHub issue linking.

ID: 3c157636-675b-c2dc-6ef9-dabd8af2c1b1
Version Independent ID: 7b832b39-5f85-024b-3a47-92dd966a49a7
Content: How to guide: create and compose custom models with Document Intelligence (formerly Form Recognizer) - Azure AI services
Content Source: articles/ai-services/document-intelligence/how-to-guides/compose-custom-models.md
Service: azure-ai-document-intelligence
GitHub Login: @laujan
Microsoft Alias: lajanuar

AjayBathini-MSFT commented 8 months ago

@ai-learner-00 Thanks for your feedback! We will investigate and update as appropriate.

ai-learner-00 commented 8 months ago

To confirm, do composed models work on PDFs that contain more than one type of form? (I can't reproduce this example to try it, as the sample forms weren't made available.)

"Composed models are useful when you've trained several models and want to group them to analyze similar form types."

Why do the forms need to be similar?

Naveenommi-MSFT commented 8 months ago

Hello @ai-learner-00, composed models use a classification step to decide which custom model best represents the form presented for analysis. When a document is submitted to a composed model, the service performs a classification to determine which custom model to use for analysis and extraction. The classification is based on the similarity of the document to the forms that were used to train the custom models included in the composed model.

The confidence score for a custom model represents the probability that the model correctly identified the data in the document. This score is based on the similarity of the document to the forms that were used to train the model. If the document is very similar to the forms used to train the model, the confidence score will be high. If the document is less similar, the confidence score will be lower.

Composed models can accept PDFs that contain more than one type of form, but the whole document is assigned to a single custom model. If the document contains multiple types of forms, the service classifies the document based on the form type that is most similar to the forms used to train the custom models included in the composed model.

Forms need to be similar in order to be grouped together in a composed model because the classification step used by composed models is based on the similarity of the document to the forms used to train the custom models. If the forms are not similar, the classification step may not accurately identify the correct custom model to use for analysis and extraction.
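As a rough illustration of the routing idea described above (pick the component model whose training forms look most like the submitted document), here is a minimal sketch. It is not the service's actual algorithm: the Jaccard token overlap, the function names, and the model IDs are all assumptions made up for the example.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pick_component_model(
    doc_tokens: set[str],
    training_tokens: dict[str, list[set[str]]],
) -> tuple[str, float]:
    """Return (model_id, score) for the best-matching component model."""
    best_id, best_score = "", -1.0
    for model_id, forms in training_tokens.items():
        # Score each model by its single most similar training form.
        score = max((jaccard(doc_tokens, f) for f in forms), default=0.0)
        if score > best_score:
            best_id, best_score = model_id, score
    return best_id, best_score


# Hypothetical training data: one token set per training form.
training = {
    "invoice-model": [{"invoice", "total", "due", "vendor"}],
    "w2-model": [{"wages", "employer", "ein", "withheld"}],
}

model_id, score = pick_component_model(
    {"invoice", "total", "vendor", "po"}, training
)
print(model_id, round(score, 2))  # invoice-model 0.6
```

Note that the whole document gets exactly one winner, which matches the point above that a PDF containing several form types is still routed to a single custom model.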

ai-learner-00 commented 8 months ago

@Naveenommi-MSFT Oh I see, that makes sense. Thank you!

I am still a little confused about the point on similar forms. If form type A is very different from form type B, doesn't that make it easier to distinguish between the two? Or did you mean that forms within a form type need to be similar?

Do you happen to have any insight/recommendation on whether composed models perform well and if it can still be worth it to train a classifier?

Naveenommi-MSFT commented 8 months ago

@ai-learner-00 To clarify, when we say "similar forms," we mean forms that have similar layouts or structures, regardless of their form type. For example, two forms that have similar layouts but are of different types (such as an invoice and a receipt) may be more difficult to distinguish from each other than two forms of the same type but with different layouts.

Regarding your question about composed models, they can perform well in certain scenarios, particularly when you have a large number of forms with varying layouts and structures. Composed models allow you to combine prebuilt models and custom models to create a more accurate and robust model that can handle a wider range of form types and layouts.

However, whether or not it is worth it to train a classifier depends on your specific use case and the complexity of your forms. If you have a small number of forms with simple layouts, a prebuilt model may be sufficient. But if you have a large number of forms with complex layouts, a custom model or composed model may be necessary to achieve the desired level of accuracy.

ai-learner-00 commented 8 months ago

@Naveenommi-MSFT It didn't register that a form type can have different layouts. In my head, a given form type (e.g. specific tax form) always has the same structure, but I guess the invoice model would need to handle all kinds of layouts while still looking for the same fields.

"Forms need to be similar in order to be grouped together in a composed model"

I think I understand now. Grouping similar forms means that all forms of type A are similar to each other and that all forms of type B are similar, but forms of type A and B are (ideally) different from each other.

The original quote from the documentation says "similar form types", so I was thinking that form type A and B are similar.

"Composed models are useful when you've trained several models and want to group them to analyze similar form types."

Edit: Similarity is the cosine similarity between the vector embeddings of the documents, which take into account the text (content of the form), positional embeddings (structure), and image, correct? There was an emphasis on structure, so I just wanted to confirm that the content of the form matters as well.
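For readers unfamiliar with the term, cosine similarity between two embedding vectors can be sketched as follows. Whether the service actually computes similarity this way is not confirmed anywhere in this thread; the function and the example vectors are purely illustrative.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Direction is undefined for a zero vector; treat as "no similarity".
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical embeddings: same direction scores about 1, orthogonal scores 0.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Because the score depends only on direction, not magnitude, two documents would score high whenever their embeddings point the same way, whatever mix of text, layout, and image features produced those embeddings.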

Naveenommi-MSFT commented 8 months ago

@ai-learner-00, I've delegated this to the content author, @laujan, who will review it, share their input, and update the documentation as appropriate.

Naveenommi-MSFT commented 8 months ago

@laujan, please review this, add your comments, and update as appropriate.

laujan commented 7 months ago

Hi @ai-learner-00, Thank you for your feedback.

Please see "Custom classification model - Document Intelligence (formerly Form Recognizer) - Azure AI services | Microsoft Learn" for the differences between compose and classification.
Your longer-term solution would be to rely on an explicit classification.
The doc type confidence score is an indication of how similar the document is to the training dataset.

Thank you again!
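The "explicit classification" approach suggested above amounts to a classify-then-route pattern: classify first, then send the document to the matching extraction model. Below is a minimal sketch under stated assumptions. The `classify` callable stands in for whatever classifier call you use, the `0.8` threshold is an arbitrary example value, and the model IDs are hypothetical; none of these come from the service documentation.

```python
from __future__ import annotations

from typing import Callable, Optional


def route_document(
    document: bytes,
    classify: Callable[[bytes], tuple[str, float]],
    models_by_type: dict[str, str],
    min_confidence: float = 0.8,  # assumed threshold, tune for your data
) -> Optional[str]:
    """Return the extraction model ID for this document, or None to flag it for review."""
    doc_type, confidence = classify(document)
    if confidence < min_confidence:
        # Low doc type confidence: the document is unlike the training set.
        return None
    # Unknown doc types also fall through to None.
    return models_by_type.get(doc_type)


# Hypothetical stand-in for a real classifier call.
def fake_classifier(document: bytes) -> tuple[str, float]:
    return ("invoice", 0.93)


chosen = route_document(
    b"%PDF-...", fake_classifier, {"invoice": "invoice-model"}
)
print(chosen)  # invoice-model
```

Keeping the classifier behind a plain callable makes the routing logic testable without credentials, and the threshold check uses the doc type confidence in the way described above: as a signal of how closely the document resembles the training data.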

laujan commented 7 months ago

reassign: Naveenommi-MSFT

Naveenommi-MSFT commented 7 months ago

@ai-learner-00 We are going to close this thread. If there are any further questions regarding the documentation, please tag me in your reply and we will be happy to continue the conversation.

@laujan Thank you for your response.

kontax85 commented 3 weeks ago

Hello @Naveenommi-MSFT, your message of 23 February contains this statement: "Composed models allow you to combine prebuilt models and custom models to create a more accurate and robust model that can handle a wider range of form types and layouts."

I have not found any mention in the documentation of the ability to compose prebuilt and custom models. Is this feature undocumented, or is it actually not supported?

Thank you.

MicrosoftDocs / azure-docs

Composed models vs classification models #120017
