Given that the !PREDICT function limits how many documents can be processed in a single query, can we update the quickstart to include a limit, or some sort of batching recommendation?
Recommendation/request is to add LIMIT 1000 to the end of the following query.
`CREATE OR REPLACE TABLE doc_ai_qs_db.doc_ai_schema.CO_BRANDING_AGREEMENTS
AS
WITH
-- First part gets the result from applying the model on the pdf documents as a JSON with additional metadata
temp as(
SELECT
Relative_path as file_name
, size as file_size
, last_modified
, file_url as snowflake_file_url
-- VERIFY THAT BELOW IS USING THE SAME NAME AND NUMBER AS THE MODEL INSTRUCTIONS YOU COPIED IN THE PREVIOUS STEP!
, DOC_AI_QS_DB.DOC_AI_SCHEMA.DOC_AI_QS_CO_BRANDING!PREDICT(get_presigned_url('@doc_ai_stage', RELATIVE_PATH ), 1) as json
from directory(@doc_ai_stage)`
Note: I was unable to reproduce the single-query !PREDICT limit described above when running this quickstart.
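For discussion, here is a minimal sketch of what the requested cap might look like. The query quoted above is truncated, so this covers only the file-selection and !PREDICT step. It assumes the cap should be applied to the directory listing before !PREDICT is called, rather than appended to the end of the full statement. The `files` CTE name and the `ORDER BY` are illustrative additions, and 1000 is the figure from the request, not a confirmed product limit.

```sql
-- Sketch only: cap how many staged files reach !PREDICT in a single query.
WITH files AS (
    SELECT relative_path, size, last_modified, file_url
    FROM directory(@doc_ai_stage)
    ORDER BY relative_path   -- illustrative: makes the kept subset deterministic
    LIMIT 1000               -- assumed cap, taken from the request above
)
SELECT
    relative_path AS file_name
  , size AS file_size
  , last_modified
  , file_url AS snowflake_file_url
    -- Same model name and version as in the quickstart query
  , DOC_AI_QS_DB.DOC_AI_SCHEMA.DOC_AI_QS_CO_BRANDING!PREDICT(get_presigned_url('@doc_ai_stage', relative_path), 1) AS json
FROM files;
```

With only the 10 demo documents on the stage, this should return the same rows as the uncapped query.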
Questions:
What is the product's limit today?
Will this change in the near term, or is it expected to stay the same for the foreseeable future?
Should we put a LIMIT in the quickstart query even though the demo only runs inference on the 10 documents we uploaded? That could be misleading or confusing.
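On the batching alternative, one possible pattern is to process only files that do not yet have a stored result, a fixed number at a time, and to rerun the statement until it inserts no rows. This is a sketch, not a tested recommendation. The `RAW_PREDICTIONS` table and its columns are hypothetical, and the batch size of 1000 is again the figure from the request.

```sql
-- Hypothetical results table; the name and columns are illustrative only.
CREATE TABLE IF NOT EXISTS doc_ai_qs_db.doc_ai_schema.RAW_PREDICTIONS (
    file_name STRING,
    json      VARIANT
);

-- Rerun until it inserts zero rows; each run handles at most 1000 unprocessed files.
INSERT INTO doc_ai_qs_db.doc_ai_schema.RAW_PREDICTIONS (file_name, json)
SELECT
    p.relative_path
  , DOC_AI_QS_DB.DOC_AI_SCHEMA.DOC_AI_QS_CO_BRANDING!PREDICT(get_presigned_url('@doc_ai_stage', p.relative_path), 1)
FROM (
    -- Files on the stage that have no row in the results table yet
    SELECT d.relative_path
    FROM directory(@doc_ai_stage) d
    LEFT JOIN doc_ai_qs_db.doc_ai_schema.RAW_PREDICTIONS r
           ON r.file_name = d.relative_path
    WHERE r.file_name IS NULL
    ORDER BY d.relative_path
    LIMIT 1000               -- assumed batch size, taken from the request
) p;
```

For the 10-document demo a single run would cover everything, so this may be better suited to a note in the quickstart than to the main query.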
Received feedback from an AIML Specialist SE:
https://github.com/Snowflake-Labs/sfguide-getting-started-with-document-ai/blame/main/extraction.sql