Snowflake-Labs / sfguide-getting-started-with-document-ai

Apache License 2.0

extraction.sql -- document processing / !predict step hits/exceeds current product limitations #5

Open sfc-gh-cgoyette opened 1 month ago

sfc-gh-cgoyette commented 1 month ago

Received feedback from an AIML Specialist SE:

Given that there is a limit on how many documents can be processed in a single query when using the !PREDICT function, can we update the quickstart to include a limit, or some sort of batching recommendation?

https://github.com/Snowflake-Labs/sfguide-getting-started-with-document-ai/blame/main/extraction.sql

The recommendation/request is to add `LIMIT 1000` to the end of the following query.

```sql
CREATE OR REPLACE TABLE doc_ai_qs_db.doc_ai_schema.CO_BRANDING_AGREEMENTS AS
WITH
-- First part gets the result from applying the model on the pdf documents as a JSON with additional metadata
temp as(
    SELECT
        Relative_path as file_name
        , size as file_size
        , last_modified
        , file_url as snowflake_file_url
        -- VERIFY THAT BELOW IS USING THE SAME NAME AND NUMBER AS THE MODEL INSTRUCTIONS YOU COPIED IN THE PREVIOUS STEP!
        , DOC_AI_QS_DB.DOC_AI_SCHEMA.DOC_AI_QS_CO_BRANDING!PREDICT(get_presigned_url('@doc_ai_stage', RELATIVE_PATH ), 1) as json
    from directory(@doc_ai_stage)
```
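For illustration, here is a minimal sketch of where the requested `LIMIT 1000` could go. It is a simplified, standalone version of the first part of the query, not the quickstart's actual fix. The target table name `CO_BRANDING_AGREEMENTS_RAW` is hypothetical, and the value 1000 is taken from the recommendation above rather than verified against the product limits. Whether the limit is applied before the model call is evaluated is also an assumption worth checking in the query profile.

```sql
-- Sketch only: caps the number of staged files sent to !PREDICT in one query.
-- CO_BRANDING_AGREEMENTS_RAW is a hypothetical table name used for illustration.
CREATE OR REPLACE TABLE doc_ai_qs_db.doc_ai_schema.CO_BRANDING_AGREEMENTS_RAW AS
SELECT
    relative_path AS file_name
    , size AS file_size
    , last_modified
    , file_url AS snowflake_file_url
    -- Model name and version must match the model instructions copied in the previous step.
    , DOC_AI_QS_DB.DOC_AI_SCHEMA.DOC_AI_QS_CO_BRANDING!PREDICT(
        get_presigned_url('@doc_ai_stage', relative_path), 1) AS json
FROM directory(@doc_ai_stage)
LIMIT 1000;  -- value from the recommendation above; adjust to the documented limit
```

With more than 1000 staged files, a cap like this processes only a subset, so a batching approach would still be needed to cover the rest.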

sfc-gh-cgoyette commented 1 month ago

Note: I was unable to reproduce the single-query !PREDICT limit quoted above as an actual issue in this quickstart.

Qs