Status: Open. Opened by ninalopatina 4 months ago.
The same thing happens to me when parsing a roughly 30-page Word document from Google Drive that contains tables, images, a TOC, a header, a footer, etc.
Is anyone else getting this issue with Google Drive V2 ingestion?
2024-11-19 22:12:47,003 SpawnProcess-18 ERROR
C:\Users\SANTHOSH\.cache\unstructured\ingest\pipeline\index\34b4026053f1.json: [download] 'GoogleDriveDownloader' object has no attribute 'meta'
Thanks for reporting that @SantoshKumarRavi! It's a bug (see this line). We need to prepare a fix for that.
Describe the bug
Google Docs, Sheets, and Slides files do not work with the V2 SDK Google Drive source connector.
To Reproduce
Ingesting from Google Drive, partitioning via the Unstructured API, embedding via OpenAI, and writing to AstraDB:
runner = GoogleDriveRunner(
    processor_config=ProcessorConfig(
        verbose=True,
        output_dir=os.environ['GOOGLE_DRIVE_OUTPUT'],
        num_processes=2,
    ),
    read_config=ReadConfig(),
    partition_config=PartitionConfig(
        partition_by_api=True,
        api_key=os.getenv("UNSTRUCTURED_API_KEY"),
    ),
    connector_config=SimpleGoogleDriveConfig(
        access_config=GoogleDriveAccessConfig(
            service_account_key=os.getenv("GOOGLE_DRIVE_ACCOUNT_KEY")
        ),
        recursive=True,
        drive_id=os.getenv("GOOGLE_DRIVE_FOLDER_ID"),
    ),
    chunking_config=ChunkingConfig(chunk_elements=True),
    embedding_config=EmbeddingConfig(
        provider="langchain-openai",
        api_key=os.getenv("OPENAI_API_KEY"),
    ),
    writer=get_writer(),
    writer_kwargs={},
)
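For anyone trying to reproduce this: the snippet above reads all of its settings from environment variables, and `os.getenv` silently returns `None` when one is unset. A minimal sketch of a pre-flight check is below, so that a missing variable is not mistaken for the connector bug. The helper `missing_env_vars` is hypothetical (it is not part of `unstructured`); only the variable names come from the snippet.

```python
import os

# Variable names taken from the reproduction snippet above.
REQUIRED_VARS = [
    "GOOGLE_DRIVE_OUTPUT",
    "UNSTRUCTURED_API_KEY",
    "GOOGLE_DRIVE_ACCOUNT_KEY",
    "GOOGLE_DRIVE_FOLDER_ID",
    "OPENAI_API_KEY",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ).

    Hypothetical helper for illustration; not part of the unstructured library.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

If this reports nothing missing and the run still fails, the configuration is probably not the cause.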
Expected behavior
As in V1, I expect the file to be parsed.
Screenshots
KeyError Traceback (most recent call last)