Closed rbiseck3 closed 1 month ago
This adds a source connector that lists and downloads files from UC Volumes using the `dbfs` utils in the Databricks SDK. Example:
import os
from pathlib import Path

from unstructured.ingest.v2.processes.connectors.databricks_volumes import (
    DatabricksVolumesAccessConfig,
    DatabricksVolumesConnectionConfig,
    DatabricksVolumesDownloader,
    DatabricksVolumesDownloaderConfig,
    DatabricksVolumesIndexer,
    DatabricksVolumesIndexerConfig,
)

# Credentials are read from the environment.
connection_configs = DatabricksVolumesConnectionConfig(
    host=os.getenv("DATABRICKS_HOST"),
    access_config=DatabricksVolumesAccessConfig(
        token=os.getenv("DATABRICKS_TOKEN"),
    ),
)

# The indexer lists the files under the volume path.
indexer = DatabricksVolumesIndexer(
    connection_config=connection_configs,
    index_config=DatabricksVolumesIndexerConfig(
        remote_url="/Volumes/unstructured_solutions/unstructured_test_schema/unstructured-volume"
    ),
)

# The downloader writes each indexed file to a local directory.
downloader = DatabricksVolumesDownloader(
    connection_config=connection_configs,
    download_config=DatabricksVolumesDownloaderConfig(
        download_dir=Path("/Users/romanisecke/Downloads/databricks-download")
    ),
)

for f in indexer.run():
    downloader.run(file_data=f)
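For reviewers unfamiliar with the SDK side, here is a rough sketch of the list-then-download pattern the connector builds on. It is not the connector's actual code. The helper names `iter_remote_files` and `download_file` are hypothetical, and the sketch assumes the client exposes `dbfs.list(path, recursive=...)`, yielding entries with `path` and `is_dir`, and `dbfs.download(path)`, returning a readable binary stream, as the Databricks SDK's `WorkspaceClient` is understood to do.

```python
import shutil
from pathlib import Path, PurePosixPath
from typing import Any, Iterator


def iter_remote_files(client: Any, remote_root: str) -> Iterator[str]:
    """Yield the remote path of every regular file under remote_root.

    Assumes client.dbfs.list(path, recursive=True) yields entries that
    carry `path` and `is_dir` attributes.
    """
    for entry in client.dbfs.list(remote_root, recursive=True):
        if not entry.is_dir:  # skip directory entries
            yield entry.path


def download_file(client: Any, remote_file: str, remote_root: str, download_dir: Path) -> Path:
    """Copy one remote file into download_dir, mirroring its relative path.

    Assumes client.dbfs.download(path) returns a binary file-like object
    that can be used as a context manager.
    """
    relative = PurePosixPath(remote_file).relative_to(remote_root)
    local_path = download_dir.joinpath(*relative.parts)
    local_path.parent.mkdir(parents=True, exist_ok=True)
    with client.dbfs.download(remote_file) as src, open(local_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks rather than loading the whole file
    return local_path
```

With the real SDK, `client` would presumably be a `databricks.sdk.WorkspaceClient` built from the same host and token as above. Taking the client as a parameter keeps the helpers testable against a stub.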