airbytehq / PyAirbyte

PyAirbyte brings the power of Airbyte to every Python developer.
https://docs.airbyte.com/pyairbyte

AirbyteConnectorMissingCatalogError while trying to read source-azure-blob-storage #280

Closed IronJayx closed 2 months ago

IronJayx commented 3 months ago

Hi,

I am getting a catalog error while trying to connect to and read from Azure Blob Storage.

Any idea why?

All help would be much appreciated, thanks!

Here is my code:

import os
import airbyte as ab
from dotenv import load_dotenv

load_dotenv()

# Configure and read from the source
read_result = ab.get_source(
    "source-azure-blob-storage",
    config={
        "authentification": ["airbytehq/pyAirbyte"],
        "credentials": {
            "auth_type": "storage_account_key",
            "azure_blob_storage_account_key": os.environ.get('AZURE_STORAGE_ACCOUNT_KEY')
        },
        "azure_blob_storage_account_name": os.environ.get('AZURE_STORAGE_ACCOUNT_NAME'),
        "azure_blob_storage_container_name": os.environ.get('AZURE_STORAGE_CONTAINER_NAME')
    },
).read()

print(read_result)

Here is the error:

Traceback (most recent call last):
  File "/root/airvector/examples/airbyte_test.py", line 20, in <module>
    ).read()
  File "/root/.cache/pypoetry/virtualenvs/airvector-JnIzwmfs-py3.10/lib/python3.10/site-packages/airbyte/sources/base.py", line 708, in read
    available_streams=self.get_available_streams(),
  File "/root/.cache/pypoetry/virtualenvs/airvector-JnIzwmfs-py3.10/lib/python3.10/site-packages/airbyte/sources/base.py", line 222, in get_available_streams
    return [s.name for s in self.discovered_catalog.streams]
  File "/root/.cache/pypoetry/virtualenvs/airvector-JnIzwmfs-py3.10/lib/python3.10/site-packages/airbyte/sources/base.py", line 320, in discovered_catalog
    self._discovered_catalog = self._discover()
  File "/root/.cache/pypoetry/virtualenvs/airvector-JnIzwmfs-py3.10/lib/python3.10/site-packages/airbyte/sources/base.py", line 186, in _discover
    raise exc.AirbyteConnectorMissingCatalogError(
airbyte.exceptions.AirbyteConnectorMissingCatalogError: AirbyteConnectorMissingCatalogError: Connector did not return a catalog.

Log output:
        Error starting the sync. This could be due to an invalid configuration or catalog. Please contact Support for assistance.

aaronsteers commented 3 months ago

@IronJayx - At first I thought this was due to the connector being in Java (it's not). Looking more closely, I think the connector is not able to discover any streams. I believe you also need to specify a streams collection in the config.

https://docs.airbyte.com/integrations/sources/azure-blob-storage#reference
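
A minimal sketch of what that could look like: a hypothetical helper, build_azure_blob_config, that assembles the config dict and refuses to proceed without a streams entry. The credential keys are the ones from the snippet in the original post; the stream fields (name, format.filetype, globs) and the csv filetype are assumptions based on the connector reference linked above, so check them against it before relying on this.

```python
import os


def build_azure_blob_config(account_name, container_name, account_key, streams):
    """Assemble a source-azure-blob-storage config (hypothetical helper).

    Raises ValueError when no streams are given, since the connector
    appears unable to return a catalog without a streams collection.
    """
    if not streams:
        raise ValueError("config needs at least one entry in 'streams'")
    return {
        "credentials": {
            "auth_type": "storage_account_key",
            "azure_blob_storage_account_key": account_key,
        },
        "azure_blob_storage_account_name": account_name,
        "azure_blob_storage_container_name": container_name,
        "streams": list(streams),
    }


# Assumed stream definition: one stream matching every CSV file in the container.
csv_stream = {
    "name": "csv_files",
    "format": {"filetype": "csv"},
    "globs": ["**/*.csv"],
}

config = build_azure_blob_config(
    account_name=os.environ.get("AZURE_STORAGE_ACCOUNT_NAME"),
    container_name=os.environ.get("AZURE_STORAGE_CONTAINER_NAME"),
    account_key=os.environ.get("AZURE_STORAGE_ACCOUNT_KEY"),
    streams=[csv_stream],
)
```

The resulting dict can then be passed as config= to ab.get_source("source-azure-blob-storage", ...).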

IronJayx commented 2 months ago

Thanks @aaronsteers, that was correct! The .check() works now.

Now I want to disable the parser in order to process unstructured files (images/videos), but I guess that is for another issue.

import os
import airbyte as ab
from dotenv import load_dotenv

load_dotenv()

# Configure and read from the source
source = ab.get_source(
    "source-azure-blob-storage",
    install_if_missing=True,
    config={
        # "authentification": ["airbytehq/pyAirbyte"],
        "credentials": {
            "auth_type": "storage_account_key",
            "azure_blob_storage_account_key": os.environ.get('AZURE_STORAGE_ACCOUNT_KEY')
        },
        "azure_blob_storage_account_name": os.environ.get('AZURE_STORAGE_ACCOUNT_NAME'),
        "azure_blob_storage_container_name": os.environ.get('AZURE_STORAGE_CONTAINER_NAME'),
        "streams": [{
            "name": "all",
            "format": {
                "filetype": "unstructured",
                "skip_unprocessable_files": False,
            },
            "globs": ["**"]
        }]
    },
)

source.check()
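
Once check() passes, a possible next step is to confirm that streams are actually discovered before reading. The sketch below uses a hypothetical helper, check_then_read, that works on any object exposing the same methods as the source above. check(), get_available_streams() and read() all appear earlier in this thread; select_all_streams() is assumed from the PyAirbyte API and should be verified against the installed version.

```python
def check_then_read(source):
    """Validate the config, confirm streams were discovered, then read them all.

    `source` is expected to behave like the object returned by ab.get_source().
    """
    # Fails early if the connector rejects the config.
    source.check()

    # With a valid streams collection this should list at least one name.
    stream_names = source.get_available_streams()
    if not stream_names:
        raise RuntimeError("no streams discovered; check the 'streams' config")

    # Assumed PyAirbyte call: select every discovered stream for the read.
    source.select_all_streams()
    return stream_names, source.read()
```

Usage would be along the lines of: stream_names, read_result = check_then_read(source)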