Closed: davinchia closed this issue 1 month ago
I'm evaluating Airbyte and our source DB has too many tables, which caused "io.temporal.failure.ServerFailure: Complete result exceeds size limit.".
IMO this is the key feature for us to continue using Airbyte.
@Deninc which database are you using?
@cgardens I'm using Oracle.
OK, so I was able to get past the issue by setting BlobSizeLimitError. However, it's still very slow and laggy to scroll past a thousand tables. I know I only need 10 tables out of 1000, so this feature will definitely help the user experience.
Hi there, I think I have the same kind of issue using the latest (0.1.7) MongoDB source: there are a lot of connections, and the 1h timeout is reached with no way to discover the schema or use any kind of fallback. Is there any countermeasure in the meantime?
@tuliren for visibility. I'm not sure what priority this should have against other DB issues, but I just wanted to make sure you saw it.
A workaround could be to suggest that users create a Mongo user dedicated to Airbyte, and only discover collections on which that Mongo user has read privileges.
@alafanechere I absolutely agree. It would save us a lot of time in the future and make the connection process much smoother.
@Deninc, by the way, the Oracle connector has supported schema specification since version 0.3.3.
Was there a Snowflake workaround for this problem? Is there a way to increase the BlobSizeLimitError via an argument or in the config file?
done
Tell us about the problem you're trying to solve
We currently try to discover all schemas in a database when discovering a source's schema. This can lead to issues if a source has too many schema tables, i.e. the catalog becomes too big and cannot be saved in our database.
This led to https://github.com/airbytehq/airbyte/issues/2619.
Definitely a nice-to-have rather than a must-have.
Describe the solution you’d like
Expose some sort of schema regex so users can specify what they want included in the discover job.
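To make the request concrete, here is a minimal sketch of what a schema regex filter could look like if applied to the list of schema names before discovery runs. The function name `filter_schemas` and the sample schema names are hypothetical and are not part of Airbyte's API; this only illustrates the idea.

```python
import re
from typing import Iterable, List


def filter_schemas(schema_names: Iterable[str], pattern: str) -> List[str]:
    """Keep only the schema names that fully match the user-supplied regex.

    Hypothetical helper: not an Airbyte function, just an illustration.
    """
    compiled = re.compile(pattern)
    # fullmatch avoids accidental partial matches (e.g. "HR" matching "HR_TMP").
    return [name for name in schema_names if compiled.fullmatch(name)]


# Example schema names (made up for illustration).
discovered = ["SALES", "SALES_ARCHIVE", "HR", "SYS"]
selected = filter_schemas(discovered, r"SALES.*|HR")
print(selected)  # ['SALES', 'SALES_ARCHIVE', 'HR']
```

Using a full match rather than a search means a user has to opt in to prefixes explicitly (`SALES.*`), which seems less surprising when the goal is to keep the catalog small.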
Describe the alternative you’ve considered or used
Allow users to specify tables to sync in addition to schemas.
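A rough sketch of this alternative, assuming the user supplies an allowlist of table names per schema. The `filter_tables` helper and the sample names are hypothetical and are not taken from Airbyte.

```python
from typing import Dict, Iterable, List, Set, Tuple


def filter_tables(
    discovered: Iterable[Tuple[str, str]],
    allowed: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Keep only (schema, table) pairs that appear in the allowlist.

    Hypothetical helper: `allowed` maps a schema name to the set of
    table names the user wants to sync from that schema.
    """
    return [
        (schema, table)
        for schema, table in discovered
        if table in allowed.get(schema, set())
    ]


# Example input (made up for illustration).
discovered = [("SALES", "ORDERS"), ("SALES", "AUDIT_LOG"), ("HR", "EMPLOYEES")]
allowed = {"SALES": {"ORDERS"}}
print(filter_tables(discovered, allowed))  # [('SALES', 'ORDERS')]
```

With a filter like this, a user who needs 10 tables out of 1000 would only ever see those 10 in the catalog.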
TODOs