Baoqi / metabase-dremio-driver

Dremio driver for Metabase BI
Eclipse Public License 2.0
50 stars 10 forks source link

connection parameters : "database" is a required filed, but I don't know what to type in #1

Closed fabrice-etanchaud closed 3 years ago

fabrice-etanchaud commented 3 years ago

Hello @Baoqi , thank you for providing us with a new dremio driver ! Please, could you tell me what to put in the "database" required connection field ? I tried with several space names, with no luck.

Looking forward, Fabrice

Baoqi commented 3 years ago

Hi, @fabrice-etanchaud , You can run the following sql in dremio to get the schema lists:

select distinct "TABLE_SCHEMA" from "INFORMATION_SCHEMA"."TABLES"

I think any one can work, you can try using the built-in one: "sys" or "INFORMATION_SCHEMA"

And, I think this required connection field is not needed at all, since Metabase will still show out all schemas no matter what you entered here.

I think I can remove this "database" field tomorrow

Hello @Baoqi , thank you for providing us with a new dremio driver ! Please, could you tell me what to put in the "database" required connection field ? I tried with several space names, with no luck.

Looking forward, Fabrice

Baoqi commented 3 years ago

I just tried to input a wrong "database" field ( just some random text), but my Metabase environment can work correctly (even with the wrong value).

So, this issue may due to other problems , maybe you can put your dremio's version & metabase's version here, and also the full error messages, so that I can try to guess where maybe wrong

dantodor commented 3 years ago

What I noticed is that the database field is used only at connection time. Afterwards, you can select from any tables across different databases. Notice that in Dremio terminology, database is Data lakes. So just define a dummy one, use it to connect Metabase to Dremio, and then you will have access to mostly everything. Assuming a single user setup.

fabrice-etanchaud commented 3 years ago

Hi all ! This is great ! In my use case, there are a dozen relational datasources with dozen of schemas and hundreds of tables/views. So the catalog is breaking, I cannot update column types and dependencies... Given that dremio is a federation tool, possibly connected to a huge number of sources, do you think it could be possible to restrict metabase access to objects under the connection "database" ?

Baoqi commented 3 years ago

Hi, all,

Based on previous discussion, I make a new release today. https://github.com/Baoqi/metabase-dremio-driver/releases/tag/1.0.1

The changes are:

  1. In GUI, the meaning of "dbname" parameter is not clear. Actually it should be "schema" in Dremio. So, change it accordingly.
  2. In previous version, all readable tables from all schemas will be list, which is both time consuming, and may not be expected. So, in this release, for "schema" field, if user specify one schema, we will only list the tables for that exact schema. If user leave "schema" empty, then we will continue to list all tables from all schemas (old behavior)

If we specify one schema for "schema", then Dremio will only try to discover that specified schema, it will be much faster in my test.

Hope this can work in your environment, and the change from "dbname" to "schema" won't break existing Dremio connections.

Thanks.

fabrice-etanchaud commented 3 years ago

Thank you @Baoqi , yes this is much faster !

One thing : it seems the filter is an exact match, doesn't it ? Don't you think it would be valuable to have a LIKE 'schema%' instead, in order to catalog all the vds/pds in subfolders ?

Thank you again !

Baoqi commented 3 years ago

Thank you @Baoqi , yes this is much faster !

One thing : it seems the filter is an exact match, doesn't it ? Don't you think it would be valuable to have a LIKE 'schema%' instead, in order to catalog all the vds/pds in subfolders ?

Thank you again !

Hi, @fabrice-etanchaud , You're Welcome!

Yes, that's a good suggestion. I have 2 small questions about this change:

  1. Do you think we still need to also keep the exact match option, or we can simply make the "schema" just as a prefix like what you mentioned. (In the implementation, I will first retrieve all schema names, and then use Java's .startWith(_inputschema) to filter out matched ones)
  2. Do you think we need to provide another advanced field, like "custom schema sql", and if user specify a SQL in this field, like
    select distinct "TABLE_SCHEMA" from "INFORMATION_SCHEMA"."TABLES" where "TABLE_SCHEMA"  in ('schema1', 'schema2') or "TABLE_SCHEMA" like 'schema3%'

    we can execute this dremio SQL, and get a list of all matched schema names. By this, we can select any schema as we like, but this may be too hard to use for users, so this may not be needed?

Thanks

fabrice-etanchaud commented 3 years ago

I really appreciate your responsiveness !

You are right, given that dremio is a federation tool, there must be a way to configure the list of schemas of interest, in ordre to calm metabase. For the moment, I would be more than happy to have recursively all datasets under a given dremio path ! IMHO Metabase would be plugged in the business marts layer, so one can suppose all related data to be found recursively under a specific dremio path (facts, dimensions and the like). And nothing prevent us from creating multiple isolated metabase connections on different paths. What do you think of it ?

@Baoqi , did you have time to have a look at the dbt-dremio project ? Would you find it valuable ?

Baoqi commented 3 years ago

I really appreciate your responsiveness !

You are right, given that dremio is a federation tool, there must be a way to configure the list of schemas of interest, in ordre to calm metabase. For the moment, I would be more than happy to have recursively all datasets under a given dremio path ! IMHO Metabase would be plugged in the business marts layer, so one can suppose all related data to be found recursively under a specific dremio path (facts, dimensions and the like). And nothing prevent us from creating multiple isolated metabase connections on different paths. What do you think of it ?

@Baoqi , did you have time to have a look at the dbt-dremio project ? Would you find it valuable ?

Hi, @fabrice-etanchaud , I just make a new small release 1.0.2. https://github.com/Baoqi/metabase-dremio-driver/releases/tag/1.0.2 , which add a support to make "schema" field works as prefix. I did some basic tests on my local dev env (which only have small number of tables), it worked fine.

Yes, I'm very interested in dbt related projects, (although I haven't got time to carefully research dbt project yet), I think dbt-dremio should be very helpful, I will keep an eye on your project. Thanks.

fabrice-etanchaud commented 3 years ago

Just to tell you that it works like a charm !

Now nothing can stop us using metabase query builder on a mixture of nosql, rdbms and json, csv or excel files !

Thank you, have a good week end !

Baoqi commented 3 years ago

Just to tell you that it works like a charm !

Now nothing can stop us using metabase query builder on a mixture of nosql, rdbms and json, csv or excel files !

Thank you, have a good week end !

Thanks. Glad it works.

Now, since it works, let's close this issue.