getredash / redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
http://redash.io/
BSD 2-Clause "Simplified" License

Athena Data Source: support Catalog #5682

Open VasilyFomin opened 2 years ago

VasilyFomin commented 2 years ago

PyAthena added support for Catalogs in https://github.com/laughingman7743/PyAthena/issues/220 and it'd be great if redash also supported them.

What I think needs to be done:

susodapop commented 2 years ago

I agree this looks like a good addition. Since we don't use Athena on the core team, we need someone in the community to write and verify the changes. But we've updated our documentation about writing a query runner, which should make this straightforward. In summary:

  1. Add catalog_name to the configuration schema. This will automatically add it to the data source connection screen and persist it to the database.
  2. Update the pyathena.connect() call to use the value of self.configuration["catalog_name"].
  3. Establish a default behaviour so that existing Athena data sources won't break since a catalog_name is not specified.

If you'd like to give it a try, I'm happy to help with any questions, and I look forward to reviewing and merging this soon.
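The three steps above could be sketched roughly as follows. This is a minimal illustration, not the actual Redash Athena query runner: the helper name `build_connect_kwargs`, the surrounding schema fields, and the `"awsdatacatalog"` default are assumptions made for the example. Only `catalog_name` and `pyathena.connect()` come from the discussion.

```python
# Sketch only. Everything except "catalog_name" is an illustrative assumption,
# not code copied from the Redash Athena query runner.

# Assumed default: the name of Athena's built-in catalog, so that data sources
# saved before this change keep pointing at the same catalog.
DEFAULT_CATALOG = "awsdatacatalog"


def configuration_schema():
    """Return a JSON-schema-style dict describing the data source settings."""
    return {
        "type": "object",
        "properties": {
            "region": {"type": "string", "title": "AWS Region"},
            "s3_staging_dir": {"type": "string", "title": "S3 Staging Path"},
            "schema": {"type": "string", "title": "Schema Name", "default": "default"},
            # Step 1: the new, optional field.
            "catalog_name": {
                "type": "string",
                "title": "Catalog Name",
                "default": DEFAULT_CATALOG,
            },
        },
        # catalog_name is deliberately not required (step 3).
        "required": ["region", "s3_staging_dir"],
    }


def build_connect_kwargs(configuration):
    """Build keyword arguments for the connect call (hypothetical helper)."""
    # Step 3: fall back to the default when the key is missing or empty,
    # which is the case for data sources created before the field existed.
    catalog = configuration.get("catalog_name") or DEFAULT_CATALOG
    return {
        "s3_staging_dir": configuration["s3_staging_dir"],
        "region_name": configuration["region"],
        "schema_name": configuration.get("schema", "default"),
        # Step 2: pass the configured catalog through to the driver.
        "catalog_name": catalog,
    }
```

In a real query runner the result would then be passed on, along the lines of `pyathena.connect(**build_connect_kwargs(self.configuration))`, assuming the installed PyAthena version accepts `catalog_name`.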

yongchand commented 2 years ago

Is anyone working on this issue? Happy to take a shot

susodapop commented 2 years ago

Awesome @yongchand thank you! Nobody else has picked this up. Feel free to ping me with any questions along the way.

yongchand commented 2 years ago

@susodapop Just out of curiosity, isn't this PR related? https://github.com/getredash/redash/pull/5741#issue-1213704982

susodapop commented 2 years ago

🤦 You're correct.

If that PR merges will it meet your need? Can you check out the change and try it (cautiously)?

yongchand commented 2 years ago

@susodapop Sure. I can run it in Docker and see if it works. But where should we insert this extra_options?

susodapop commented 2 years ago

You insert the options on the data source setup screen in settings. That pull request modifies the query runner configuration_schema, which controls what options are displayed there.

yongchand commented 2 years ago

@susodapop Sorry again, are there any instructions for testing code using Docker? (Not using a pre-existing image)

susodapop commented 2 years ago

Yes, you can run our Docker development devloop: https://redash.io/help/open-source/dev-guide/docker

The only difference here is that you will need to check out that pull request's code instead of master.

This is easiest if you install the GitHub CLI (gh) on your machine. Then right after the setup step where you clone redash you will run:

gh pr checkout 5741

Which will pull that feature branch onto your machine. Then when you run docker-compose up -d it will spin up the containers using your local code, and any changes you make will be visible when you browse to localhost:5000.

yongchand commented 2 years ago

@susodapop Sorry to keep bugging you. It seems I have successfully run Redash in my local environment via Docker. However, I can't see Athena in the data source list, even though I can see numerous other data sources such as Prometheus. Is this behavior expected?

susodapop commented 2 years ago

No worries, this is what I'm here for :) I'm happy to jump on a call with you if that will help. Just ping me jesse@redash.dev and I'll send you a meeting invitation.

The athena data source is enabled by default. It will not appear in the data source list in the following cases:

susodapop commented 2 years ago

Also FWIW, I can load these changes when I check out that PR branch, so it's likely something failed during your docker-compose up -d command.

yongchand commented 2 years ago

@susodapop Could running on an M1 Mac affect this? I will take one last shot, and if it fails we can arrange a meeting.

susodapop commented 2 years ago

Oh yes, that will certainly affect things (but not for long). The build step on M1 Macs currently fails on master, which would cause exactly the outcome you see.

I've made a pull request that fixes it: https://github.com/getredash/redash/pull/5788

Since it hasn't merged to master yet, you can do the following to apply the fix at the same time as the Athena upgrade pull request.

# starting from an empty folder
git clone https://github.com/getredash/redash.git
cd redash/
gh pr checkout 5741
curl https://patch-diff.githubusercontent.com/raw/getredash/redash/pull/5788.patch > 0001-m1-fix.patch
git am 0001-m1-fix.patch

This will apply the M1 build fix as an extra commit. Then you just run docker-compose build to rebuild your containers. When that finishes docker-compose up -d will work :)

yongchand commented 2 years ago

@susodapop I just tested and can confirm that it works! I was able to connect to two different catalogs in my Athena. Can you merge the PR and distribute a new Docker image if possible?

gss2002 commented 4 months ago

Has this been fixed?

dtaniwaki commented 1 month ago

https://github.com/getredash/redash/pull/7059

Please check out my feature proposal, which displays the merged schema list of multiple catalogs.