evidence-dev / evidence

Business intelligence as code: build fast, interactive data visualizations in pure SQL and markdown
https://evidence.dev
MIT License
3.44k stars 167 forks source link

Add a data source for Trino/Presto #1201

Closed maxstreese closed 7 months ago

maxstreese commented 8 months ago

Description

This PR adds a data source for Trino/Presto.

Resolves #1095.

Adding Trino/Presto significantly increases the number of data sources supported by Evidence, as they function as data source aggregators.

Checklist

Screenshots

The new connection form with a valid connection

image

The new connection form with an invalid connection

image

A query running against the new data source

image

changeset-bot[bot] commented 8 months ago

🦋 Changeset detected

Latest commit: c931cdc217f302700ae33e1256e2a734e0d461ab

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages | Name | Type | | ----------------------------- | ----- | | @evidence-dev/core-components | Patch | | @evidence-dev/db-orchestrator | Patch | | @evidence-dev/evidence | Patch | | @evidence-dev/postgres | Patch | | @evidence-dev/trino | Patch | | evidence-docs | Patch | | @evidence-dev/components | Patch | | evidence-test-environment | Patch | | @evidence-dev/redshift | Patch |

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

vercel[bot] commented 8 months ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
evidence-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 25, 2023 0:25am
netlify[bot] commented 8 months ago

Deploy Preview for evidence-development-workspace ready!

Name Link
Latest commit c931cdc217f302700ae33e1256e2a734e0d461ab
Latest deploy log https://app.netlify.com/sites/evidence-development-workspace/deploys/65117bdfc4029400080fb860
Deploy Preview https://deploy-preview-1201--evidence-development-workspace.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

maxstreese commented 8 months ago

Hi all!

Let me know if there is anything else I can do here at this point. For now I would consider this as final from my side

Best, Max

maxstreese commented 8 months ago

Just thinking out loud, it may also be and idea to ask the Trino folks to be included in their list of supported clients once the datasource is supported by Evidence. Might attract some attention.

archiewood commented 8 months ago

Thanks for the contribution! 🙏🏼

I don't have any experience with presto trino, so I set up a free account on Starburst to give this a test with.

I ran into issues connecting:

CleanShot 2023-09-21 at 22 49 42@2x
  1. The error message when connecting does not seem to get unpacked. Do we need to unpack this somehow?
  2. Checking in the console, the error I'm specifically getting is around sending an HTTP request to an HTTPS port:
    Object { message: "execution error:<html>\r\n<head><title>400 The plain HTTP request was sent to HTTPS port</title></head>\r\n<body>\r\n<center><h1>400 Bad Request</h1></center>\r\n<center>The plain HTTP request was sent to HTTPS port</center>\r\n<hr><center>cloudflare</center>\r\n</body>\r\n</html>\r\n", code: 400 }

Are there other config options I should be passing in here? I feel like starburst should be requiring some kind of password, but adding my starburst login pw didn't change the above behaviour.

Apologies if I'm being slow here!

maxstreese commented 8 months ago

Hey, awesome! You need to set SSL to "true" in this case. Then things should work (you also need to provide user and password). I tried this against my company's Trino installation and that worked.

My understanding is that HTTPS uses SSL/TLS so therefore requiring the user to set the SSL option for an HTTPS connection seems reasonable. It's also in line with how the client library underneath works. Additionally this allows for the freedom to run Trino in HTTPS mode but on a different port than 443. That being said, if you'd like me to silently set the SSL option for the user in case the port is 443, I could also do that.

Let me know 😀

maxstreese commented 8 months ago

Hi @archiewood,

I have made three changes now:

  1. Unpack the error message as you suggested
  2. Provide user instructions in the form for the SSL parameter (stating that it must be set for HTTPS connections)
  3. Include an engine option which should allow this data source to be used for both Trino and Presto. The underlying client library allows for this. I have not tested against a Presto installation, but only against both a local (HTTP) as well as a remote (HTTPS) Trino installation

I have updated the PR title and description accordingly.

archiewood commented 8 months ago

Include an engine option which should allow this data source to be used for both Trino and Presto.

Sorry I misspoke in my above comment. Starburst is cloud hosted trino. I wasn't aware they were different aside from the naming tbh

So I've made progress, I now have a new error message after setting ssl=true

CleanShot 2023-09-22 at 10 15 21@2x

Again, this feels like a fair point, there seems to be no password or auth. So I threw in my starburst password. Still no dice:

CleanShot 2023-09-22 at 10 20 06@2x

I'm not really sure. Maybe I should be using a service account for this? EDIT: No luck with a service account either

maxstreese commented 8 months ago

Sorry I misspoke in my above comment. Starburst is cloud hosted trino. I wasn't aware they were different aside from the naming tbh

No problem and you didn't. I was aware that Starburst is Trino. But I realized that it was silly to exclude Presto. So I added the engine option. This was not related to your comment at all.

So I threw in my starburst password. Still no dice

Ah my bad, that's embarrassing. I had a bug in the auth function. Unfortunately I do not have a local test setup for this but I hope with the changes I just pushed things should work. Please let me know!

archiewood commented 8 months ago

Hmm, still not getting there my side. Now I'm getting a rather unhelpful, unspecified "execution error"

Unfortunately I do not have a local test setup for this but I hope with the changes I just pushed things should work. Please let me know!

FWIW you can set up a free starburst account in about 10 mins, which might help with testing

archiewood commented 8 months ago

my main sticking point is that I feel like trino should be authing me somehow. I can see if you have a local setup then maybe that's not an issue, but presumably any prod connection will need to be authed?

So why does starburst not give me any auth params - all these things are very guessable:

CleanShot 2023-09-22 at 12 08 21@2x

I'm guessing something to do with HTTPS/SSL/TLS solves this?

I feel like I'm missing something?

Maybe some clues

CleanShot 2023-09-22 at 12 14 18@2x
maxstreese commented 8 months ago

Let me check later this weekend. Right now when user and password are both provided, I set the basic_auth property of the client library I use (presto-client-node). I believe this should be supported by Starburst somehow. I'll sign myself up for a free account as you suggested.

maxstreese commented 8 months ago

Hi @archiewood !

I actually think the implementation is working as is right now. I signed up with Starburst and this is what I got:

image

The password I provided is the regular password of my account.

If I then go ahead and create a sample site in the example project I get:

image

Finally, if I check the query history on Starburst I can see both, the query that Evidence uses to test the connection initially, as well as my sample query:

image

archiewood commented 7 months ago

Nice! It's also working for me!

I hadn't realised but the free trino clusters auto suspend after 15 mins. Weird that the error didn't tell me this. Feels like it should.

CleanShot 2023-09-22 at 15 45 38@2x
archiewood commented 7 months ago

Okay so having been through this process, I think it would be great to add a bit of docs help / explanation to go alongside this.

  1. IIUC your connector only supports password authentication? (which is fine if so) - we should say this in the docs section for trino, as Trino has many auth methods: https://trino.io/docs/current/security/authentication-types.html
  2. The other gotchas I had were using starburst specifically. I have no idea how popular starburst is vs other ways of running trino, but if its a meaningful percentage, we should maybe include a note on this too:
    • knowing that the password was required AND that the password was the one i use to login to the UI client, rather than some other key
    • knowing that I should set SSL=true
  3. more generally - should the password be required? Or are there circumstances where it is not (eg local dev)
maxstreese commented 7 months ago

1. RE: Authentication Methods

Correct, my implementation only supports password (i.e. basic) authentication. Two comments about this below.

The limitation here is in the available Trino client library for Node/JS, namely presto-client-node. As this does currently only support password authentication, so does my connector. One could theoretically either implement a fully custom Trino client for Evidence, or extend presto-client-node to support more authentication methods. I am currently not willing/able to do this.

Looking at the data consumer list provided by Starburst, you will find that quite a few of the available Trino clients actually only support the password based authentication types. So I think that Evidence would be in good company here.

2. RE: Should The Password Be Required?

No. There are setups besides local development where you do not need a password. You can for instance self host Trino and give access via VPN.

3. RE: Starburst Specific Documentation

I have no clue how many companies use Starburst vs how many self host but it certainly can't hurt to include information about Starburst in the docs.

The password is not necessarily the one you use to log in to the Starburst UI by the way. You could also create a dedicated service account and use that one for connecting:

image

4. Providing Documentation

Let me try to come up with something

maxstreese commented 7 months ago

Done. Let me know if I should add/change something.

archiewood commented 7 months ago

Looks great - thanks for the contribution!

mcrascal commented 7 months ago

Amazing! Thanks @maxstreese @archiewood 🎉

maxstreese commented 7 months ago

Awesome, thanks! In case anything turns out to be amiss just let me know.