datafold / data-diff

Compare tables within or across databases
https://docs.datafold.com
MIT License
2.95k stars 272 forks source link

readme should clarify that DB_URI arguments need to be surrounded in quotes when connecting to Snowflake #116

Closed leoebfolsom closed 2 years ago

leoebfolsom commented 2 years ago

I figured I'd share a 🤦 experience I just had during my setup with Snowflake. I was having trouble getting data-diff to accept my DB1_URI and DB2_URI arguments. I eventually realized they had to be in quotes. One would not have known that from the README which offers a postgres example that doesn't put the arguments in quotes (I assume quotes are not needed when using postgres?).

Here is what I encountered (I intentionally omitted arguments for brevity and RCA while debugging).

Without quotes:

% data-diff snowflake://me:mypw@myaccount/mydb/myschema?warehouse=mywh&role=myrole 
[1] 12578
zsh: no matches found: snowflake://me:mypw@myaccount/mydb/myschema?warehouse=mywh
% 
[1]  + exit 1     data-diff snowflake://me:mypw@myaccount/mydb/myschema?warehouse=mywh

(huh? did I enter my snowflake info wrong? scratches head)

With quotes:

% data-diff "snowflake://me:mypw@myaccount/mydb/myschema?warehouse=mywh&role=myrole"
Usage: data-diff [OPTIONS] DB1_URI TABLE1_NAME DB2_URI
                 TABLE2_NAME
Try 'data-diff --help' for help.

Error: Missing argument 'TABLE1_NAME'.

(nice! the error message helpfully tells me that I should enter the other arguments) :cool:

I think it would help a lot if right under the postgres example, there was a similar example for Snowflake, and imo for each tool that is up and running, especially since the syntax evidently varies.

I don't think every example needs to be as verbose as the postgres example, which includes output, but capturing any variation in syntax seems crucial. I don't think it's reasonable to expect users to know that the syntax for arguments differs.

The Snowflake example would look like this to make it equivalent to the postres example:

$ data-diff \
    "snowflake://<user>:<password>@<myaccount>/<database>/<schema>?warehouse=<warehouse>&role=<role>" RATING \
    "snowflake://<user>:<password>@<myaccount>/<database>/<schema>?warehouse=<warehouse>&role=<role>" RATING_DEL1 \
    --bisection-threshold 100000 \ # for readability, try default first
    --bisection-factor 6 \ # for readability, try default first
    --update-column timestamp \
    --verbose

A couple of other notes:

sirupsen commented 2 years ago

@leoebfolsom the idea was that people could grab the URI strings from the database table, but maybe we can make that clearer? There, the Snowflake tables are also capitalized. We could quote them in that table though!

As for carrots, I'm 👍🏻 on that

leoebfolsom commented 2 years ago

Hi @sirupsen ! It was clear that the URIs should be grabbed from the database table. I think if the Snowflake URI were quoted in that table (in contrast to the postgres URI), that would help a lot.

You're right that SCHEMA is capitalized in the db table, but TABLE is not, since it's not part of the Connection String. Perhaps the connection string should include the table name, but this is where it might be more straightforward to just provide an example similar to the postgres example provided (like the one I mocked up in this issue).

I don't feel strongly about whether the clarification that TABLE needs to be capitalized for Snowflake occurs in the database table or in an example, but I think dev time would be saved in the future if that were specified somewhere.

The quotes are a bigger issue IMO, and I agree that adding them to the database table makes sense. I'll open a PR and I (or someone) can refine it based on further feedback.

sirupsen commented 2 years ago

Oh nice, yep, capitalize table + quote + 🥕 and I think we're doing a lot better! 😍 Thanks for your thoughtfulness!

leoebfolsom commented 2 years ago

@sirupsen should the ports be carrotted, or are those fixed?

Screen Shot 2022-06-28 at 8 45 14 AM
sirupsen commented 2 years ago

No carrots for ports

The default ports should be in there and explicit when they make sense, so they're easy to change, but no, most of the time you won't change it

Simon https://sirupsen.com/

On Tue, Jun 28, 2022 at 11:46 AM, Leo Folsom @.***> wrote:

@sirupsen https://github.com/sirupsen should the ports be carrotted, or are those fixed? [image: Screen Shot 2022-06-28 at 8 45 14 AM] https://user-images.githubusercontent.com/1799931/176223114-c8741173-7714-4804-90f8-04cfd897370a.png

— Reply to this email directly, view it on GitHub https://github.com/datafold/data-diff/issues/116#issuecomment-1168892877, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAXY6H6AVSHJUGMOKXAPDDVRMM4HANCNFSM52ADYFGA . You are receiving this because you were mentioned.Message ID: @.***>

erezsh commented 2 years ago

Added a note in https://github.com/datafold/data-diff/#supported-databases