airbytehq / PyAirbyte-Hackathon

Tasks for PyAirbyte Hackathon June 2024

Illustrate storing vector data into Snowflake using PyAirbyte #28

Closed bindipankhudi closed 5 months ago

bindipankhudi commented 5 months ago

Summary

Show users how to load GitHub data into Snowflake using PyAirbyte, then prepare the data for vector search.

Description

This task involves the following

Resources

fvgm-spec commented 5 months ago

Hi @bindipankhudi could you please assign this one to me?

marcosmarxm commented 5 months ago

It is yours @fvgm-spec

fvgm-spec commented 5 months ago

Hi @bindipankhudi, I am trying to connect to a trial Snowflake account using the Python snowflake.connector. I am following your sample tutorial (https://github.com/airbytehq/quickstarts/blob/main/vector_store_integration/RAG_using_Snowflake_Cortex.ipynb) and setting up the connection with the configuration below, but I am getting a connection error. I suspect it is related to SNOWFLAKE_HOST.

Image

Could you provide some guidance on connecting to Snowflake using the Python connector?

bindipankhudi commented 5 months ago

Hi Felix, try using the account name as the host. For example, instead of xyz.snowflakecomputing.com, just use xyz. The name SNOWFLAKE_HOST is a bit confusing. The actual account name should be passed here.

If you're having trouble locating the account name, look at the browser URL when you log in to Snowflake.
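
A minimal sketch of that suggestion, assuming the value was originally entered as a full hostname or URL. The helper name `account_from_host` is hypothetical; it strips the scheme, any path, and the `.snowflakecomputing.com` suffix so that only the account part remains.

```python
def account_from_host(host: str) -> str:
    """Return the account part of a host-like string (hypothetical helper)."""
    suffix = ".snowflakecomputing.com"
    value = host.strip()
    # Drop an optional scheme such as https://
    if "://" in value:
        value = value.split("://", 1)[1]
    # Drop any path that follows the hostname
    value = value.split("/", 1)[0]
    # Drop the Snowflake domain suffix if it is present
    if value.lower().endswith(suffix):
        value = value[: -len(suffix)]
    return value


# Where the value would go (user and password are placeholders):
# import snowflake.connector
# conn = snowflake.connector.connect(
#     account=account_from_host("xyz.snowflakecomputing.com"),
#     user="...",
#     password="...",
# )
```

A value that is already a bare account name passes through unchanged, so the helper is safe to apply to whatever ends up in SNOWFLAKE_HOST.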

Bindi Pankhudi

Engineering Lead, AI/LLM

GitHub https://github.com/bindipankhudi | LinkedIn https://www.linkedin.com/in/pankhudisinha/

We're hiring, come work with me! https://airbyte.io/careers [image: 🚀]

fvgm-spec commented 5 months ago

Hi @bindipankhudi, thanks so much for your quick response. My Snowflake instance is a trial one, so its URL does not have the snowflakecomputing.com part; mine is https://app.snowflake.com/ibqtudo/nl03202. I have tried setting ibqtudo and nl03202 separately as the HOST, but neither works. Could it be because it is a trial account?

Image

bindipankhudi commented 5 months ago

Maybe try this to get the account name:

https://docs.snowflake.com/en/sql-reference/functions/current_account_name
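
For a Snowsight URL of the form `https://app.snowflake.com/<organization>/<account>`, the account identifier is typically the two path segments joined with a hyphen rather than either segment alone. The sketch below assumes that URL layout; older accounts may show a region and an account locator instead, which would need a different format. `identifier_from_snowsight_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def identifier_from_snowsight_url(url: str) -> str:
    """Build '<organization>-<account>' from a Snowsight URL (assumed layout)."""
    # Keep only the non-empty path segments, e.g. ['ibqtudo', 'nl03202']
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected a URL like https://app.snowflake.com/<org>/<account>")
    org, account = parts[0], parts[1]
    return f"{org}-{account}"


# The same two values can be checked from a worksheet with:
#   SELECT CURRENT_ORGANIZATION_NAME(), CURRENT_ACCOUNT_NAME();
```

If the hyphenated value still fails to connect, the result of the SQL functions in the comment is the more reliable source than the browser URL.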

marcosmarxm commented 5 months ago

@fvgm-spec let me know if the troubleshooting session was helpful and got you unblocked.

fvgm-spec commented 5 months ago

Hey @marcosmarxm, yes, it was helpful for handling the output of the to_documents method, so I think I can get it done with a little more work. I will let you know!

fvgm-spec commented 5 months ago

Hey @marcosmarxm, after spending long hours troubleshooting this, I am giving up :(. The last thing I did was store issues.content in a list so I could load it into the Snowflake table:

Image

Then I tried loading the content strings stored in the list, with no success:

Image

It seems to be related to special characters in the strings that contain the content:

ProgrammingError: 001003 (42000): SQL compilation error:
parse error line 8 at position 5 near '<EOF>'.
syntax error line 1 at position 64 unexpected '# '.
parse error line 3 at position 7 near '10'.

This is the Snowflake table. I am also sharing the notebook that contains all the code I was working on.

Image
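
The SQL compilation error above would be consistent with the content strings being spliced directly into the INSERT statement, so that quotes, `#`, or newlines in the text get parsed as SQL. A common remedy is to pass the values as bound parameters instead. The sketch below uses the standard library's `sqlite3` as a stand-in so it runs without a Snowflake account; the table name `issues_content` and the sample strings are made up for illustration. With `snowflake.connector` the same `executemany` pattern applies, but its default placeholder style is `%s` rather than `?`.

```python
import sqlite3

# Sample strings containing characters that break naive string formatting
contents = [
    "# Heading with a 'single quote'",
    "Line one\nLine two; -- looks like a SQL comment",
    'Double "quotes" and a 10% sign',
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues_content (content TEXT)")

# Values are sent separately from the SQL text, so no escaping is needed
conn.executemany(
    "INSERT INTO issues_content (content) VALUES (?)",
    [(text,) for text in contents],
)
conn.commit()

# Read the rows back in insertion order to confirm they round-trip unchanged
stored = [
    row[0]
    for row in conn.execute("SELECT content FROM issues_content ORDER BY rowid")
]
```

Because the driver handles the values, the text arrives in the table exactly as it was in the list, whatever characters it contains.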

fvgm-spec commented 5 months ago

Hi @marcosmarxm, could you please unassign me from this one, as I am not able to continue working on it? Thanks. Could I be assigned to this one instead? https://github.com/orgs/airbytehq/projects/75/views/4?pane=issue&itemId=63202677

gupta-arpan commented 5 months ago

Hi @marcosmarxm @bindipankhudi, could you assign this issue to me? I would like to work on it. Thanks.

marcosmarxm commented 5 months ago

@gupta-arpan all yours!

gupta-arpan commented 5 months ago

@bindipankhudi I have opened a PR; please review it.

bindipankhudi commented 5 months ago

Thank you @gupta-arpan! I will review this.

bindipankhudi commented 5 months ago

Reviewed and merged! Thank you @gupta-arpan for the good work! :)